In this blog post, "Running Prompts with LangChain: A Practical Guide for Teams and Leaders," we will walk through how to design, run, and ship reliable prompts using LangChain’s modern building blocks.
Why prompts and why LangChain
Large language models respond to instructions written as prompts. The challenge is making those prompts consistent, testable, and composable with other parts of your app—like retrieval, tools, and memory. That’s where LangChain shines: it turns prompting into reusable, typed, and observable components you can run locally or in production.
At a high level, LangChain provides a clean way to express “LLM programs.” You define small pieces—prompts, models, output parsers—and wire them together into chains. This lets teams version prompts, add context from your data, enforce structured outputs, and monitor behavior without rewriting glue code.
The technology behind LangChain prompting
Under the hood, LLMs work by predicting the next token (a chunk of text) given an input prompt. Parameters like temperature control randomness. Context windows cap how much text you can send, so you must be deliberate about instructions and which data you include.
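Because the context window is a hard limit, it helps to measure how many tokens a prompt will consume before you send it. Here is a minimal sketch using tiktoken (installed in the setup step below); the cl100k_base encoding is an assumption for OpenAI-style models:
import tiktoken

# Rough token count for an OpenAI-style model; the encoding name is an assumption.
encoding = tiktoken.get_encoding("cl100k_base")
prompt_text = "You are a concise technical assistant. Explain LangChain in one paragraph."
print(len(encoding.encode(prompt_text)), "tokens")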
LangChain’s core abstractions map neatly onto this:
- Prompt templates define reusable instruction text with variables.
- Models are providers such as OpenAI or local models; you can swap them with minimal code change (see the sketch after this list).
- Output parsers enforce structure (e.g., JSON) so downstream code is reliable.
- Runnables/Chains compose prompts, models, and utilities into a single callable unit.
- Retrievers and vector stores add your private knowledge via embeddings and similarity search.
- Memory lets you keep conversation state across turns, isolated per session.
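To make the model-swap point concrete, here is a minimal sketch; the alternative provider and model names are assumptions and require the corresponding integration package (for example langchain-anthropic):
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")

# Swapping providers is typically a one-line change; the rest of the chain stays the same.
# from langchain_anthropic import ChatAnthropic
# model = ChatAnthropic(model="claude-3-5-sonnet-20240620")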
Getting started
Install the essentials and set your API key (OpenAI shown, but LangChain supports many providers):
pip install -U langchain langchain-openai langchain-community faiss-cpu tiktoken
# export OPENAI_API_KEY=... # or set in your environment
Your first prompt chain
Use the modern LangChain Expression Language (LCEL) to compose prompt → model → parser.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise technical assistant."),
    ("human", "{question}")
])
model = ChatOpenAI(model="gpt-4o-mini", temperature=0.2)
chain = prompt | model | StrOutputParser()
print(chain.invoke({"question": "Explain LangChain in one paragraph."}))
This creates a reusable chain. You can call invoke for single runs or batch for parallel inputs.
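For example, a minimal sketch of batching several questions through the same chain:
questions = [
    {"question": "What is a prompt template?"},
    {"question": "What does an output parser do?"},
]
answers = chain.batch(questions)  # runs the inputs concurrently where possible
for a in answers:
    print(a)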
Prompt templates that scale
Good prompts are explicit, modular, and parameterized. Here’s a template with guardrails and examples:
examples = [
    {"q": "What is RAG?", "a": "Retrieval augmented generation..."},
    {"q": "Define vector embeddings", "a": "Numerical representations..."},
]
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an expert who answers clearly and briefly."),
    ("human", "Task: {task}\nConstraints: {constraints}"),
    ("human", "Examples:\n{examples}"),
    ("human", "Question: {question}")
])
formatted_examples = "\n".join([f"Q: {e['q']}\nA: {e['a']}" for e in examples])
print(prompt.format(
    task="Answer technical questions",
    constraints="No source code unless asked; keep to 3 sentences",
    examples=formatted_examples,
    question="How does temperature affect LLM outputs?"
))
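The same template plugs straight into a chain; a minimal sketch reusing the model defined earlier (qa_chain is just an illustrative name):
qa_chain = prompt | model | StrOutputParser()
print(qa_chain.invoke({
    "task": "Answer technical questions",
    "constraints": "No source code unless asked; keep to 3 sentences",
    "examples": formatted_examples,
    "question": "How does temperature affect LLM outputs?"
}))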
Structured outputs you can trust
Parsing plain text is brittle. Use a JSON parser to enforce a schema.
from typing import Optional
from pydantic import BaseModel, Field
from langchain_core.output_parsers import JsonOutputParser
class Answer(BaseModel):
    answer: str = Field(description="Concise answer")
    confidence: float = Field(description="0.0 to 1.0")
    citations: Optional[list[str]] = None
parser = JsonOutputParser(pydantic_object=Answer)
prompt = ChatPromptTemplate.from_messages([
    ("system", "Return only valid JSON matching this schema:\n{format_instructions}"),
    ("human", "{question}")
]).partial(format_instructions=parser.get_format_instructions())
chain = prompt | model | parser
result = chain.invoke({"question": "What is LangChain?"})
print(result) # dict with keys: answer, confidence, citations
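If you want full type safety downstream, you can validate the parsed dict back into the Pydantic model; a minimal sketch assuming Pydantic v2:
validated = Answer.model_validate(result)  # raises a ValidationError if the shape is wrong
print(validated.answer, validated.confidence)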
Adding your data with retrieval
Retrieval-augmented generation (RAG) brings your documents into the prompt at query time. You embed documents into vectors, store them, then fetch the most relevant chunks.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.runnables import RunnablePassthrough
# Prepare documents
raw_docs = [
    ("doc1", "LangChain provides Runnables, PromptTemplates, and parsers."),
    ("doc2", "Use retrieval to ground answers in your data."),
]
texts = [d[1] for d in raw_docs]
splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = [c for t in texts for c in splitter.split_text(t)]
# Build vector store
emb = OpenAIEmbeddings()
vs = FAISS.from_texts(chunks, embedding=emb)
retriever = vs.as_retriever(search_kwargs={"k": 3})  # fetch the top 3 chunks per query
def format_docs(docs):
    return "\n\n".join([d.page_content for d in docs])
rag_prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using only the provided context. If unsure, say 'I don't know'."),
    ("human", "Context:\n{context}\n\nQuestion: {question}")
])
rag_chain = {
    "context": retriever | format_docs,
    "question": RunnablePassthrough()
} | rag_prompt | model | StrOutputParser()
print(rag_chain.invoke("How does LangChain help with prompting?"))
This pattern keeps prompts lean and focused, and it limits hallucinations by grounding answers in retrieved context.
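When users need to see where an answer came from, you can surface the retrieved chunks alongside the response. A minimal sketch that retrieves twice for simplicity; in production you would wire the sources through the chain itself:
question = "How does LangChain help with prompting?"
docs = retriever.invoke(question)      # the chunks that ground the answer
answer = rag_chain.invoke(question)
print(answer)
print("Sources:", [d.page_content for d in docs])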
Keeping conversation state with memory
For chat apps, you often need previous turns. Use message history with a session ID so each user’s context is isolated.
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
base_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("placeholder", "{history}"),
    ("human", "{input}")
])
base_chain = base_prompt | model | StrOutputParser()
store = {}
def get_history(session_id: str):
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]
chat_chain = RunnableWithMessageHistory(
    base_chain,
    get_history,
    input_messages_key="input",
    history_messages_key="history"
)
session_id = "user-123"
print(chat_chain.invoke({"input": "Remember my project is Apollo"}, config={"configurable": {"session_id": session_id}}))
print(chat_chain.invoke({"input": "What did I say my project was?"}, config={"configurable": {"session_id": session_id}}))
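Because each session gets its own history object, you can inspect or clear one user's state without touching another's; a quick sketch:
history = get_history(session_id)
for message in history.messages:   # the stored turns for this session only
    print(message.type, ":", message.content)
history.clear()                    # e.g., when the user ends the conversation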
Evaluation and reliability
Prompts need tests. A lightweight approach:
- Create a set of input-output examples (golden tests).
- Run your chain with temperature=0 and compare outputs.
- Track changes whenever you edit prompts or model versions.
tests = [
    {"in": {"question": "Define embeddings"}, "contains": "numerical"},
    {"in": {"question": "What is RAG?"}, "contains": "retrieval"}
]
ok = 0
for t in tests:
    out = chain.invoke(t["in"]).lower()  # assumes a string-returning chain (prompt | model | StrOutputParser())
    if t["contains"] in out:
        ok += 1
print(f"Passed {ok}/{len(tests)} tests")
For more rigorous checks, score responses with another model against criteria (helpfulness, correctness) and keep a changelog of prompt versions.
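A minimal sketch of that model-graded approach, reusing the pieces above; the rubric and the 1-5 scale are illustrative:
judge_prompt = ChatPromptTemplate.from_messages([
    ("system", "You grade answers. Return only a number from 1 (poor) to 5 (excellent)."),
    ("human", "Question: {question}\nAnswer: {answer}\nCriteria: helpfulness and correctness.")
])
judge = judge_prompt | ChatOpenAI(model="gpt-4o-mini", temperature=0) | StrOutputParser()

candidate = chain.invoke({"question": "Define embeddings"})  # assumes a string-returning chain, as in the tests above
score = judge.invoke({"question": "Define embeddings", "answer": candidate})
print("Judge score:", score.strip())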
Best practices for production
- Be explicit: system messages set role and constraints; avoid ambiguity.
- Use structured outputs for anything machine-read; validate JSON.
- Control variability: set temperature low for deterministic tasks.
- Stay within token budgets: summarize or shorten context when needed.
- Guard against prompt injection in RAG: filter sources, prefix with clear rules, and quote context distinctly.
- Separate content from logic: store prompts as templates, version them, and document changes.
- Observe and log: capture inputs, outputs, and latencies for iterative tuning.
- Fail gracefully: set timeouts, retries, and fallbacks to smaller/cheaper models.
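To make the last point concrete, LCEL runnables support retries and fallbacks directly; a minimal sketch in which the timeout parameter and model names are assumptions for your provider:
primary = ChatOpenAI(model="gpt-4o-mini", timeout=30)    # request timeout in seconds (assumed parameter)
backup = ChatOpenAI(model="gpt-3.5-turbo", timeout=30)   # smaller/cheaper fallback

robust_model = primary.with_retry(stop_after_attempt=3).with_fallbacks([backup])
# robust_model drops into any chain above in place of model, e.g. prompt | robust_model | StrOutputParser()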
Troubleshooting checklist
- Hallucinations: add retrieval context; ask the model to say “I don’t know.”
- Inconsistent JSON: use a JSON parser and provide the schema in the prompt.
- Overly verbose answers: set an explicit maximum length and give short examples.
- Slow responses: reduce context size, switch to faster models, or cache results (see the caching sketch after this list).
- Drift after edits: re-run golden tests; version prompts and models.
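For the caching point above, LangChain supports a global LLM cache; a minimal sketch using the in-memory cache (swap in a persistent backend for production):
from langchain_core.caches import InMemoryCache
from langchain_core.globals import set_llm_cache

set_llm_cache(InMemoryCache())   # identical model calls now reuse previous responses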
Putting it all together
With LangChain, you can express prompt logic as small, testable parts that compose cleanly: templates for clarity, models for inference, output parsers for structure, retrieval for grounding, and memory for continuity. Start simple, add structure as requirements grow, and treat prompts like production code—versioned, observed, and tested.
Next steps
- Wrap your chains behind an API; keep prompts and config out of code where possible.
- Introduce RAG for proprietary data; add filters and source attributions.
- Automate evaluations and monitor latency, cost, and quality.
That’s the foundation for running prompts with LangChain in real-world systems—simple to start, powerful as you scale.