Issue 03 | Agents: From Q&A Machine to Intelligent Decision-Maker
🎯 Learning Objectives for This Issue
- Deeply understand the core working mechanism of Agents: Master the principles of how LLMs transform from "Q&A machines" into "intelligent decision-makers".
- Master the construction methods of Agents in LangChain: Learn to define Tools, select the appropriate Agent type, and assemble them into an intelligent entity capable of autonomous thought and action.
- Practically apply Agents to enhance the customer service assistant: Add a "brain" to our intelligent customer service project that can call external APIs and execute complex logic.
- Identify common pitfalls in Agent development: Understand potential issues Agents might encounter and learn how to effectively avoid them to ensure stable and reliable operation.
📖 Principle Analysis
Welcome, future AI architects, to the third issue of the "LangChain Full-Stack Masterclass"! In the previous two issues, we laid the foundation, learning how to make LLMs understand and generate text, and how to endow them with "memory" and "knowledge" through RAG. But don't you feel it still acts like a "Q&A machine"? It answers whatever you ask, lacking proactivity and decision-making power.
Imagine your intelligent customer service assistant. If a user asks, "What is my order number?", it might find the order query portal from the knowledge base. But if the user asks, "My order number is XYZ, please help me check the logistics status. If it has been shipped, how long is it expected to take?", at this point, mere knowledge retrieval is not enough! It needs to:
- Identify that this is a "logistics query" request.
- Know that it needs to call a "logistics query API".
- Extract the "order number" from the user's input.
- Call the API and get the result.
- Organize the result into a user-friendly answer.
This is where Agents come into play! The core idea of an Agent is: to make the LLM not just answer questions, but act as a Reasoning Engine to think about "what do I need to do", then "take action" (Action), and continue to "think" and "act" based on the results of the action until the goal is achieved. Simply put, Agents give LLMs "hands" and "feet", so they are no longer just "all talk".
Core Components of an Agent
A LangChain Agent mainly consists of the following parts:
- LLM (Large Language Model): The "brain" of the Agent. It is responsible for understanding user intent, planning action steps, selecting appropriate tools, and conducting further reasoning based on the tools' outputs. Its level of intelligence directly determines the upper limit of the Agent.
- Tools: The "hands and feet" of the Agent. These are external functions or capabilities that the Agent can call. They can be functions to query a knowledge base, interfaces to call external APIs, tools to perform mathematical calculations, or even send emails, create schedules, etc. Each tool has a clear name and description to help the LLM understand its purpose.
- Agent Executor: The "central nervous system" of the Agent. It is responsible for receiving the LLM's decisions (i.e., "which tool I intend to use and what the parameters are"), actually calling the tool, and then feeding the tool's output (Observation) back to the LLM, forming a thought-action-observation loop until the LLM decides to provide the final answer.
- Prompt: The "action guideline" of the Agent. It tells the LLM that it is an Agent, what capabilities (tools) it has, what its goal is, and how it should think and output. A good Prompt is the key to an Agent's success.
Agent Workflow (Using the ReAct Pattern as an Example)
One of the most classic and powerful Agent patterns in LangChain is ReAct (Reasoning and Acting). It mimics the human problem-solving process: Thought -> Action -> Observation -> Thought again...
Below is a typical workflow of an Agent:
graph TD
A[User Query] --> B{Agent Executor};
B -- Passes user query and history --> C(LLM);
C -- Reasons based on Prompt and tool descriptions --> D{LLM Output: Thought};
D -- LLM Output: Action (Tool Name, Parameters) --> E{Agent Executor};
E -- Calls the specified Tool --> F(Tool Execution);
F -- Tool execution result (Observation) --> G{Agent Executor};
G -- Feeds Observation back to LLM --> C;
C -- Continues reasoning until final answer is reached --> H{LLM Output: Final Answer};
H -- Returns to user --> A;
Workflow Analysis:
- User Query: The user asks the customer service assistant a question.
- Agent Executor Receives: The executor receives the query and sends it to the LLM, along with the Agent's "action guideline" (Prompt) and a list and description of available tools.
- LLM Reasoning (Thought): Based on the Prompt and tool descriptions, the LLM analyzes the user's intent and thinks about the steps needed to solve the problem. For example: "The user wants to check an order, I need to call the order query tool."
- LLM Decision (Action): The LLM decides which action to take and outputs it in a specific format: `Action: tool_name` and `Action Input: parameters`.
- Agent Executor Executes Tool: The executor parses the LLM's output, calls the corresponding tool, and passes in the parameters.
- Tool Returns Observation: The tool finishes executing and returns the result. For example, the order query API returns the order status, logistics information, etc.
- Agent Executor Feeds Back Observation: The executor feeds the tool's execution result (Observation) back to the LLM again.
- LLM Reasons Again: After receiving the Observation, the LLM continues to think based on the new information: Is the problem solved? Are other tools needed? For example, if it finds the logistics information, it might directly organize the answer; if it finds the order does not exist, it might guide the user to check the order number.
- Loop: This "thought-action-observation" loop continues until the LLM believes the problem is solved and outputs the `Final Answer`.
- Return Final Answer: The executor returns the `Final Answer` to the user.
This cycle of reasoning and acting gives the Agent the ability to handle complex, multi-step tasks, which is exactly what lets our intelligent customer service assistant evolve from a simple "Q&A machine" into an "intelligent decision engine"!
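The loop described above can be sketched in a few lines of plain, dependency-free Python. This is only an illustration: `fake_llm` stands in for a real model call, and all names here are my own, not LangChain APIs.

```python
# A dependency-free sketch of the thought-action-observation loop.
# fake_llm simulates the LLM: it returns either an action to take
# or a final answer, depending on what it has observed so far.
def fake_llm(question: str, observations: list) -> dict:
    if not observations:
        # First pass: the "LLM" decides to call a tool.
        return {"thought": "The user wants order info, I should look it up.",
                "action": "order_status_checker",
                "action_input": "2023081512345"}
    # Later passes: an observation is available, so produce the final answer.
    return {"final_answer": f"Here is what I found: {observations[-1]}"}

def order_status_checker(order_id: str) -> str:
    return f"Order {order_id} has been shipped."

tools = {"order_status_checker": order_status_checker}

def run_agent(question: str, max_iterations: int = 5) -> str:
    observations = []
    for _ in range(max_iterations):  # cap iterations to avoid infinite loops
        step = fake_llm(question, observations)
        if "final_answer" in step:
            return step["final_answer"]
        # Execute the chosen tool and feed the observation back to the "LLM".
        observation = tools[step["action"]](step["action_input"])
        observations.append(observation)
    return "Stopped after reaching the iteration limit."

print(run_agent("When will my order 2023081512345 arrive?"))
```

Note how the iteration cap doubles as the safety valve discussed later in the pitfalls section: without it, a model that keeps choosing the same failing tool would loop forever.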
Agent Type Selection
LangChain provides various Agent types, each with its applicable scenarios:
- `zero-shot-react-description`: The most general and flexible Agent, based on the ReAct pattern, suitable for most scenarios requiring general reasoning and tool calling. It relies entirely on the LLM's zero-shot reasoning capabilities.
- `openai-functions`: Designed specifically for OpenAI models (like GPT-3.5-turbo, GPT-4), utilizing their Function Calling capabilities. In this mode, the LLM directly "suggests" which function to call and its parameters in a structured JSON format, which is usually more efficient and accurate. Highly recommended when using OpenAI models.
- `conversational-react-description`: Also based on the ReAct pattern, but with built-in memory capabilities, making it more suitable for multi-turn dialogue scenarios since it can remember previous conversation content.
- `structured-chat-zero-shot-react-description`: Suitable for scenarios requiring stricter, structured inputs and outputs.
In our customer service assistant project, if using an OpenAI model, `openai-functions` is the top choice; with other models, `zero-shot-react-description` or `conversational-react-description` are excellent choices.
💻 Practical Code Drill (Specific Application in the Customer Service Project)
Alright, the principles are clear, now let's get down to business. How do we integrate an Agent into the intelligent customer service assistant so that it is no longer just a "bookworm" that only retrieves knowledge, but a "versatile expert" that can truly solve problems?
New Capability Requirements for the Customer Service Assistant:
- Query Order Status: The user provides an order number, and the assistant can call an external API to query and return logistics information.
- Calculate Refund Amount: The user provides the product price and discount information, and the assistant can perform simple mathematical calculations.
- General Knowledge Q&A: If the above tools cannot solve the problem, it can still retrieve information from the knowledge base.
We will simulate an Agent possessing these three capabilities.
1. Define Tools
First, we need to prepare the "hands and feet" for the Agent: its Tools. Here we simulate three of them: `order_status_checker` (order lookup), `calculator` (mathematical calculations), and `knowledge_base_search` (knowledge base search).
import os
from typing import List, Union
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
# Set your OpenAI API Key
# os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
# --- 1. Define Tools ---
@tool
def order_status_checker(order_id: str) -> str:
"""
Query the current status and logistics information of an order based on the order ID.
Input parameter: order_id (string) - The user's order ID.
"""
print(f"\n--- Calling order_status_checker tool to query order ID: {order_id} ---")
# Simulate external API call
if order_id == "2023081512345":
return "Order number 2023081512345 has been shipped, tracking number SF123456789, expected delivery within 3 days."
elif order_id == "2023081567890":
return "Order number 2023081567890 is being prepared, expected to ship tomorrow."
else:
return f"No information found for order number {order_id}, please check if the order number is correct."
@tool
def calculator(expression: str) -> str:
"""
A simple calculator that can execute mathematical expressions.
Input parameter: expression (string) - The mathematical expression to be calculated, e.g., "100 * 0.8".
"""
print(f"\n--- Calling calculator tool to calculate expression: {expression} ---")
try:
# Use the eval() function for calculation. Note the security risks of eval() in a production environment.
# In actual production, a safer mathematical expression parsing library should be used.
result = eval(expression)
return f"The calculation result is: {result}"
except Exception as e:
return f"Calculation failed: {e}"
@tool
def knowledge_base_search(query: str) -> str:
"""
Search for relevant information in the customer service knowledge base to answer user questions.
Input parameter: query (string) - The user's question or keywords.
"""
print(f"\n--- Calling knowledge_base_search tool to search knowledge base: {query} ---")
# Simulate knowledge base retrieval, a RAG module can be integrated here.
if "return policy" in query:
return "Our return policy is: We support no-reason returns within 7 days after receiving the goods. Please ensure the goods are intact and do not affect secondary sales. For specific procedures, please contact customer service."
elif "contact customer service" in query:
return "You can contact our customer service team by calling the hotline 400-123-4567 or via the online chat window on our official website."
elif "working hours" in query:
return "Customer service working hours are Monday to Friday, 9 AM to 6 PM."
else:
return f"Sorry, no direct answer was found in the knowledge base for '{query}'. You can try changing the keywords or contact human customer service."
# Combine all tools into a list
tools = [order_status_checker, calculator, knowledge_base_search]
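As the comment inside `calculator` warns, `eval()` is dangerous on untrusted input (an LLM could be coaxed into passing `__import__('os').system(...)`). A safer drop-in can be built with the standard `ast` module by walking the parsed expression and allowing only arithmetic nodes. This is a sketch; the helper name `safe_eval` is my own:

```python
import ast
import operator

# Map allowed AST operator node types to their arithmetic implementations.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_eval(expression: str) -> float:
    """Evaluate a pure-arithmetic expression; reject everything else."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        # Function calls, attribute access, names, etc. all land here.
        raise ValueError(f"Disallowed expression: {expression!r}")
    return _eval(ast.parse(expression, mode="eval"))

print(safe_eval("120 * 0.8 - 10"))  # 86.0
```

Swapping `eval(expression)` for `safe_eval(expression)` inside the `calculator` tool keeps its behavior for legitimate math while rejecting code injection with a clear error the LLM can react to.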
2. Initialize LLM
We use ChatOpenAI as the "brain" of the Agent.
# --- 2. Initialize LLM ---
llm = ChatOpenAI(model="gpt-4o", temperature=0) # or gpt-3.5-turbo
3. Build Agent Prompt
The Prompt is the action guideline for the Agent. For the openai-functions type Agent, LangChain has already encapsulated most of the logic for us; we only need to provide a system message and a placeholder for the message history.
# --- 3. Build Agent Prompt ---
prompt = ChatPromptTemplate.from_messages(
[
("system", "You are an intelligent customer service assistant with the ability to query orders, perform calculations, and search the knowledge base. Please use your tools to provide accurate help based on the user's questions. If the tools cannot solve the problem, please guide the user to contact human customer service."),
MessagesPlaceholder("chat_history"), # Used to store chat history to enable multi-turn dialogue
("human", "{input}"),
MessagesPlaceholder("agent_scratchpad"), # Used for the Agent's internal thought and action records
]
)
4. Create Agent
Use the create_openai_functions_agent function to create the Agent.
# --- 4. Create Agent ---
agent = create_openai_functions_agent(llm, tools, prompt)
5. Create Agent Executor
Finally, combine the Agent and Tools into an AgentExecutor, which will be responsible for driving the entire "thought-action-observation" loop.
# --- 5. Create Agent Executor ---
agent_executor = AgentExecutor(
agent=agent,
tools=tools,
verbose=True, # Set to True to see the Agent's thought process
handle_parsing_errors=True, # Automatically handle Agent output parsing errors
max_iterations=10, # Maximum number of iterations to prevent infinite loops
early_stopping_method="force", # Return a standard "stopped" response when max_iterations is reached ("generate" is not supported by runnable-style agents in recent LangChain versions)
)
6. Run Agent Interaction
Now, our intelligent customer service assistant has autonomous decision-making capabilities! Let's chat with it.
# --- 6. Run Agent Interaction ---
async def run_agent_interaction(user_input: str, chat_history: List[BaseMessage]) -> List[BaseMessage]:
print(f"\n--- User Question: {user_input} ---")
response = await agent_executor.ainvoke({"input": user_input, "chat_history": chat_history})
ai_response = response["output"]
print(f"\n--- Assistant Reply: {ai_response} ---")
# Update chat history
chat_history.append(HumanMessage(content=user_input))
chat_history.append(AIMessage(content=ai_response))
return chat_history
# Simulate chat history and run the demo scenarios. Note: top-level await only
# works in notebooks; in a plain script, wrap the calls in an async main()
# and drive it with asyncio.run().
import asyncio

async def main():
    chat_history: List[BaseMessage] = []
    # Scenario 1: Query order status
    chat_history = await run_agent_interaction("When will my order 2023081512345 be delivered?", chat_history)
    # Scenario 2: Perform a mathematical calculation
    chat_history = await run_agent_interaction("I bought something for 120 yuan, got a 20% discount, and used a 10 yuan coupon. How much did I actually pay?", chat_history)
    # Scenario 3: Knowledge base query
    chat_history = await run_agent_interaction("What is your return policy?", chat_history)
    # Scenario 4: Unsolvable problem (the Agent will try the knowledge base, then redirect the user)
    chat_history = await run_agent_interaction("How many employees does your company have?", chat_history)
    # Scenario 5: Incorrect order number
    chat_history = await run_agent_interaction("What is the status of my order 99999999999?", chat_history)
    # Scenario 6: Multi-turn dialogue - we pass the full chat_history on every call,
    # so the Agent can use the earlier context
    chat_history = await run_agent_interaction("Then what if I want to contact customer service?", chat_history)

asyncio.run(main())
Running the above code, the output produced by `verbose=True` clearly displays the Agent's thought process:
- It first emits a `Thought` ("Thinking...").
- Then decides on an action: `Calling Tool: order_status_checker`.
- Obtains the `Observation`.
- Finally provides the `Final Answer`.
It's like your customer service assistant is silently performing complex reasoning and operations in the background, ultimately presenting you with an accurate answer! Doesn't it feel like it has suddenly "come alive"?
TypeScript Tips
The approach to implementing an Agent in TypeScript (LangChain.js) is essentially the same. You need to:
- Define Tools: Use `StructuredTool` or `DynamicTool` from `@langchain/core/tools`.
- Initialize LLM: Use `ChatOpenAI` from `@langchain/openai`.
- Build Prompt: Use `ChatPromptTemplate` and `MessagesPlaceholder` from `@langchain/core/prompts`.
- Create Agent: Use `createOpenAIFunctionsAgent` from `langchain/agents`.
- Create Agent Executor: Use `AgentExecutor`, also from `langchain/agents`.
The core APIs and parameter naming are very similar, making migration between the two languages nearly seamless.
Pitfalls and Troubleshooting Guide
While Agents are powerful, they are not a panacea, and you will encounter many "pitfalls" during development. As an experienced veteran, let me help you navigate these minefields in advance and point you in the right direction.
1. Hallucinations & Reasoning Errors
- Pitfall: The Agent's decision-making relies entirely on the LLM's reasoning capabilities. If the LLM misunderstands a tool or lacks sufficient reasoning ability, it might select the wrong tool, provide incorrect parameters, or even fabricate outputs for tools that don't exist. This leads to abnormal Agent behavior or even "talking nonsense."
- Troubleshooting Guide:
- Clear and precise tool descriptions: This is of utmost importance! The tool's `name` should be intuitive, and the `description` should detail its function, purpose, input parameters and their types, and output results. Imagine you are explaining it to a smart person who has never seen this tool before.
- High-quality LLMs: Using more capable LLMs (like GPT-4o/GPT-4) can significantly reduce reasoning errors.
- Prompt Engineering: Clearly define the Agent's responsibilities, behavioral norms, and error-handling principles in its system Prompt. For example, instruct it to guide the user to human customer service if the tools cannot solve the problem.
2. The Art of Tool Selection and Description
- Pitfall: Vague tool descriptions and improper naming make it difficult for the LLM to choose accurately. For example, if there are two tools, one named `search_db` and the other `query_data`, the LLM might be confused about which one performs the knowledge base search.
- Troubleshooting Guide:
- Unique and concrete naming: Tool names should be concise, meaningful, and unique, such as `order_status_checker` instead of just `check`.
- Detailed and unambiguous descriptions: Include keywords in the description, explaining what problem the tool solves, what information it needs, and what results it returns. Think about how users would ask questions and work those keywords into the description.
- Avoid overlapping tool functions: If two tools have overly similar functions, the LLM might get confused.
3. Infinite Loops & Deadlocks
- Pitfall: The Agent might fall into an infinite "thought-action-observation" loop. For example, the LLM repeatedly chooses a tool that cannot solve the problem, receives an error observation, and then tries the same or another invalid tool again.
- Troubleshooting Guide:
- `max_iterations`: Set the maximum number of iterations in the `AgentExecutor`. This is the most direct protective measure.
- `early_stopping_method`: Controls what happens when the iteration limit is reached. `"force"` stops and returns a standard "stopped due to iteration limit" message; `"generate"` asks the LLM to produce one final answer from what it has gathered, but note that it is not supported by runnable-style agents (such as those built with `create_openai_functions_agent`) in recent LangChain versions.
- Tool robustness: Ensure your tools can handle varied inputs and return meaningful error messages instead of crashing outright. If the LLM receives clear error messages, it is more likely to adjust its strategy.
- Prompt guidance: Explicitly state in the Prompt that if multiple attempts yield no results, it should proactively stop and inform the user.
4. Security & Permissions
- Pitfall: Agents can call any tool. If these tools involve sensitive operations (like deleting data or transferring money), malicious users or improper LLM decisions could lead to severe consequences.
- Troubleshooting Guide:
- Principle of least privilege: The permissions of the APIs behind the tools called by the Agent should be strictly controlled, granting only the minimum permissions necessary to complete the task.
- Sandbox environment: For high-risk operations, consider executing them in a sandbox environment or requiring human confirmation.
- Input validation: Before execution, tools must strictly validate the parameters provided by the LLM to prevent injection attacks or invalid data.
- Sensitive data handling: Avoid having the Agent process or expose sensitive user data.
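The input-validation point can be made concrete with a thin guard in front of the real API call. The 13-digit pattern below matches the sample order numbers used earlier, but the exact format is an assumption, and `validated_order_status_checker` is an illustrative name; adapt both to your real ID scheme:

```python
import re

# Hypothetical order-ID format: 13 digits, matching the sample IDs above.
ORDER_ID_PATTERN = re.compile(r"\d{13}")

def validated_order_status_checker(order_id: str) -> str:
    """Validate the LLM-supplied parameter before it reaches the real API."""
    order_id = order_id.strip()
    if not ORDER_ID_PATTERN.fullmatch(order_id):
        # Return a clear, actionable error so the LLM can correct its input
        # instead of passing malformed (or malicious) data downstream.
        return f"'{order_id}' is not a valid order number (expected 13 digits)."
    return f"Order {order_id}: shipped."  # placeholder for the real API call
```

Returning an error string rather than raising an exception keeps the observation inside the normal loop, so the Agent can ask the user to double-check the number.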
5. Performance & Cost
- Pitfall: Every "thought-action" loop of the Agent might be an LLM API call. Complex tasks will lead to multiple calls, thereby increasing latency and cost.
- Troubleshooting Guide:
- Caching: Cache repeated LLM calls or tool results.
- Asynchronous Execution: For time-consuming tools, consider asynchronous calls to improve concurrency.
- LLM model selection: While ensuring effectiveness, choose more cost-effective models (e.g., GPT-3.5-turbo is often cheaper than GPT-4).
- Optimize Prompt: Streamline the Prompt, reduce unnecessary context, and lower token consumption.
- Observability tools like LangSmith: Using LangSmith allows you to clearly see every step of the Agent's operation, helping you optimize the workflow and discover redundant calls.
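For the caching point above, the standard library already covers the simplest case. A sketch (the counter exists only to demonstrate the effect; in production you would prefer an expiring TTL cache, since order status changes over time):

```python
from functools import lru_cache

CALL_COUNT = {"api": 0}  # counts real "API" hits, to show the cache working

@lru_cache(maxsize=256)
def cached_order_lookup(order_id: str) -> str:
    # The slow, billable external API call would go here.
    CALL_COUNT["api"] += 1
    return f"Order {order_id} has been shipped."

cached_order_lookup("2023081512345")
cached_order_lookup("2023081512345")  # second call is served from the cache
print(CALL_COUNT["api"])  # 1
```

The same idea applies to LLM calls themselves: LangChain also ships LLM-level caches, though wiring those up is a separate topic.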
6. Observability & Debugging
- Pitfall: The Agent's internal decision-making process is a black box. When it behaves abnormally, it is hard to know "why" it did so.
- Troubleshooting Guide:
- `verbose=True`: During the development phase, always set `verbose` to `True` in the `AgentExecutor`. This prints the Agent's detailed thought process and is a powerful debugging tool.
- LangSmith: LangSmith, officially recommended by LangChain, is a powerful observability platform. It records and visualizes every run of the Agent, including LLM inputs/outputs, tool calls, and chain execution traces, which is crucial for monitoring and debugging in production environments.
- Logging: Add detailed logs inside the tools to record the parameters they were called with and the returned results.
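The logging advice can be applied uniformly with a small decorator, so every tool records its inputs and outputs without repeating boilerplate. A sketch using the standard `logging` module (the decorator name `logged_tool` is my own):

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("agent.tools")

def logged_tool(func):
    """Wrap a tool so every call logs its arguments and its result."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info("Calling %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logger.info("%s returned %r", func.__name__, result)
        return result
    return wrapper

@logged_tool
def order_status_checker(order_id: str) -> str:
    return f"Order {order_id} has been shipped."
```

If you combine this with LangChain's `@tool` decorator, apply `@logged_tool` to the plain function first (innermost), so the tool's name and docstring, which the LLM relies on, are preserved by `functools.wraps`.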
Developing an Agent is like training an apprentice. You must give it clear instructions (Prompt), provide it with useful tools (Tools), and promptly correct and guide it when it makes mistakes. Practice more, observe more, and you will be able to train a truly powerful intelligent entity!
📝 Summary of This Issue
Congratulations! In this issue's lesson, we unveiled the mystery of LangChain Agents together, allowing your intelligent customer service assistant to make a gorgeous transformation from a simple "Q&A robot" into an "intelligent versatile expert" with autonomous decision-making capabilities!
We deeply understood the core principles of Agents—how LLMs act as reasoning engines to solve complex problems using external tools through a "thought-action-observation" loop. We also manually built tools for the customer service assistant to query orders, perform calculations, and search the knowledge base, and demonstrated how to integrate these capabilities into the Agent through practical code.
More importantly, we anticipated and learned about various "pitfalls" and "troubleshooting guides" in Agent development together, including how to optimize tool descriptions, prevent loops, ensure security, and improve debugging efficiency. These advanced experiences will be a valuable asset for you when building production-grade AI applications in the future.
Agents are one of the most imaginative and complex modules in LangChain. By mastering them, you hold the core key to building truly intelligent and autonomous AI applications.
In the next issue, we will continue to explore another powerful feature of LangChain: Memory. You will learn how to make your customer service assistant remember user preferences and historical dialogues, thereby providing more personalized and coherent services. Stay tuned!