Episode 02 | The Core Trio: Models, Prompts, and Parsers (EN)
🎯 Learning Objectives
- Deep dive into the core mechanics of Agents: Understand how LLMs evolve from simple "Q&A machines" to "intelligent decision-makers."
- Master Agent construction in LangChain: Learn how to define Tools, select the appropriate Agent type, and assemble an intelligent entity capable of autonomous thought and action.
- Practical application to enhance the Support Copilot: Give our intelligent customer support project a "brain" capable of calling external APIs and executing complex logic.
- Identify common pitfalls in Agent development: Understand potential issues Agents might face and learn how to effectively avoid them to ensure stable and reliable operation.
📖 Principle Analysis
Welcome back, future AI architects, to Part 3 of the LangChain Masterclass! In the previous two sessions, we laid the groundwork, learning how to make LLMs understand and generate text, and how to give them "memory" and "knowledge" through RAG. But doesn't it still feel a bit like a basic "Q&A machine"? You ask a question, it gives an answer—it lacks proactivity and decision-making capabilities.
Imagine your intelligent support copilot. If a user asks, "What is my order number?", it might find the order inquiry link from the knowledge base. But what if the user asks, "My order number is XYZ, please check the shipping status, and if it has shipped, when will it arrive?" In this case, simple knowledge retrieval isn't enough! It needs to:
- Identify that this is a "shipping status inquiry."
- Know that it needs to call a "shipping tracking API."
- Extract the "order number" from the user's input.
- Call the API and retrieve the results.
- Format the results into a user-friendly response.
This is exactly where Agents shine! The core philosophy of an Agent is to use the LLM not just to answer questions, but as a Reasoning Engine to think about "what needs to be done," take "Action," and continue to "think" and "act" based on the results until the goal is met. Simply put, Agents give LLMs "hands" and "feet," making them more than just smooth talkers.
Core Components of an Agent
A LangChain Agent primarily consists of the following components:
- LLM (Large Language Model): The "brain" of the Agent. It is responsible for understanding user intent, planning action steps, selecting the right tools, and reasoning based on the tool's output. Its level of intelligence directly dictates the Agent's ceiling.
- Tools: The "hands and feet" of the Agent. These are external functions or capabilities the Agent can invoke. They can be functions to query a knowledge base, external API endpoints, math calculators, or even tools to send emails or schedule meetings. Each tool has a clear name and description to help the LLM understand its purpose.
- Agent Executor: The "central nervous system" of the Agent. It receives the LLM's decisions (i.e., "Which tool I plan to use and with what parameters"), actually executes the tool, and then feeds the tool's output (Observation) back to the LLM. This forms a Think-Act-Observe loop until the LLM decides to provide a final answer.
- Prompt: The Agent's "directive." It tells the LLM that it is an Agent, what capabilities (tools) it has, what its goal is, and how it should think and format its output. A well-crafted Prompt is crucial to an Agent's success.
Agent Workflow (Using the ReAct Pattern)
One of the most classic and powerful Agent patterns in LangChain is ReAct (Reasoning and Acting). It mimics the human problem-solving process: Thought -> Action -> Observation -> Thought again...
Here is a typical workflow of an Agent:
graph TD
A[User Query] --> B{Agent Executor};
B -- Passes user query and history --> C(LLM);
C -- Reasons based on Prompt and Tool descriptions --> D{LLM Output: Thought};
D -- LLM Output: Action (Tool name, parameters) --> E{Agent Executor};
E -- Calls specified Tool --> F(Tool Execution);
F -- Tool execution result (Observation) --> G{Agent Executor};
G -- Feeds Observation back to LLM --> C;
C -- Continues reasoning until final answer is reached --> H{LLM Output: Final Answer};
H -- Returns to user --> A;

Workflow Breakdown:
- User Query: The user asks the support copilot a question.
- Agent Executor Receives: The executor receives the query and sends it to the LLM, along with the Agent's "directive" (Prompt) and a list/description of available tools.
- LLM Reasoning (Thought): Based on the Prompt and tool descriptions, the LLM analyzes the user's intent and thinks about the steps needed to solve the problem. For example: "The user wants to check an order, I need to use the order tracking tool."
- LLM Decision (Action): The LLM decides which action to take and outputs it in a specific format: Action: tool_name and Action Input: parameters.
- Agent Executor Runs Tool: The executor parses the LLM's output, calls the corresponding tool, and passes in the parameters.
- Tool Returns Observation: The tool finishes executing and returns the result. For example, the order tracking API returns the order status and shipping details.
- Agent Executor Feeds Back Observation: The executor sends the tool's execution result (Observation) back to the LLM.
- LLM Reasons Again: Upon receiving the Observation, the LLM continues to think based on the new information: Is the problem solved? Are other tools needed? For example, if it found the shipping info, it might directly format the answer; if the order doesn't exist, it might prompt the user to double-check the order number.
- Loop: This "Think-Act-Observe" loop continues until the LLM determines the problem is solved and outputs a Final Answer.
- Return Final Answer: The executor returns the Final Answer to the user.
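The loop described above can be sketched in a few lines of plain Python. This is a toy illustration of the ReAct control flow, not LangChain's actual implementation: the `fake_llm` function is a scripted stand-in for the LLM's reasoning step, and the tool name `track_order` is hypothetical.

```python
# Toy sketch of the ReAct loop: Thought -> Action -> Observation -> ... -> Final Answer.
# `fake_llm` stands in for the real LLM; all names here are hypothetical.

def fake_llm(question: str, observations: list) -> dict:
    """Scripted stand-in for the LLM's reasoning step."""
    if not observations:
        # First pass: decide to call the (hypothetical) order-tracking tool.
        return {"thought": "I need the shipping status.",
                "action": "track_order", "action_input": "XYZ"}
    # An observation arrived: produce the final answer.
    return {"final_answer": f"Your order update: {observations[-1]}"}

def track_order(order_id: str) -> str:
    return f"order {order_id} shipped, arriving in 3 days"

TOOLS = {"track_order": track_order}

def run_agent(question: str, max_iterations: int = 5) -> str:
    observations = []
    for _ in range(max_iterations):            # the executor's loop
        step = fake_llm(question, observations)
        if "final_answer" in step:             # Thought: the LLM decided it is done
            return step["final_answer"]
        tool = TOOLS[step["action"]]           # Act: call the chosen tool
        observations.append(tool(step["action_input"]))  # Observe: feed result back
    return "Stopped: iteration limit reached."

print(run_agent("Where is order XYZ?"))
```

The real AgentExecutor adds prompt formatting, output parsing, and error handling around this same skeleton.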
This cyclical reasoning and acting process gives Agents the ability to handle complex, multi-step tasks. This is the key to evolving our support copilot from a "dumb Q&A bot" into an "intelligent decision engine"!
Choosing the Right Agent Type
LangChain offers several Agent types, each suited for different scenarios:
- zero-shot-react-description: The most general and flexible Agent, based on the ReAct pattern. Suitable for most scenarios requiring general reasoning and tool usage. It relies entirely on the LLM's zero-shot reasoning capabilities.
- openai-functions: Designed specifically for OpenAI models (like GPT-3.5-turbo, GPT-4) to leverage their Function Calling capabilities. In this mode, the LLM directly "suggests" which function to call and its parameters in a structured JSON format, which is usually more efficient and accurate. Highly recommended when using OpenAI models.
- conversational-react-description: Also based on the ReAct pattern, but with built-in memory, making it ideal for multi-turn conversational scenarios where it needs to remember previous context.
- structured-chat-zero-shot-react-description: Best for scenarios that require stricter, structured inputs and outputs.
For our support copilot project, if we are using OpenAI models, openai-functions is the top choice. If using other models, zero-shot-react-description or conversational-react-description are excellent alternatives.
💻 Practical Code Drill (Application in the Support Project)
Alright, the theory is clear. Let's get our hands dirty. How do we integrate an Agent into our intelligent support copilot so it's no longer just a "bookworm" retrieving knowledge, but a "versatile problem solver"?
New Capability Requirements for the Support Copilot:
- Check Order Status: The user provides an order number, and the copilot calls an external API to return shipping information.
- Calculate Refund Amounts: The user provides item prices and discount info, and the copilot performs simple math calculations.
- General Knowledge Q&A: If the above tools can't solve the issue, it still retrieves information from the knowledge base.
We will simulate an Agent equipped with these three capabilities.
1. Define Tools
First, we need to equip our Agent with "hands and feet"—Tools. Here, we simulate three tools: order_status_checker, calculator, and knowledge_base_search.
import os
from typing import List, Union
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
# Set your OpenAI API Key
# os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
# --- 1. Define Tools ---
@tool
def order_status_checker(order_id: str) -> str:
"""
Query the current status and shipping information of an order based on the order ID.
Input parameter: order_id (string) - The user's order ID.
"""
print(f"\n--- Calling order_status_checker tool for order ID: {order_id} ---")
# Simulate external API call
if order_id == "2023081512345":
return "Order 2023081512345 has been shipped. Tracking number: SF123456789. Estimated delivery: within 3 days."
elif order_id == "2023081567890":
return "Order 2023081567890 is being prepared and is expected to ship tomorrow."
else:
return f"No information found for order ID {order_id}. Please check if the order number is correct."
@tool
def calculator(expression: str) -> str:
"""
A simple calculator that can execute mathematical expressions.
Input parameter: expression (string) - The mathematical expression to calculate, e.g., "100 * 0.8".
"""
print(f"\n--- Calling calculator tool for expression: {expression} ---")
try:
# Using eval() for calculation. Note: eval() poses security risks in production.
# In a real production environment, use a safer math expression parsing library.
result = eval(expression)
return f"The calculation result is: {result}"
except Exception as e:
return f"Calculation failed: {e}"
@tool
def knowledge_base_search(query: str) -> str:
"""
Search the customer support knowledge base for relevant information to answer user questions.
Input parameter: query (string) - The user's question or keywords.
"""
print(f"\n--- Calling knowledge_base_search tool for query: {query} ---")
# Simulate knowledge base retrieval. This is where you would integrate your RAG module.
if "return policy" in query.lower():
return "Our return policy: We support no-questions-asked returns within 7 days of receipt, provided the item is in good condition and does not affect secondary sales. Please contact support for the exact process."
elif "contact support" in query.lower():
return "You can reach our support team by calling the hotline at 400-123-4567, or via the live chat window on our official website."
elif "working hours" in query.lower():
return "Customer support working hours are Monday to Friday, 9:00 AM to 6:00 PM."
else:
return f"Sorry, no direct answer was found in the knowledge base for '{query}'. You can try different keywords or contact human support."
# Combine all tools into a list
tools = [order_status_checker, calculator, knowledge_base_search]
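As the comment inside calculator warns, eval() is dangerous in production. One safer alternative, sketched here using only the standard library, is to parse the expression into an AST and evaluate only a whitelist of arithmetic nodes, rejecting everything else (function calls, attribute access, names):

```python
import ast
import operator

# Whitelist of permitted arithmetic operators.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_eval(expression: str) -> float:
    """Evaluate a pure arithmetic expression; raise ValueError on anything else."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        # Anything else (calls, names, attributes) is rejected outright.
        raise ValueError(f"Disallowed expression element: {type(node).__name__}")
    return walk(ast.parse(expression, mode="eval"))

print(safe_eval("120 * 0.8 - 10"))  # → 86.0
```

Swapping `eval(expression)` for `safe_eval(expression)` in the calculator tool closes the injection hole without changing the tool's interface.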
2. Initialize the LLM
We use ChatOpenAI as the "brain" of the Agent.
# --- 2. Initialize the LLM ---
llm = ChatOpenAI(model="gpt-4o", temperature=0) # or gpt-3.5-turbo
3. Build the Agent Prompt
The Prompt is the Agent's directive. For the openai-functions Agent type, LangChain has already encapsulated most of the logic. We just need to provide a system message and placeholders for the message history.
# --- 3. Build the Agent Prompt ---
prompt = ChatPromptTemplate.from_messages(
[
("system", "You are an intelligent customer support copilot with the ability to check orders, perform calculations, and search the knowledge base. Please use your tools to provide accurate help based on the user's question. If the tools cannot solve the issue, guide the user to contact human support."),
MessagesPlaceholder("chat_history"), # Used to store conversation history for multi-turn chats
("human", "{input}"),
MessagesPlaceholder("agent_scratchpad"), # Used for the Agent's internal thought and action records
]
)
4. Create the Agent
Use the create_openai_functions_agent function to create the Agent.
# --- 4. Create the Agent ---
agent = create_openai_functions_agent(llm, tools, prompt)
5. Create the Agent Executor
Finally, combine the Agent and Tools into an AgentExecutor, which will drive the entire "Think-Act-Observe" loop.
# --- 5. Create the Agent Executor ---
agent_executor = AgentExecutor(
agent=agent,
tools=tools,
verbose=True, # Set to True to see the Agent's thought process
handle_parsing_errors=True, # Automatically handle Agent output parsing errors
max_iterations=10, # Maximum number of iterations to prevent infinite loops
early_stopping_method="generate", # Generate an answer after reaching max_iterations
)
6. Run the Agent Interaction
Now, our intelligent support copilot has autonomous decision-making capabilities! Let's chat with it.
# --- 6. Run the Agent Interaction ---
async def run_agent_interaction(user_input: str, chat_history: List[BaseMessage]) -> List[BaseMessage]:
print(f"\n--- User Question: {user_input} ---")
response = await agent_executor.ainvoke({"input": user_input, "chat_history": chat_history})
ai_response = response["output"]
print(f"\n--- Copilot Response: {ai_response} ---")
# Update conversation history
chat_history.append(HumanMessage(content=user_input))
chat_history.append(AIMessage(content=ai_response))
return chat_history
# Simulate conversation history
chat_history: List[BaseMessage] = []

# Note: top-level await works in Jupyter/IPython. In a plain Python script,
# wrap the calls below in an async main() and run it with asyncio.run(main()).
# Scenario 1: Check order status
chat_history = await run_agent_interaction("When will my order 2023081512345 be delivered?", chat_history)
# Scenario 2: Perform math calculation
chat_history = await run_agent_interaction("I bought items worth 120 dollars, got a 20% discount, and used a 10 dollar coupon. How much did I actually pay?", chat_history)
# Scenario 3: Knowledge base query
chat_history = await run_agent_interaction("What is your return policy?", chat_history)
# Scenario 4: Unsolvable problem (Agent will try the knowledge base, then guide the user)
chat_history = await run_agent_interaction("How many employees does your company have?", chat_history)
# Scenario 5: Invalid order number
chat_history = await run_agent_interaction("What is the status of my order 99999999999?", chat_history)
# Scenario 6: Multi-turn conversation (Agent remembers context) - Note how openai-functions agent handles chat_history
# In this example, we pass the full chat_history every time, and the Agent can utilize it.
chat_history = await run_agent_interaction("What if I want to contact support then?", chat_history)
When you run the code above, you will see the output generated by verbose=True, clearly displaying the Agent's thought process:
- It will first show Thinking....
- Then it decides Calling Tool: order_status_checker.
- It receives an Observation.
- Finally, it provides the Final Answer.
It's as if your support copilot is quietly performing complex reasoning and operations in the background, ultimately presenting you with an accurate answer! Doesn't it feel like it just came "alive"?
TypeScript Tip
The approach to implementing Agents in TypeScript is essentially the same. You need to:
- Define Tools: Use StructuredTool or DynamicTool from @langchain/core/tools.
- Initialize the LLM: Use ChatOpenAI from @langchain/openai.
- Build the Prompt: Use ChatPromptTemplate and MessagesPlaceholder from @langchain/core/prompts.
- Create the Agent: Use createOpenAIFunctionsAgent from langchain/agents.
- Create the Agent Executor: Use AgentExecutor from langchain/agents.
The core APIs and parameter naming conventions are very similar, allowing for seamless migration.
Pitfalls and Best Practices
While Agents are powerful, they are not a silver bullet, and you will encounter several "pitfalls" during development. As an experienced developer, let me help you navigate these minefields.
1. Hallucinations & Reasoning Errors
- The Pitfall: The Agent's decision-making relies entirely on the LLM's reasoning capabilities. If the LLM misunderstands a tool or lacks reasoning power, it might choose the wrong tool, provide incorrect parameters, or even fabricate outputs for non-existent tools. This leads to erratic behavior or "nonsense" answers.
- Best Practices:
- Clear, precise tool descriptions: This is paramount! The tool's name must be intuitive, and the description must detail its function, purpose, input parameters (and their types), and expected output. Imagine explaining the tool to a smart person who has never seen it before.
- High-quality LLMs: Using more capable LLMs (like GPT-4o/GPT-4) significantly reduces reasoning errors.
- Prompt Engineering: Clearly define the Agent's responsibilities, behavioral guidelines, and error-handling principles in the system Prompt. For example, instruct it to guide the user to human support if the tools cannot solve the issue.
2. The Art of Tool Selection and Description
- The Pitfall: Vague tool descriptions or poor naming can make it difficult for the LLM to choose accurately. For instance, if you have two tools named search_db and query_data, the LLM might get confused about which one is for the knowledge base.
- Best Practices:
- Unique and concrete naming: Tool names should be concise, meaningful, and unique (e.g., order_status_checker instead of just check).
- Detailed and unambiguous descriptions: Include keywords in the description explaining what problem the tool solves, what information it needs, and what it returns. Think about how users will ask questions and incorporate those keywords into the description.
- Avoid overlapping tool functions: If two tools are too similar, the LLM is likely to mix them up.
3. Infinite Loops & Deadlocks
- The Pitfall: The Agent might get stuck in an endless "Think-Act-Observe" loop. For example, the LLM repeatedly chooses a tool that can't solve the problem, receives an error observation, and then tries the same (or another ineffective) tool again.
- Best Practices:
- max_iterations: Set a maximum number of iterations in the AgentExecutor. This is the most direct safeguard.
- early_stopping_method: Controls what happens when the max iterations are reached. "generate" makes one final LLM call to produce an answer, while "force" simply stops and returns a fixed message saying the limit was hit. "generate" is usually more user-friendly.
- Tool robustness: Ensure your tools can handle various inputs and return meaningful error messages instead of just crashing. If the LLM receives a clear error message, it's more likely to adjust its strategy.
- Prompt guidance: Explicitly state in the Prompt that if multiple attempts fail, the Agent should proactively stop and inform the user.
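Tool robustness can be captured in a small wrapper. The following is a sketch (the decorator and tool names are hypothetical, not a LangChain API) showing how to convert exceptions into readable error strings, so the LLM sees a useful Observation instead of the loop crashing:

```python
def robust_tool(func):
    """Decorator: turn tool exceptions into readable error strings.

    Returning a clear message (rather than raising) lets the LLM see what
    went wrong as an Observation and adjust its next step.
    """
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as e:
            return f"Tool '{func.__name__}' failed: {e}. Please check the input and try again."
    return wrapper

@robust_tool
def track_order(order_id: str) -> str:
    if not order_id.isdigit():
        raise ValueError(f"'{order_id}' is not a valid numeric order ID")
    return f"Order {order_id}: shipped."

print(track_order("ABC-123"))
# The error comes back as a string the LLM can read and react to.
```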
4. Security & Permissions
- The Pitfall: Agents can call arbitrary tools. If these tools involve sensitive operations (like deleting data or transferring funds), malicious users or poor LLM decisions could lead to severe consequences.
- Best Practices:
- Principle of least privilege: The APIs behind the tools called by the Agent should have strictly controlled permissions, granting only the minimum access required to complete the task.
- Sandboxing: For high-risk operations, consider executing them in a sandbox environment or requiring human confirmation (Human-in-the-loop).
- Input validation: Before a tool executes, strictly validate the parameters provided by the LLM to prevent injection attacks or invalid data.
- Sensitive data handling: Prevent the Agent from processing or exposing sensitive user data.
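Input validation can be as simple as checking an LLM-supplied parameter against a strict pattern before it reaches any backend system. A minimal sketch — the 13-digit order-ID format is an assumption based on the example IDs used earlier in this article:

```python
import re

# Assumed format (from this article's example IDs): exactly 13 digits.
ORDER_ID_PATTERN = re.compile(r"^\d{13}$")

def validate_order_id(order_id: str) -> str:
    """Reject anything that is not a plausible order ID before calling the API."""
    order_id = order_id.strip()
    if not ORDER_ID_PATTERN.fullmatch(order_id):
        raise ValueError("order_id must be exactly 13 digits")
    return order_id

print(validate_order_id("2023081512345"))        # passes
try:
    validate_order_id("1; DROP TABLE orders")    # injection attempt is rejected
except ValueError as e:
    print(f"Rejected: {e}")
```

Run this check at the top of the tool body, so even a confused LLM cannot pass malformed or malicious strings through to the real API.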
5. Performance & Cost
- The Pitfall: Every "Think-Act" cycle of an Agent usually involves an LLM API call. Complex tasks lead to multiple calls, increasing latency and costs.
- Best Practices:
- Caching: Cache repetitive LLM calls or tool results.
- Asynchronous Execution: For time-consuming tools, consider asynchronous calls to improve concurrency.
- LLM Model Selection: Choose a more cost-effective model (like GPT-3.5-turbo, which is often cheaper than GPT-4) as long as it meets performance requirements.
- Optimize Prompts: Streamline Prompts to reduce unnecessary context and lower token consumption.
- Observability tools like LangSmith: Use LangSmith to clearly see every step the Agent takes, helping you optimize the workflow and identify redundant calls.
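For deterministic, read-mostly tools, caching can be a one-liner. A minimal stdlib sketch using functools.lru_cache (the fetch function is hypothetical; real order status changes over time, so a production cache would need expiry, e.g. a TTL cache):

```python
from functools import lru_cache

CALL_COUNT = {"n": 0}  # track how often the expensive backend is actually hit

@lru_cache(maxsize=256)
def cached_order_status(order_id: str) -> str:
    """Cache repeated lookups for the same order within one session.

    Note: lru_cache never expires entries; real shipping data changes,
    so a production cache should have a TTL instead.
    """
    CALL_COUNT["n"] += 1
    return f"Order {order_id}: shipped"  # stand-in for a slow API call

cached_order_status("2023081512345")
cached_order_status("2023081512345")   # served from cache, no second backend hit
print(CALL_COUNT["n"])  # → 1
```

The same idea applies to LLM calls themselves: LangChain supports pluggable LLM caches, which avoid re-paying for identical prompts.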
6. Observability & Debugging
- The Pitfall: The Agent's internal decision-making process is a black box. When it behaves abnormally, it's hard to know "why" it did what it did.
- Best Practices:
- verbose=True: During development, always set the AgentExecutor's verbose flag to True. This prints out the Agent's detailed thought process and is an invaluable debugging tool.
- LangSmith: Officially recommended by LangChain, LangSmith is a powerful observability platform. It records and visualizes every Agent run, including LLM inputs/outputs, tool calls, and chain execution traces. It is essential for monitoring and debugging in production.
- Logging: Add detailed logging inside your tools to record the parameters they were called with and the results they returned.
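In-tool logging needs nothing beyond the standard library. A short sketch (the stand-in lookup result is made up for illustration) of logging both the parameters a tool receives and what it returns:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("support_copilot.tools")

def knowledge_base_search(query: str) -> str:
    """Tool that logs its input parameters and output size."""
    logger.info("knowledge_base_search called with query=%r", query)
    result = "Our return policy: 7 days, no questions asked."  # stand-in lookup
    logger.info("knowledge_base_search returned %d chars", len(result))
    return result

print(knowledge_base_search("return policy"))
```

Paired with LangSmith traces, these logs let you reconstruct exactly which parameters the LLM chose for each tool call when diagnosing a misbehaving run.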
Developing an Agent is like training an apprentice. You have to give it clear instructions (Prompt), provide it with good equipment (Tools), and promptly correct and guide it when it makes mistakes. With practice and observation, you'll be able to train a truly powerful intelligent entity!
📝 Summary
Congratulations! In this session, we unveiled the mysteries of LangChain Agents together, transforming your intelligent support copilot from a simple "Q&A bot" into a "versatile problem solver" with autonomous decision-making capabilities!
We took a deep dive into the core principles of Agents—how the LLM acts as a reasoning engine, utilizing external tools to solve complex problems through a "Think-Act-Observe" loop. We also built tools for checking orders, performing calculations, and searching the knowledge base, and demonstrated how to integrate these capabilities into an Agent through practical code.
More importantly, we anticipated and learned about various "pitfalls" in Agent development and the "best practices" to avoid them, including optimizing tool descriptions, preventing loops, ensuring security, and improving debugging efficiency. This advanced knowledge will be an invaluable asset as you build production-grade AI applications in the future.
Agents are one of the most imaginative and complex modules in LangChain. By mastering them, you hold the key to building truly intelligent and autonomous AI applications.
In the next session, we will explore another powerful feature of LangChain: Memory. You will learn how to make your support copilot remember user preferences and conversation history, thereby providing a more personalized and coherent service. Stay tuned!