Part 03 | LCEL Syntax Deep Dive: Modern Chain Orchestration (EN)
🎯 Learning Objectives for This Session
Hey there, future AI architects! Welcome back to Part 5 of the LangChain Full-Stack Masterclass. Over the past few sessions, we've built a solid foundation, and we've gotten pretty good at stringing together Chains. But have you noticed that while our intelligent support copilot can answer questions, it feels a bit... "rigid"? It can only follow the exact flows we've predefined. The moment it encounters something unexpected, it freezes up.
It's like handing an intern a highly detailed operations manual. They can follow it to the letter, but if something happens that isn't in the manual, they have to come running to you for help. What we actually want is a "senior intern"—someone who can think for themselves, find the right tools, and solve problems autonomously!
In this session, we are going to upgrade our support copilot's brain, granting it the power of autonomous decision-making. By the end of this session, you will:
- Master the core components of LangChain Agents: Completely understand how the trio of `Tools`, `LLM`, and `AgentExecutor` work together.
- Understand how Agents work under the hood: Gain deep insights into the Agent's "Observe-Think-Act-Reflect" loop, which gives the LLM its intelligent reasoning capabilities.
- Learn to design and integrate custom tools for your support copilot: Enable your copilot to do more than just answer questions—it will be able to query databases, call APIs, and solve complex problems.
- Build a support Agent capable of autonomous decision-making and problem-solving: Say goodbye to rigid, fixed workflows and step into the world of truly intelligent applications.
Ready? Fasten your seatbelts, we're ready for takeoff!
📖 Concept Breakdown
Agents: Giving the LLM "Hands" and "Feet"
Remember the Chains we discussed earlier? They are powerful and can link different LLM calls and processing steps together. For example, we can create a chain that first summarizes, then translates, and finally generates a response. But the core issue with Chains is: the workflow is fixed. It goes exactly where you tell it to go.
Imagine your support copilot receives this question: "What is the status of my order XYZ123?"
If you use Chains, you might need to:
- Determine that this is an "order inquiry" intent.
- Extract the order number.
- Call a fixed "order inquiry" API.
- Return the result.
But what if the user asks: "What are your latest smartphone models? What are their features? Which one should I choose?" This might require:
- Determining that this is a "product inquiry" and "recommendation" intent.
- Querying the product database.
- Retrieving product features.
- Making a recommendation based on user preferences (if available).
See the problem? Different questions require different tools and different resolution paths. If you use Chains, you'd have to pre-build a specific chain for every possible scenario. Not only is this overly complex, but the moment the Chain encounters an unforeseen question, it completely breaks down.
Agents were created specifically to solve this limitation of Chains. The core philosophy of an Agent is: Empower the LLM with autonomous decision-making. Instead of telling the LLM exactly what to do, you provide the LLM with a set of tools and let it decide when to use them, which ones to use, and how to use them to solve the problem.
Think of it like giving a task to a senior engineer. They will analyze the task and, based on the situation, choose to use an IDE, a debugger, documentation, or a search engine to get the job done. An Agent is essentially the LLM operating in this "senior engineer" mode.
The Three Core Elements of an Agent: Tools, Brain, and Executor
The true power of a LangChain Agent lies in how elegantly it combines three core components:
Tools:
- What are they? These are external functions or data sources that the LLM can call. Imagine these as the LLM's "hands" and "feet," allowing it to interact with the real world.
- Examples: Search engines, calculators, database query interfaces, API calls (like weather APIs or order tracking APIs), code interpreters, or even custom functions we write to query our internal knowledge base.
- How are they defined? In LangChain, a `Tool` typically consists of a `name`, a `description` (detailing what the tool is used for), and a `func` (the actual Python function that executes the logic). The `description` is absolutely critical because the LLM relies entirely on it to decide whether and how to use the tool.
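To make the three parts concrete, here is a plain-Python stand-in for a tool definition. This is deliberately *not* LangChain's real `Tool` class, just a stdlib-only analogue (the `ToolSpec` name and the `WordCounter` tool are invented for illustration):

```python
from dataclasses import dataclass
from typing import Callable

# A plain-Python stand-in for LangChain's Tool: just the three parts
# described above. (Illustrative only; not the real langchain.tools.Tool.)
@dataclass
class ToolSpec:
    name: str                    # unique identifier the LLM refers to
    description: str             # the ONLY text the LLM sees when choosing a tool
    func: Callable[[str], str]   # the Python function that actually runs

word_counter = ToolSpec(
    name="WordCounter",
    description="Count the words in a piece of text. Input: the text itself.",
    func=lambda text: str(len(text.split())),
)

print(word_counter.func("how many words is this"))  # -> 5
```

Note how the description states both the purpose and the expected input: that is the entire "API contract" the LLM gets to work with.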
LLM (Large Language Model):
- What is it? This is the "brain" of the Agent. It handles all the reasoning, decision-making, and planning. It takes the user's request, the descriptions of available tools, and the results of previous steps (Observations), and then thinks about what to do next.
- Functions:
- Understand the question: Parse the user's intent.
- Select tools: Decide which tool to use based on the question and tool descriptions.
- Generate parameters: Create the correct input parameters for the selected tool.
- Plan steps: In multi-step reasoning, decide whether to continue using tools or if the final answer has been reached.
- Generate the final answer: Once the problem is solved, format the result into natural language and return it to the user.
AgentExecutor:
- What is it? This is the "executive officer" of the Agent. It coordinates the interaction between the LLM and the Tools. It runs a loop, continuously passing the LLM's output (Actions) to the corresponding tools for execution, and then feeding the tool's output (Observations) back to the LLM, until the LLM provides a final answer.
- The Loop Mechanism:
- User input -> Received by AgentExecutor.
- AgentExecutor sends the input and the list of tools to the LLM.
- The LLM generates a Thought, then decides on an Action and an Action Input.
- AgentExecutor receives the Action command, finds the corresponding tool, and executes it.
- The tool execution produces an Observation.
- AgentExecutor feeds the Observation back to the LLM.
- The LLM thinks again based on the new Observation, deciding whether to take another action or provide the final answer.
- This loop continues until the LLM outputs a "Final Answer".
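The loop above can be sketched in a few lines of plain Python. This is a stdlib-only toy, not LangChain's actual `AgentExecutor`; `run_agent`, `llm_step`, and the `Echo` tool are all invented for illustration:

```python
# A stdlib-only sketch of the executor loop described above.
# `llm_step` stands in for the LLM: given the context so far, it returns
# either ("action", tool_name, tool_input) or ("final", answer).
def run_agent(llm_step, tools, user_input, max_iterations=5):
    context = [("input", user_input)]
    for _ in range(max_iterations):
        decision = llm_step(context)
        if decision[0] == "final":               # LLM produced a Final Answer
            return decision[1]
        _, tool_name, tool_input = decision      # LLM chose an Action
        observation = tools[tool_name](tool_input)    # execute the tool
        context.append(("observation", observation))  # feed Observation back
    return "Stopped: exceeded the maximum number of steps."

# A toy "LLM" that calls the Echo tool once, then finalizes.
tools = {"Echo": lambda s: f"echo: {s}"}

def toy_llm(context):
    if context[-1][0] == "observation":
        return ("final", context[-1][1])
    return ("action", "Echo", "hello")

print(run_agent(toy_llm, tools, "say hello"))  # -> echo: hello
```

The essential point is the feedback edge: each tool result is appended to the context the "LLM" sees on the next iteration, which is exactly the Observation-to-Thought loop described above.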
The ReAct Framework: The Dance of Reasoning and Acting
To truly understand the AgentExecutor's loop mechanism, we have to talk about the ReAct (Reasoning and Acting) framework. This is currently one of the most popular and intuitive operational models for Agents.
The core idea of ReAct is that the LLM alternates between Reasoning and Acting.
- Reasoning (Thought): The LLM first "thinks" about the situation. It analyzes the current problem, the information it already has, and the available tools, then plans its next move. This "thought" process is usually printed out, allowing us to see the LLM's internal logic.
- Acting (Action): After thinking, the LLM decides on a specific "action". This action includes:
- Action: The name of the tool to call.
- Action Input: The parameters to pass to that tool.
Next, the AgentExecutor executes this Action and gets an Observation. This Observation is then fed back to the LLM as new information, prompting the next round of Reasoning.
This process is a lot like a detective solving a case:
- User Question (Observation): "Who is the killer?"
- Detective's Thought: "Hmm, I need clues. I can check the crime scene photos or question witnesses."
- Detective's Action: "Check crime scene photos." (Calls a tool)
- Crime Scene Photos (Observation): "Found a bloody knife."
- Detective's Thought: "Are there fingerprints on the knife? I can send it to the lab."
- Detective's Action: "Send knife to the lab for fingerprinting." (Calls a tool)
- Lab Report (Observation): "Fingerprints belong to John Doe."
- Detective's Thought: "Does John Doe have an alibi? I need to question him."
- Detective's Action: "Question John Doe." (Calls a tool)
- Interrogation Result (Observation): "John Doe confessed."
- Detective's Thought: "All clues point to John Doe. I have found the killer."
- Detective's Conclusion (Final Answer): "The killer is John Doe."
By alternating between thinking and acting, the ReAct framework enables the LLM to perform multi-step reasoning and dynamically select and use tools, massively enhancing its ability to solve complex problems.
Mermaid Diagram: The Agent Workflow
Let's use a Mermaid diagram to visualize the Agent's workflow:
```mermaid
graph TD
    A[User Input: Question] --> B(AgentExecutor)
    B --> C{"LLM (Agent Brain)"}
    C -- "Thought: think about next step" --> D1[Action: Tool Name]
    C -- "Action Input: Tool Parameters" --> D2[Action Input: Tool Parameters]
    D1 & D2 --> E{"Tools (External Toolset)"}
    E -- "Execute Tool" --> F[Observation: Tool Execution Result]
    F --> C
    C -- "If problem is solved" --> G[Final Answer]
    G --> B
    B --> H[User Output: Answer]
    subgraph Internal Agent Loop
        C -- "Thought, Action, Action Input" --> E
        E -- "Observation" --> C
    end
```

Diagram Explanation:
- User Input: The user asks the Agent a question.
- AgentExecutor: Acts as the chief commander, receiving user input and coordinating the LLM and Tools.
- LLM (Agent Brain):
  - Receives the current context (user question, available tool descriptions, chat history, previous `Observations`).
  - Generates a `Thought`, deciding what to do next.
  - Outputs an `Action` (the name of the tool to call) and an `Action Input` (the tool's parameters).
- Tools (External Toolset): Contains all the functions or APIs the Agent is allowed to call.
- Execute Tool: The AgentExecutor takes the LLM's `Action` and `Action Input`, finds the matching tool in the `Tools` list, and runs it.
- Observation (Tool Execution Result): The result returned after the tool runs, collected by the AgentExecutor.
- Feedback Loop: The `Observation` is sent back to the LLM as new context, prompting the LLM to generate its next `Thought` and `Action`.
- Final Answer: When the LLM determines the problem is solved and it can provide a definitive answer, it outputs the `Final Answer`.
- User Output: The AgentExecutor returns the `Final Answer` to the user.
This loop repeats until the LLM is confident it has found the final answer to the problem.
💻 Practical Code Exercise (Application in the Support Copilot Project)
Alright, we've covered the theory thoroughly. Now it's time to bring our intelligent support copilot to life!
Scenario Setup: Our support copilot needs to handle several types of customer requests:
- Query Product Catalog: The user wants to know what products we have or the features of a specific product.
- Check Order Status: The user provides an order number and wants to know its real-time status.
- Search Knowledge Base: The user asks a general question (like technical troubleshooting) and needs an answer from the internal knowledge base.
We will create tools for each of these three scenarios and let the Agent choose which one to use autonomously.
```python
from dotenv import load_dotenv
from langchain_openai import OpenAI
from langchain.agents import AgentExecutor, create_react_agent
from langchain.tools import Tool
from langchain import hub  # Used to fetch the ReAct prompt template

# Load environment variables; ensure your OPENAI_API_KEY is set
load_dotenv()

# 1. Define the toolset available to our support copilot

# Simulate a product catalog search tool
def search_product_catalog(query: str) -> str:
    """Query the product catalog based on product name or keywords, and return product information.
    Example: 'search phone', 'latest earbuds', 'laptop features'
    """
    products_db = {
        "智能手机": "最新款智能手机,配备A17芯片,120Hz刷新率屏幕,AI拍照功能。",
        "无线耳机": "高品质无线耳机,支持主动降噪,续航24小时,Type-C充电。",
        "笔记本电脑": "轻薄高性能笔记本电脑,搭载M3芯片,视网膜显示屏,适合专业人士。",
        "智能手表": "健康监测智能手表,支持心率、血氧监测,多种运动模式,NFC支付。",
        "平板电脑": "娱乐学习两用平板电脑,大尺寸屏幕,支持手写笔,视听体验极佳。",
    }
    query = query.lower()
    results = [
        f"{name}: {desc}"
        for name, desc in products_db.items()
        if query in name.lower() or query in desc.lower()
    ]
    if results:
        return "\n".join(results)
    return f"抱歉,未能找到与 '{query}' 相关的产品信息。"

# Simulate an order status checking tool
def check_order_status(order_id: str) -> str:
    """Query the current status of an order based on the order ID.
    The order ID must be a number, for example: '1001', '2003'
    """
    order_db = {
        "1001": "订单1001已发货,预计3天内送达。",
        "1002": "订单1002正在处理中,预计今天下午打包。",
        "1003": "订单1003已签收,感谢您的购买。",
        "2001": "订单2001已取消。",
        "2002": "订单2002支付失败,请重新尝试。",
    }
    if order_id in order_db:
        return order_db[order_id]
    return f"抱歉,未能找到订单号 '{order_id}' 的信息,请确认订单号是否正确。"

# Simulate a knowledge base search tool
def search_knowledge_base(query: str) -> str:
    """Search the internal knowledge base for common questions and solutions.
    Example: 'how to connect bluetooth earbuds', 'Wi-Fi connection issues', 'what to do if phone freezes'
    """
    kb_data = {
        "如何连接蓝牙耳机": "请打开手机蓝牙,长按耳机电源键进入配对模式,在手机蓝牙设置中选择耳机名称进行连接。",
        "Wi-Fi连接问题": "尝试重启路由器和手机,检查Wi-Fi密码是否正确,确保信号良好。如果问题依旧,请联系客服。",
        "手机死机怎么办": "长按电源键10秒强制重启手机。如果频繁死机,建议备份数据后恢复出厂设置。",
        "退换货政策": "商品在签收后7天内可无理由退货,15天内可换货。请保持商品完好,并联系客服办理。",
        "发票申请": "购买成功后,可在订单详情页申请电子发票,通常在1-3个工作日内开具。",
    }
    query = query.lower()
    results = [
        f"问题: {q}\n答案: {a}"
        for q, a in kb_data.items()
        if query in q.lower() or query in a.lower()
    ]
    if results:
        return "\n---\n".join(results)
    return f"抱歉,知识库中未能找到与 '{query}' 相关的信息。"

# Wrap these functions into LangChain Tool objects
tools = [
    Tool(
        name="SearchProductCatalog",
        func=search_product_catalog,
        description="""
        Use this when you need to query product information, product features, product recommendations, or anything related to the product catalog.
        The input should be the product keyword or product type queried by the user.
        Example: 'latest phone', 'earbud features', 'what laptops do you have'
        """,
    ),
    Tool(
        name="CheckOrderStatus",
        func=check_order_status,
        description="""
        Use this when you need to check the real-time status of a user's order.
        The input MUST be the specific order number provided by the user (numbers only).
        Example: '1001', '2003'
        """,
    ),
    Tool(
        name="SearchKnowledgeBase",
        func=search_knowledge_base,
        description="""
        Use this when you need to query internal knowledge base information such as general questions, common troubleshooting, terms of service, or user guides.
        The input should be the specific question or keyword queried by the user.
        Example: 'how to connect bluetooth', 'Wi-Fi issue', 'return policy'
        """,
    ),
]

# 2. Initialize the LLM (the Agent's brain)
# Models with stronger reasoning (e.g. the gpt-4 family) generally make better Agent brains
llm = OpenAI(model_name="gpt-3.5-turbo-instruct", temperature=0)  # temperature=0 reduces randomness

# 3. Fetch the ReAct Agent prompt template
# LangChain Hub provides many preset prompt templates; ReAct is one of them.
# If you want to customize the prompt, you can also create your own PromptTemplate.
prompt = hub.pull("hwchase17/react")

# 4. Create the Agent
# create_react_agent is a convenience function for building a ReAct-style Agent
# from an LLM, a list of Tools, and a prompt template
agent = create_react_agent(llm, tools, prompt)

# 5. Create the AgentExecutor (the Agent's executive officer)
# verbose=True prints the Agent's thought process (Thought, Action, Observation) -- crucial for debugging!
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, handle_parsing_errors=True)

# 6. Run the Agent and watch it make autonomous decisions!
print("--- Scenario 1: Query product information ---")
# User asks for product info; the Agent should choose the SearchProductCatalog tool
response = agent_executor.invoke({"input": "你们最新的智能手机型号有哪些特点?"})
print(f"\nSupport copilot answer: {response['output']}\n")

print("--- Scenario 2: Check order status ---")
# User asks for order status; the Agent should choose the CheckOrderStatus tool
response = agent_executor.invoke({"input": "我的订单号 1002 现在是什么状态?"})
print(f"\nSupport copilot answer: {response['output']}\n")

print("--- Scenario 3: Search the knowledge base ---")
# User asks a general question; the Agent should choose the SearchKnowledgeBase tool
response = agent_executor.invoke({"input": "我的手机Wi-Fi连接有问题怎么办?"})
print(f"\nSupport copilot answer: {response['output']}\n")

print("--- Scenario 4: No matching tool ---")
# User asks a question the Agent has no tool for; the Agent should answer directly
response = agent_executor.invoke({"input": "请帮我写一首关于秋天的诗。"})
print(f"\nSupport copilot answer: {response['output']}\n")

print("--- Scenario 5: Multi-step reasoning (if the tool design allows) ---")
# This is a more complex example if the Agent can understand and break down tasks.
# A basic ReAct Agent might not handle complex multi-step reasoning perfectly, but advanced Agents can.
# Suppose the user asks: "Has my order 1001 shipped? If so, what are your latest earbud recommendations?"
# A basic ReAct Agent might prioritize one clear tool call or state it can't handle unrelated tasks simultaneously.
# More advanced Agents (like the OpenAI Functions Agent) or planning Agents handle this better.
# Here we demonstrate a slightly more complex, but still single-tool-dominant scenario.
response = agent_executor.invoke({"input": "我的订单号1003已经签收了,我想了解一下你们的退换货政策。"})
print(f"\nSupport copilot answer: {response['output']}\n")
```
Code Walkthrough:
- Load Environment Variables: Ensure your `OPENAI_API_KEY` is configured in your `.env` file or set directly as an environment variable.
- Define Tools:
  - We created three Python functions: `search_product_catalog`, `check_order_status`, and `search_knowledge_base`. These simulate interactions with external systems (product database, order system, knowledge base).
  - Crucial Point: Each function is wrapped in a `langchain.tools.Tool` object. The `name` attribute is the unique identifier, and `func` is the actual function to execute. The absolute most important part is the `description` attribute! This description is fed directly to the LLM, which relies entirely on this text to understand what the tool does and when to use it. Therefore, the `description` must be clear, accurate, and include usage examples.
- Initialize the LLM: We use `OpenAI(model_name="gpt-3.5-turbo-instruct", temperature=0)` as the Agent's brain. Setting `temperature=0` makes the LLM's output more deterministic and less creative, which is much better for an Agent that needs to make precise decisions.
- Fetch the ReAct Prompt Template: `hub.pull("hwchase17/react")` downloads a standard ReAct prompt template from the LangChain Hub. This template instructs the LLM on how to format its `Thought` and `Action` outputs.
- Create the Agent: `create_react_agent(llm, tools, prompt)` is a factory function that combines the LLM, our list of tools, and the ReAct prompt template into a ReAct-style Agent object.
- Create the AgentExecutor: `AgentExecutor` is the engine that runs the Agent. `agent=agent` specifies the Agent to execute; `tools=tools` passes the tool list again so the executor can actually call them; `verbose=True` is your ultimate debugging weapon: it prints every `Thought`, `Action`, `Action Input`, and `Observation` step by step, giving you a crystal-clear view of the Agent's reasoning; `handle_parsing_errors=True` lets the AgentExecutor attempt to recover from parsing errors caused by incorrectly formatted LLM output, improving robustness.
- Run the Agent: `agent_executor.invoke({"input": "..."})` kicks off the Agent's reasoning and execution loop. You will see how the Agent autonomously selects the right tool based on your question, executes it, and delivers the final answer.
By running the code above, you'll clearly see how the Agent autonomously decides which tool to call based on the user's input and ultimately provides an answer. The output generated by verbose=True will give you a highly intuitive understanding of the Agent's internal mechanics.
📝 Pitfalls and Best Practices
Agents are fantastic, but they aren't magic bullets. You'll inevitably run into a few "gotchas" along the way. As your instructor, it's my duty to warn you about these pitfalls and teach you how to avoid them.
1. The Art of Tool Descriptions: The Agent's "Instruction Manual"
- The Pitfall: Tool descriptions are vague or don't match the actual functionality. For example, if your `check_order_status` description just says "check orders" without specifying that the input must be an order number, the LLM might pass it a string like "where is my package", causing the tool call to fail.
- How to Avoid It:
  - Be as rigorous as writing API docs: keep it clear, accurate, and concise.
  - Specify inputs and outputs: explicitly tell the LLM what type and format of input the tool expects, and roughly what it will output.
  - Include examples: provide one or two concrete examples to guide the LLM on how to use it.
  - Differentiate similar tools: if you have multiple tools with similar functions, highlight their differences in the descriptions so the LLM knows exactly when to pick which.
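As a sketch of this advice, here is a vague description next to a rigorous one, plus a guard that enforces the same contract the description promises. The function name, wording, and error message are all hypothetical:

```python
# Vague description: the LLM has to guess what input is valid.
vague = "Check orders."

# Rigorous description: states purpose, input format, output shape, and an example.
rigorous = (
    "Look up the current status of ONE order. "
    "Input MUST be the numeric order ID only, e.g. '1001'. "
    "Returns a one-sentence status. "
    "Do NOT pass full sentences like 'where is my package'."
)

def check_order_status_strict(order_id: str) -> str:
    # Enforce the same contract the description promises: numeric ID only.
    if not order_id.strip().isdigit():
        return "Error: input must be a numeric order ID, e.g. '1001'."
    return f"Order {order_id.strip()}: status unknown (demo stub)."

print(check_order_status_strict("where is my package"))
# -> Error: input must be a numeric order ID, e.g. '1001'.
```

Keeping the description and the input validation in sync means that even when the LLM misuses the tool, the error message it gets back (as an Observation) teaches it the correct format.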
2. Token Limits and Context Management: The Agent's "Memory Capacity"
- The Pitfall: In every loop, the Agent appends the historical `Thought`, `Action`, `Action Input`, and `Observation` to the LLM's context. If the Agent goes through many reasoning steps, or if the conversation is long, the context bloats rapidly. It will quickly hit the LLM's token limit, causing reasoning to abort or costs to skyrocket.
- How to Avoid It:
  - Cap the maximum steps: set the `max_iterations` parameter on `AgentExecutor` to prevent infinite loops or excessive steps.
  - Memory Management: for long conversations, integrate LangChain's `Memory` modules. For instance, use `ConversationBufferMemory` to store chat history, or better yet, `ConversationSummaryMemory` to summarize past interactions and save tokens.
  - Prompt Engineering: add instructions in the Agent's prompt to guide the LLM to provide a final answer when appropriate, rather than endlessly trying tools.
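To illustrate the idea behind summarizing old context, here is a crude stdlib-only stand-in: keep the user input and the most recent steps, and collapse everything in between to a one-line summary. (`trim_context` is invented for illustration; the real `ConversationSummaryMemory` uses an LLM to write the summary.)

```python
def trim_context(context, max_items=6):
    """Keep the original user input plus only the most recent steps;
    everything in between collapses to a one-line summary. A crude,
    stdlib-only stand-in for what summary memory does with an LLM."""
    if len(context) <= max_items:
        return context
    head = context[:1]                   # the original user input
    recent = context[-(max_items - 2):]  # the latest steps
    omitted = len(context) - len(recent) - 1
    return head + [("summary", f"{omitted} earlier steps omitted")] + recent

history = [("input", "q")] + [("observation", f"step {i}") for i in range(10)]
print(len(trim_context(history)))  # -> 6
```

The token budget now grows with `max_items`, not with the number of reasoning steps.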
3. Hallucinations and Faulty Reasoning: The Agent's "Imagination"
- The Pitfall: The LLM might "hallucinate" non-existent tools, invent incorrect tool parameters, or go down a completely wrong reasoning path. This usually happens when the LLM misunderstands the question or when tool descriptions aren't clear enough.
- How to Avoid It:
  - `verbose=True` is your best friend: always keep `verbose=True` on during development and carefully observe the LLM's `Thought` process. If its logic seems off, the problem is likely in your prompt or tool descriptions.
  - Improve the Prompt: optimize the Agent's prompt with clear instructions to guide the LLM toward more rigorous thinking. For example, ask it to perform a confirmation step before calling a tool: "Which tool should I use, and why?"
  - Precision in Tool Descriptions: going back to point #1, ensure your tool descriptions are precise enough to leave no room for the LLM's imagination.
  - Error Handling: implement robust error handling inside your tool functions. Even if the LLM passes bad parameters, the tool should gracefully return an error message (which becomes an `Observation`) rather than crashing the app. `handle_parsing_errors=True` also helps catch formatting issues in the LLM's output.
4. Tool Idempotency and Side Effects: The Agent's "Real-World Impact"
- The Pitfall: If your tool modifies external states (e.g., "update order status", "send email") and is not idempotent (meaning repeated calls yield different results), an Agent's mistake or repeated tool calls could cause negative real-world consequences.
- How to Avoid It:
- Design idempotent tools: Whenever possible, design your tools so that calling them multiple times produces the same result as calling them once.
- Human-in-the-loop confirmation: For tools with significant side effects, introduce a user confirmation step before the Agent executes the action, or build a secondary confirmation mechanism inside the tool itself.
- Permissions and Security: Strictly control the permissions of the tools the Agent can access to prevent it from executing sensitive operations it shouldn't touch.
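A human-in-the-loop gate can be as simple as a wrapper that consults a `confirm` callback before running a side-effecting tool. This is a hypothetical sketch (the names `require_confirmation` and `send_email` are invented); in practice `confirm` might be a CLI prompt or a UI dialog:

```python
def require_confirmation(func, confirm):
    """Gate a side-effecting tool behind a confirm(prompt) -> bool callback."""
    def wrapper(tool_input: str) -> str:
        if not confirm(f"About to execute with input {tool_input!r}. Proceed?"):
            return "Action cancelled by user."
        return func(tool_input)
    return wrapper

send_email = lambda to: f"Email sent to {to}."

# Simulate a user who declines, then one who approves.
declined = require_confirmation(send_email, lambda prompt: False)
approved = require_confirmation(send_email, lambda prompt: True)

print(declined("boss@example.com"))  # -> Action cancelled by user.
print(approved("boss@example.com"))  # -> Email sent to boss@example.com.
```

Note that a cancellation also becomes an Observation, so the Agent learns the action was refused rather than silently failing.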
5. Performance Optimization: The Agent's "Efficiency"
- The Pitfall: Every `Thought` and `Action` loop involves a network call to the LLM and a tool execution, which introduces significant latency. If a user's question requires multi-step reasoning, the response time can become painfully slow.
- How to Avoid It:
- Choose efficient LLMs: Use models that offer fast response times and lower token consumption.
- Optimize tools: Ensure your tool functions execute quickly, especially when they involve database queries or external API calls.
- Caching: Implement caching for tool calls that are frequently requested and whose results don't change often.
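For pure lookup tools, the standard library's `functools.lru_cache` is often enough to implement that caching. The lookup function here is a demo stub; the `backend_calls` counter exists only to show that repeated queries skip the backend:

```python
from functools import lru_cache

backend_calls = {"count": 0}

@lru_cache(maxsize=128)
def cached_product_lookup(query: str) -> str:
    backend_calls["count"] += 1  # counts real backend hits only
    # In a real tool this would be a database query or external API call.
    return f"Catalog results for {query!r} (demo stub)."

cached_product_lookup("phone")
cached_product_lookup("phone")   # served from cache, no backend hit
print(backend_calls["count"])    # -> 1
```

Be careful only to cache tools whose results are stable for the cache's lifetime; an order-status tool, for example, should not be cached this way.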