Issue 17 | Agent Executor and Debugging: Rescuing an Agent That Goes Off the Rails

⏱ Est. reading time: 20 min · Updated on 5/7/2026

Subtitle: Empower your intelligent support copilot with autonomous thinking and action—no more simple chat parrots!

Hey, developers! It's your old friend and AI tech mentor. Today, we're going to build something huge!

In our previous sessions, we laid a rock-solid foundation for LangChain. LLMs, PromptTemplates, OutputParsers, and Chains are the building blocks of intelligent applications. But have you noticed that while our support copilot can understand questions and generate responses, it feels like it's missing a "soul"? It can chat and answer questions based on existing text knowledge, but what if a user asks: "What is my order status?" or "Can you check the latest product manual for the return policy?" It would be completely stumped, right?

Exactly. It lacks "hands and feet"—the ability to execute external actions. More importantly, it lacks a "brain"—the ability to dynamically make decisions based on the question and choose the right "hands and feet" to solve the problem.

Today, we are going to equip our support copilot with a real brain and limbs, evolving it from a "chat parrot" into an "intelligent agent" capable of autonomous thinking and calling external tools to solve complex problems!

🎯 Learning Objectives

  1. Thoroughly understand the core principles and mechanisms of LangChain Agents: Figure out exactly how Agents make LLMs "smart" and not just simple text generators.
  2. Master configuring and using different Tools for Agents: Learn how to design and wrap external functions so the Agent can call them to fetch information or execute operations.
  3. Learn to build an intelligent support Agent capable of autonomous decision-making and tool calling to solve complex problems: Get hands-on and build a support Agent that can check orders and search knowledge bases.
  4. Enhance the practicality of the support copilot, enabling it to handle questions beyond its own knowledge scope: Give our copilot the ability to solve real-world production issues.

📖 Core Concepts

What is a LangChain Agent?

Simply put, a LangChain Agent is an "intelligent decision-maker" driven by a Large Language Model (LLM) that can autonomously plan a series of actions based on user input. These actions might include: thinking, calling external tools, observing the tool's execution results, and looping through this process until a final answer is reached.

Imagine the LLM as a super-smart brain that lacks the ability to act. The introduction of an Agent equips this brain with various "hands and feet" (Tools) and teaches it how to select and use them (Action) based on its reasoning (Thought). It then learns from the feedback of these actions (Observation) to ultimately achieve its goal.

The core value of an Agent lies in giving the LLM real-time information retrieval and execution capabilities that extend far beyond its training data and context window. Our support copilot is no longer an "information parrot" confined to a pre-set knowledge base; it becomes a "problem-solving expert" that can genuinely connect to enterprise backends and call various APIs.

Agent vs. Chain: Dynamic Decision-Making vs. Pre-set Paths

We previously learned about Chains. A Chain is like a pre-laid assembly line where data flows step-by-step along a fixed path. For example, an LLMChain followed by an OutputParser. This route is static.

Agents are different; they have no fixed execution path. Every step is a dynamic decision. When an Agent receives input, it will:

  1. Thought: Based on the current input and conversation history, the LLM thinks about what to do next. Should it answer directly? Does it need to call a tool? If so, which tool? What are the parameters?
  2. Action: Based on its thought process, the Agent selects an appropriate tool and provides the required input.
  3. Observation: After the tool executes, the Agent retrieves the tool's output.
  4. Loop: The observation is fed back to the LLM, which thinks again and decides the next action until the problem is solved or a pre-set stop condition is met.

This looping process is the secret behind an Agent's ability to handle complex, multi-step tasks.
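The loop described above can be sketched in plain Python with a scripted stand-in for the LLM. Everything here (`fake_llm`, `run_agent`, the tool registry) is illustrative, not a LangChain API; it only shows the control flow that an Agent Executor manages for you:

```python
# Minimal sketch of the Thought -> Action -> Observation loop.
# The "LLM" is a scripted stand-in: it first asks for a (hypothetical)
# order-lookup tool, then finishes once it has seen the tool's output.

def fake_llm(scratchpad: str) -> dict:
    """Pretend-LLM: decide the next step from the scratchpad so far."""
    if "Observation:" not in scratchpad:
        return {"type": "action", "tool": "get_order_status", "input": "ORDER_12345"}
    return {"type": "final", "answer": "Your order has shipped."}

def get_order_status(order_id: str) -> str:
    return f"{order_id} has shipped."

TOOLS = {"get_order_status": get_order_status}

def run_agent(question: str, max_iterations: int = 5) -> str:
    scratchpad = f"Question: {question}"
    for _ in range(max_iterations):
        step = fake_llm(scratchpad)           # Thought: LLM decides what to do
        if step["type"] == "final":
            return step["answer"]             # Final Answer: stop the loop
        tool = TOOLS[step["tool"]]            # Action: pick a tool by name
        observation = tool(step["input"])     # Action Input -> tool call
        scratchpad += f"\nObservation: {observation}"  # feed result back to LLM
    return "Stopped: iteration limit reached."

print(run_agent("What is the status of ORDER_12345?"))
```

The `max_iterations` guard is the same idea as the `max_iterations` parameter on the real `AgentExecutor`: without it, a confused model could loop forever.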

Core Components of an Agent

  1. LLM (Language Model): The "brain" of the Agent, responsible for thinking, reasoning, and generating action plans. Its level of intelligence directly determines the Agent's ceiling.
  2. Tools: The "hands and feet" of the Agent. They wrap external functionalities, which can be API calls, database queries, file read/writes, or even calls to other models. Each tool has a description, which the Agent relies on to decide when and how to use it.
  3. Agent Executor: The "coordinator" or "dispatcher" of the Agent. It receives thought and action instructions from the LLM, executes the corresponding tools, and returns the tool's output back to the LLM. It manages the entire Thought-Action-Observation loop.

The ReAct Paradigm: The Art of Agent Thinking

In LangChain, one of the most common and powerful Agent paradigms is ReAct (Reasoning and Acting). The core idea of ReAct is to enable the LLM not only to "Act" but also to "Reason". Its workflow is highly intuitive:

  • Thought: What should I do? What information do I need? Which tool should I use?
  • Action: Okay, I've decided to use this tool.
  • Action Input: Here are the parameters the tool needs.
  • Observation: After executing the tool, I received this result.
  • ... (Loop)
  • Final Answer: Based on all the information, here is the final answer.

Through this explicit reasoning process, the LLM can better understand tasks, plan steps, and correct errors, thereby increasing the success rate of problem-solving.
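To make this concrete, here is an illustrative trace for an order-status question. The wording is paraphrased for this article's example tools, not verbatim model output:

```
Thought: The user wants the status of order ORDER_12345. I should look it up.
Action: GetOrderStatus
Action Input: ORDER_12345
Observation: Your order ORDER_12345 has shipped and is expected to arrive tomorrow.
Thought: I now know the order status and can answer directly.
Final Answer: Your order ORDER_12345 has shipped and is expected to arrive tomorrow.
```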

Mermaid Diagram: Agent Workflow

Alright, enough theory—a picture is worth a thousand words! Let's look at the core workflow of an Agent:

graph TD
    A[User Input/Question] --> B{Agent Executor};
    B -- Passes input and chat history to LLM --> C[LLM Brain];
    C -- Thinks based on input and available tool descriptions --> D[Thought];
    D -- Decides which tool to use and its parameters --> E[Action];
    E -- Provides required parameters for the tool --> F[Action Input];
    F --> G{Tool Library};
    G -- Executes the selected tool --> H[Observation];
    H -- Feeds tool output back to LLM --> C;
    C -- Continues thinking until a final answer is reached --> I[Final Answer];
    I --> J[Returns to User];
    H -- If issue is unresolved, continues loop --> D;

    subgraph Agent Execution Loop
        D
        E
        F
        G
        H
    end

Diagram Explanation:

  1. User Input: The user asks a question, e.g., "What is the status of my order XYZ?"
  2. Agent Executor Receives: The executor takes the question.
  3. LLM Thinks: The executor passes the question and the list of available tools (along with their descriptions) to the LLM. The LLM starts "thinking" based on the ReAct paradigm: "Based on the question, I need to call an order lookup tool."
  4. Thought/Action/Action Input: The LLM decides to call OrderLookupTool with the input XYZ.
  5. Tool Library Executes: The Agent Executor finds OrderLookupTool and passes XYZ to it.
  6. Observation: OrderLookupTool returns "Order XYZ has shipped and is expected to arrive tomorrow."
  7. LLM Thinks Again: The LLM receives this result, realizes the question is resolved, and knows it can provide an answer directly.
  8. Final Answer: The LLM generates the final response: "Your order XYZ has shipped and is expected to arrive tomorrow."
  9. Returns to User: The Agent Executor returns the final answer to the user.

If the LLM realizes it needs more information (e.g., the order number is invalid), it will think again. It might suggest the user provide the correct order number or call a tool to search the user's order history. This is the true power of an Agent!

💻 Hands-On Code Drill (Application in the Support Copilot Project)

Now that we understand the theory, it's time to roll up our sleeves and apply these concepts to our intelligent support copilot!

Scenario Setup: Our support copilot now needs to handle two advanced types of requests:

  1. Query User Order Status: The user provides an order ID, and the copilot needs to query the internal order system.
  2. Search External Knowledge Base: The user asks specific product questions, and the copilot needs to search for answers in a simulated external knowledge base.

To achieve this, we need to create two "Tools" and arm our Agent with them.

We will use Python for this demonstration, as LangChain's Python ecosystem is the most popular and mature.

1. Prepare the Environment and LLM

First, ensure you have installed the necessary libraries and configured your OpenAI API Key (or a key from another LLM provider).

pip install langchain langchain-openai langchainhub
import os
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_react_agent
from langchain.tools import Tool
from langchain import hub

# Set your OpenAI API Key
# It is recommended to set this via environment variables; hardcoded here for demonstration purposes
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

# Initialize the LLM
# A chat model works well for agentic decision-making; a low temperature
# makes tool selection more deterministic
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

print("LLM initialization complete!")

2. Define Our Tools

This is the "hands and feet" section of the Agent. Each tool requires a name (a unique identifier for the tool), a description (crucial for the Agent to decide when to use the tool), and a func (the actual function to execute).

# Mock an order lookup system
def get_order_status(order_id: str) -> str:
    """
    Query the current status of a specific order ID.
    The input parameter is the order ID (order_id).
    """
    print(f"\n--- Looking up order {order_id} ---")
    mock_orders = {
        "ORDER_12345": "Your order ORDER_12345 has shipped and is expected to arrive tomorrow.",
        "ORDER_67890": "Your order ORDER_67890 is being packed and is expected to ship within 3 days.",
        "ORDER_ABCDE": "Your order ORDER_ABCDE has been refunded. The refund will arrive in 3-5 business days.",
        "ORDER_TEST1": "Order ORDER_TEST1 has an exception. Please contact support.",
    }
    status = mock_orders.get(order_id.upper(), f"No information found for order ID {order_id}. Please check if the order ID is correct.")
    print(f"--- Order lookup result: {status} ---")
    return status

# Mock an external knowledge base search function
def search_knowledge_base(query: str) -> str:
    """
    Search for relevant information in the support knowledge base.
    The input parameter is the search query (query).
    Suitable for finding product features, user guides, FAQs, etc.
    """
    print(f"\n--- Searching knowledge base for: '{query}' ---")
    mock_kb_articles = {
        "return policy": "Our return policy allows unconditional returns within 30 days of purchase, provided the item is in good condition.",
        "product warranty": "All electronic products come with a one-year free warranty. Human-caused damage is not covered.",
        "shipping time": "Orders are typically processed and shipped within 24 hours. Domestic delivery takes 3-5 business days.",
        "account security": "Please change your password regularly and do not share your account information. We will never ask for your password via email.",
        "payment methods": "We support Alipay, WeChat Pay, UnionPay, and Visa credit cards.",
        "contact support": "You can contact us via live chat, by calling our hotline at 400-123-4567, or by emailing [email protected].",
    }
    
    # Simple keyword matching search
    results = [
        article for topic, article in mock_kb_articles.items()
        if query.lower() in topic.lower() or query.lower() in article.lower()
    ]
    
    if results:
        # For demonstration, return only the first matched result
        print(f"--- Knowledge base search result: {results[0]} ---")
        return results[0]
    else:
        print("--- Knowledge base search result: No relevant information found. ---")
        return "No knowledge base information found relevant to your query. Please try other keywords or contact a human agent."

# Wrap functions into LangChain Tool objects
tools = [
    Tool(
        name="GetOrderStatus",
        func=get_order_status,
        description="""
        Use this tool when you need to query the current status of a user's order.
        The input must be an order ID, for example, 'ORDER_12345'.
        Example: When a user asks "What is my order status? The order ID is ORDER_12345", you should call this tool and pass in 'ORDER_12345'.
        """
    ),
    Tool(
        name="SearchKnowledgeBase",
        func=search_knowledge_base,
        description="""
        Use this tool when you need to search a support knowledge base for information about product features, user guides, FAQs, policy terms, etc.
        The input must be the keyword(s) of the search query, for example, 'return policy' or 'product warranty period'.
        Example: When a user asks "What is your return policy?", you should call this tool and pass in 'return policy'.
        """
    )
]

print("Tools definition complete!")

Key Takeaway: The tool's description is the key to the Agent's success! The LLM relies entirely on this description to determine which tool is best suited for the current problem. Therefore, the description must be clear, accurate, and include the tool's purpose, input requirements, and applicable scenarios. I specifically added detailed descriptions and usage examples here to help the Agent understand better.
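To see why the description carries so much weight: the only view of your tools the LLM ever gets is a rendered text block inside the prompt, roughly one "name: description" line per tool. Here is a stdlib-only mimic of that rendering (the real ReAct template fills its {tools} placeholder in a similar spirit; `render_tool_block` and the dict shape are assumptions for illustration):

```python
def render_tool_block(tools: list[dict]) -> str:
    """Render tools as 'name: description' lines -- the only view of
    the tools that reaches the LLM's prompt."""
    return "\n".join(f"{t['name']}: {t['description']}" for t in tools)

# Hypothetical tool metadata mirroring the two tools defined above
tools = [
    {"name": "GetOrderStatus",
     "description": "Query the current status of a user's order. Input: an order ID like 'ORDER_12345'."},
    {"name": "SearchKnowledgeBase",
     "description": "Search the support knowledge base. Input: keywords like 'return policy'."},
]

print(render_tool_block(tools))
```

If the rendered line does not make the tool's purpose and expected input obvious on its own, the Agent will struggle, no matter how good the underlying function is.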

3. Build the Agent

We use the create_react_agent function to create an Agent based on the ReAct paradigm. This function requires three core parameters:

  • llm: Our brain.
  • tools: Our defined hands and feet.
  • prompt: The Agent's instructions, telling the LLM how to think and act. LangChain provides a default ReAct prompt template that we can pull directly from langchain.hub.
# Fetch the ReAct prompt template from LangChain Hub
# This template contains the core instructions for the ReAct paradigm, guiding the LLM on how to think and act
prompt = hub.pull("hwchase17/react")

# Print the prompt template to see what it looks like
print("\n--- ReAct Agent Prompt Template ---")
# print(prompt.template) # Print the full template content; uncomment to view

# Create the ReAct Agent
# create_react_agent combines the LLM, Tools, and Prompt to generate a Runnable that the Agent can call
agent = create_react_agent(llm, tools, prompt)

# Create the Agent Executor
# The Agent Executor is responsible for driving the Agent's execution loop
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, handle_parsing_errors=True)

print("Agent build complete!")

Setting verbose=True is incredibly important! It makes the Agent Executor print every Thought, Action, Action Input, and Observation as it runs, which is invaluable for understanding the Agent's decision-making process and for debugging. Setting handle_parsing_errors=True ensures that if the LLM's output cannot be parsed, the error is fed back to the Agent as an Observation, giving it a chance to self-correct.

4. Run the Agent: Support Copilot Live Drill

Now that our intelligent support copilot is fully armed, it's time to see how it performs!

print("\n--- Intelligent Support Copilot started! Start asking questions! ---")

# Scenario 1: Query order status
question1 = "My order ID is ORDER_12345. What is its current status?"
print(f"\nUser: {question1}")
response1 = agent_executor.invoke({"input": question1})
print(f"Support Copilot: {response1['output']}")
# Expected: Agent calls the GetOrderStatus tool and returns the order status

print("\n" + "="*50 + "\n")

# Scenario 2: Search knowledge base - Return policy
question2 = "What is your return policy?"
print(f"\nUser: {question2}")
response2 = agent_executor.invoke({"input": question2})
print(f"Support Copilot: {response2['output']}")
# Expected: Agent calls the SearchKnowledgeBase tool and returns the return policy

print("\n" + "="*50 + "\n")

# Scenario 3: Search knowledge base - Non-existent query
question3 = "Do you have any materials on the latest progress of your AI chips?"
print(f"\nUser: {question3}")
response3 = agent_executor.invoke({"input": question3})
print(f"Support Copilot: {response3['output']}")
# Expected: Agent calls the SearchKnowledgeBase tool but returns no relevant information found

print("\n" + "="*50 + "\n")

# Scenario 4: Mixed question - Needs a tool and a simple answer
question4 = "What is the status of my order ORDER_67890? Also, what is your customer service phone number?"
print(f"\nUser: {question4}")
response4 = agent_executor.invoke({"input": question4})
print(f"Support Copilot: {response4['output']}")
# Expected: Agent first queries the order, then calls SearchKnowledgeBase to find the phone number

print("\n" + "="*50 + "\n")

# Scenario 5: Question not requiring tools
question5 = "Hello, how is the weather lately?"
print(f"\nUser: {question5}")
response5 = agent_executor.invoke({"input": question5})
print(f"Support Copilot: {response5['output']}")
# Expected: Agent does not use any tools and answers directly

Run the code above, and you will see the Agent's detailed thought process (Thought), the tool it selects (Action), the tool's input (Action Input), and the tool's output (Observation), culminating in a natural response. It's like watching the LLM "think" and "act" right before your eyes!

Through this hands-on drill, our intelligent support copilot is no longer just a decorative chatbot; it possesses the real ability to solve actual business problems! This is the cornerstone of production-grade AI applications.

Pitfalls & Best Practices

Agents are powerful, but they aren't silver bullets. You will encounter some "gotchas" along the way. As a veteran, let me give you a heads-up on how to avoid them.

  1. The "Black Box" and "Magic" of Tool Descriptions:

    • Pitfall: The Agent's decision on which tool to use relies entirely on the tool's description. If the description is unclear, inaccurate, or ambiguous, the Agent might choose the wrong tool or fail to choose the right one.
    • Best Practice: Treat the tool's description as a mini-prompt for the LLM. It should be:
      • Clear and explicit: Directly state what the tool does.
      • Include input examples: Tell the LLM exactly what kind of input the tool expects.
      • Explain applicable scenarios: Help the LLM make the best choice among multiple tools.
      • Iterate and optimize: Monitor the verbose=True output. If the Agent makes a wrong decision, immediately check and refine the tool description.
  2. Hallucination and Tool Outputs:

    • Pitfall: The LLM might "embellish" or even "fabricate" the tool's output results. Or, if a tool returns "not found," the LLM might "invent" an answer itself.
    • Best Practice:
      • Trust tool outputs: Emphasize in the Agent's prompt that it must generate the final answer strictly based on the tool's Observation.
      • Explicit error handling: Ensure tools return a clear "not found" message when they fail to find results, rather than an empty string or vague prompt.
      • Post-processing: If you don't fully trust the LLM's final response, you can implement simple keyword checks or format validation after the Agent outputs its answer.
  3. Agent Cost and Efficiency:

    • Pitfall: Every Thought, Action, and Observation loop means one or more LLM API calls. Complex problems can trigger multiple loops, significantly increasing API costs and response times.
    • Best Practice:
      • Streamline tools: Only provide the tools the Agent genuinely needs to avoid unnecessary distractions during decision-making.
      • Optimize tool efficiency: Ensure your tool functions execute quickly to avoid long wait times.
      • Set maximum iterations: Use the max_iterations parameter in AgentExecutor to limit the number of thinking rounds, preventing infinite loops or excessive costs.
      • Caching strategies: Consider introducing caching for frequently queried tool results.
  4. Security and Permission Management:

    • Pitfall: Because Agents can call external tools, they have the power to execute external operations. Poorly designed tools or Agent decision errors could lead to sensitive data leaks or unintended system modifications.
    • Best Practice:
      • Principle of least privilege: When tools access external systems, grant them only the minimum permissions required to complete the task.
      • Input validation: Tool functions must strictly validate and sanitize all input parameters internally to prevent injection attacks.
      • Auditing and monitoring: Log and monitor the Agent's tool-calling behavior to detect anomalies promptly.
      • Restrict sensitive operations: For sensitive actions like financial transactions or modifying user data, consider adding a human-in-the-loop approval step or stricter validation mechanisms.
  5. Choosing the Right Agent Type:

    • Pitfall: LangChain offers multiple Agent types (ReAct, OpenAI Functions, Structured Tool Agent, etc.). Choosing the wrong one can negatively impact performance.
    • Best Practice:
      • ReAct: Highly versatile, suitable for scenarios requiring multi-step reasoning and complex decision-making. The downside is that the prompt is longer, which may increase token consumption.
      • OpenAI Functions Agent: If you are using OpenAI models, this is highly recommended. It leverages OpenAI's native function-calling capabilities, integrating tool selection and parameter extraction directly at the model level. It's more efficient, produces fewer hallucinations, and makes it easier to construct structured inputs. This is currently one of the most recommended Agents.
      • Structured Tool Agent: Extremely useful when your tools require structured inputs (like JSON objects).
      • Keep up with LangChain updates: The Agent module evolves rapidly, with new types and optimizations emerging constantly.
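The caching idea from the cost-and-efficiency pitfall can be sketched with the standard library alone: memoize a deterministic tool function with functools.lru_cache before wrapping it in a Tool. This is a sketch under the assumption that a given order's status is stable for the life of the process (in production you would likely want a TTL cache instead):

```python
from functools import lru_cache

backend_hits = {"n": 0}  # counts real "backend" calls, for demonstration

@lru_cache(maxsize=256)
def get_order_status_cached(order_id: str) -> str:
    """Mock order lookup; repeat queries for the same ID are served
    from the in-process cache instead of hitting the backend."""
    backend_hits["n"] += 1  # only incremented on a cache miss
    mock_orders = {"ORDER_12345": "shipped", "ORDER_67890": "packing"}
    return mock_orders.get(order_id.upper(), "unknown order")

print(get_order_status_cached("ORDER_12345"))  # miss: hits the backend
print(get_order_status_cached("ORDER_12345"))  # hit: served from cache
print(backend_hits["n"])                       # backend was called only once
```

Note that lru_cache keys on the raw argument, so normalize inputs (e.g. upper-case the order ID) before the call if users may type them inconsistently.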

📝 Summary

Congratulations! By completing this session, you have mastered the core secrets of LangChain Agents and successfully equipped our intelligent support copilot with a "brain" and "limbs"!

We took a deep dive into how Agents work under the hood. They go beyond simple text generation; by combining the LLM's reasoning capabilities with external tools, they achieve dynamic decision-making and multi-step problem-solving. We also got hands-on experience defining tools, building an Agent, and empowering our support copilot to query orders and search knowledge bases.

The introduction of Agents elevates our support copilot from an "information parrot" to a production-grade application capable of "autonomous thinking and problem-solving." This is a critical step in building truly intelligent and useful AI applications.

But remember, with great power comes great responsibility. Agent design, tool descriptions, and security considerations all require meticulous crafting.

In our next session, we will dive even deeper and explore how to enable Agents to handle more complex conversations and memory. Stay tuned!