Session 29 | Project Polishing: UI Integration and Final Presentation (EN)

⏱ Est. reading time: 20 min · Updated on 5/7/2026

Subtitle: Breaking the Q&A Barrier and Giving AI the Power to Act

Hello, future AI architects! It's your AI tech mentor back again. Through our previous sessions, our intelligent support copilot has mastered basic Q&A and can retrieve information from a knowledge base, right? But do you feel like it's still missing a little "something"? That "something" is—the power to act.

User: "What is the delivery status of my order XYZ-123?" Copilot: "I don't know." — This is clearly not smart enough.

User: "Please help me update my registered phone number." Copilot: "Sure, please tell me your new number." And then? Nothing happens.

Traditional LLM Chains, whether direct Q&A or RAG, are like super-smart "librarians." They can find and summarize information, but they cannot "step out of the library" to get things done for you. A true "Intelligent Support Copilot" must not only answer questions but also solve problems.

Today, we are going to give our support copilot "wings," enabling it to think and act proactively. We will upgrade it from a "Q&A bot" to an Agent capable of "solving problems." This is absolutely one of the most exciting parts of LangChain and a mandatory step in building production-grade AI applications!

🎯 Learning Objectives for This Session

  1. Understand the Core Principles of Agents: Grasp how Agents leverage the reasoning capabilities of LLMs, combined with external Tools, to plan and execute tasks, breaking through the limitations of traditional Q&A Chains.
  2. Master the Definition and Integration of Tools: Learn how to create and register custom tools for an Agent, allowing the AI to interact with external systems (like databases, APIs, or CRMs).
  3. Implement Agent Features in the Support Project: Empower our support copilot with the ability to proactively query orders, update user profiles, and find store locations, improving its efficiency in solving real-world problems.
  4. Identify Agent Use Cases and Potential Pitfalls: Understand the pros, cons, and common traps of using Agents in real-world applications, laying the groundwork for designing more complex AI systems in the future.

📖 Deep Dive into the Concepts

Agent: The "Brain" and "Hands" of AI

Imagine you are a customer service representative receiving a request about an "order inquiry." What would you do?

  1. Understand the problem: The user wants to check their order status.
  2. Plan the action: I need an order ID. If the user provided it, I'll use it. If not, I'll ask for it.
  3. Select the tool: I need to access the "Order Management System" or "Logistics Tracking System."
  4. Execute the tool: Call the system API and input the order ID.
  5. Get the result: The system returns the order status.
  6. Summarize and reply: Tell the user the status in a friendly manner.

This process embodies the core philosophy of an Agent: The LLM is no longer just generating text; it becomes a decision-maker and planner, orchestrating a series of "tools" to complete a task.

In LangChain:

  • Agent: The core controller. It uses a powerful LLM as its "brain" to decide what to "think" about next or what "action" to take based on the user's input and the current state.
  • Tools: The "hands" of the Agent. These are functions or API endpoints that encapsulate specific capabilities. For example, a tool to query a database, send an email, or call a weather API. The Agent interacts with the outside world by calling these tools to gather information or execute operations.
  • Agent Executor: The "heart" of the Agent. It drives the entire "thought-action" loop. The Agent Executor continuously passes the user's input, the LLM's thoughts (decisions), and the tool's outputs back to the Agent until the task is completed or a stopping condition is met.

The working mode of an Agent typically follows the ReAct (Reasoning and Acting) framework:

  1. Observation: The Agent receives user input and the execution results of tools.
  2. Thought: The LLM reasons based on the observed information and decides the next action or direction of thought.
  3. Action: The LLM decides which tool to call and what parameters are needed.
  4. Action Input: The LLM provides the specific input for the selected tool.
  5. Observation (New): After the tool executes, it returns the result to the Agent, forming a new observation.

This loop continues until the LLM decides it can generate the final answer and stops.
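
To make the loop concrete, here is what a single pass might look like in text form for the order scenario used throughout this session (the tool name and order ID are illustrative and reappear in the hands-on code below):

Question: What is the delivery status of my order XYZ-123?
Thought: The user wants the delivery status of a specific order, so I should use the order tracking tool.
Action: OrderTracker
Action Input: XYZ-123
Observation: Order XYZ-123 is currently out for delivery and is expected to arrive this afternoon.
Thought: I now know the final answer.
Final Answer: Your order XYZ-123 is out for delivery and should arrive this afternoon.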

Mermaid Diagram: Agent Workflow

graph TD
    A[User Request] --> B{Agent Executor Starts};
    B --> C{"Agent (LLM) Thinks"};
    C -- "Thought: I need to check the order status, I should use the order tracking tool." --> D{Select Tool};
    D -- "Action: search_order_status" --> E[Execute Tool (Tools)];
    E -- "Action Input: order_id=XYZ-123" --> F[External System (Database/API)];
    F --> G[Tool Output];
    G -- "Observation: Order XYZ-123 status is 'Out for delivery'." --> C;
    C -- "Thought: I have the order status, now I can reply to the user." --> H{Agent (LLM) Final Response};
    H --> I[Return to User];

This flowchart clearly illustrates how an Agent solves a user's problem step-by-step through the "Think-Act-Observe" loop. The LLM is no longer a simple Q&A machine, but an intelligent entity that can proactively plan and invoke external resources. For our intelligent support copilot, this means it can transition from a "passive responder" to a "proactive problem solver."

💻 Hands-on Code Walkthrough

Now, let's integrate the powerful capabilities of an Agent into our intelligent support project. We will create two custom tools:

  1. OrderTrackerTool: Simulates querying a user's order status.
  2. StoreLocatorTool: Simulates finding the nearest store address.

Then, we will create an Agent that can intelligently select and use these tools based on the user's questions.

import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_react_agent, Tool
from langchain_core.prompts import PromptTemplate

# Load environment variables (ensure OPENAI_API_KEY is in your .env file)
load_dotenv()

# --- 1. Define exclusive tools for our Support Copilot ---

# Tool 1: Order Tracking Tool
def search_order_status(order_id: str) -> str:
    """
    Simulates a tool to query order status.
    In a real scenario, this would call your e-commerce backend API or database.
    """
    print(f"\n--- Simulating order tracking tool call, querying order ID: {order_id} ---")
    if order_id == "XYZ-123":
        return "Order XYZ-123 is currently out for delivery and is expected to arrive this afternoon."
    elif order_id == "ABC-456":
        return "Order ABC-456 has been signed for. Thank you for your purchase!"
    else:
        return f"Sorry, we couldn't find any information for order ID {order_id}. Please check and try again."

# Tool 2: Store Locator Tool
def find_nearest_store(location: str) -> str:
    """
    Simulates a tool to find the nearest store.
    In a real scenario, this would call a map API or store database.
    """
    print(f"\n--- Simulating store locator tool call, querying location: {location} ---")
    if "Shanghai" in location or "上海" in location:
        return "123 Caobao Road, Xuhui District, Shanghai. Phone: 021-88889999"
    elif "Beijing" in location or "北京" in location:
        return "456 Dawang Road, Chaoyang District, Beijing. Phone: 010-66667777"
    else:
        return f"Sorry, we currently do not have any direct-sale stores near {location}."

# Wrap functions into LangChain Tool objects
tools = [
    Tool(
        name="OrderTracker", # Tool name, the Agent will use this to call it
        func=search_order_status, # The actual function to execute
        description="Useful for getting the real-time delivery status of customer orders. Input must be an order ID, e.g., 'XYZ-123'."
    ),
    Tool(
        name="StoreLocator", # Tool name
        func=find_nearest_store, # The actual function to execute
        description="Useful for finding the nearest store's address and contact information. Input must be a city or specific location, e.g., 'Shanghai' or 'Beijing Chaoyang District'."
    )
]

# --- 2. Initialize LLM ---
# Use ChatOpenAI as the Agent's "brain"
llm = ChatOpenAI(model="gpt-4o", temperature=0) # A capable model such as gpt-4o is recommended; gpt-3.5-turbo also works for simpler tool selection

# --- 3. Create the Agent's Prompt ---
# LangChain provides built-in Agent types and Prompt templates.
# Here we use create_react_agent, which is based on the ReAct framework.
# A ReAct Agent requires a Prompt containing {tools}, {tool_names}, {input}, and {agent_scratchpad}.
# {agent_scratchpad} is where the Agent records its thought process and tool execution results.
prompt_template = PromptTemplate.from_template("""
You are an intelligent customer support copilot that helps users query order status and store information.
Based on the user's question, use the appropriate tools to query and provide a friendly response.
If you need more information to use a tool, proactively ask the user.

Available tools:
{tools}

Please use the following format for your response:

Question: The user's input question
Thought: What should I think about to answer this question?
Action: The tool I should execute, which must be one of [{tool_names}] (only if a tool is needed)
Action Input: The input to pass to the tool (only if a tool is needed)
Observation: The output result of the tool (this is returned by the tool, you do not need to generate it)
... (This Thought/Action/Action Input/Observation loop can repeat multiple times)
Thought: I now know the final answer
Final Answer: The final answer to the user

Begin!
Question: {input}
{agent_scratchpad}
""")

# --- 4. Create the Agent ---
# create_react_agent builds an Agent based on the LLM, tools, and prompt_template
agent = create_react_agent(llm, tools, prompt_template)

# --- 5. Create the Agent Executor ---
# Agent Executor is responsible for running the Agent and handling the Thought/Action/Observation loop
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, handle_parsing_errors=True) # verbose=True shows the Agent's thought process; handle_parsing_errors=True keeps the loop going if the LLM's output drifts from the expected format

# --- 6. Run the Agent for Hands-on Practice ---
print("\n--- Support Copilot Agent Started! ---")

# Scenario 1: Query a known order ID
print("\n--- Scenario 1: Query a known order ID ---")
response1 = agent_executor.invoke({"input": "What is the delivery status of my order XYZ-123?"})
print(f"\nSupport Copilot Response: {response1['output']}")

# Scenario 2: Query an unknown order ID
print("\n--- Scenario 2: Query an unknown order ID ---")
response2 = agent_executor.invoke({"input": "Help me check the status of order 999-888."})
print(f"\nSupport Copilot Response: {response2['output']}")

# Scenario 3: Query store information
print("\n--- Scenario 3: Query store information ---")
response3 = agent_executor.invoke({"input": "Where is the nearest store in Shanghai?"})
print(f"\nSupport Copilot Response: {response3['output']}")

# Scenario 4: Query an area without a store
print("\n--- Scenario 4: Query an area without a store ---")
response4 = agent_executor.invoke({"input": "I want to know the store address in Shenzhen."})
print(f"\nSupport Copilot Response: {response4['output']}")

# Scenario 5: Agent needs to proactively ask for more information
print("\n--- Scenario 5: Agent needs to proactively ask for more information ---")
response5 = agent_executor.invoke({"input": "I want to check an order."})
print(f"\nSupport Copilot Response: {response5['output']}")

# Scenario 6: General question not requiring tools
print("\n--- Scenario 6: General question not requiring tools ---")
response6 = agent_executor.invoke({"input": "Hello, what are your working hours?"})
print(f"\nSupport Copilot Response: {response6['output']}")

# Scenario 7: A more complex request (Agent might think and call tools multiple times)
print("\n--- Scenario 7: A more complex request ---")
response7 = agent_executor.invoke({"input": "I have an order ABC-456, and I also want to know the store address in Beijing."})
print(f"\nSupport Copilot Response: {response7['output']}")

Code Breakdown:

  1. Define Tools (Tool): By wrapping the Python functions search_order_status and find_nearest_store, we created two Tool objects. Each Tool requires a name (for the Agent to identify it) and a description (crucial, as the LLM uses this description to determine when to use the tool).
  2. Initialize LLM: ChatOpenAI acts as the Agent's "brain," responsible for understanding user intent and planning actions.
  3. Create Agent Prompt: We used PromptTemplate to define the Agent's behavioral guidelines. This Prompt is critical; it tells the LLM how to think, when to call tools, and how to formulate the final answer. {agent_scratchpad} is a placeholder where the Agent internally logs its thought process and tool outputs.
  4. Create Agent (create_react_agent): LangChain offers multiple ways to create Agents. create_react_agent is a very common and powerful factory function based on the ReAct framework, giving the Agent strong reasoning capabilities.
  5. Create Agent Executor (AgentExecutor): This is the entry point that actually runs the Agent. The verbose=True parameter is incredibly useful—it prints the Agent's thought process (Thought) and tool invocations (Action/Observation), helping us understand its decision-making logic. It's a must-have for debugging!
  6. Run the Agent: By passing the user's question via agent_executor.invoke({"input": "..."}), the Agent begins its "Think-Act" loop and eventually returns the result.

Run the code above, and you will see how the Agent intelligently selects and executes the OrderTracker or StoreLocator tools based on your questions, ultimately providing the expected answers. Pay special attention to the detailed logs printed by verbose=True; they give you a clear view of the Agent's every thought and action.
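
As a side note, recent LangChain versions also let you declare tools with the @tool decorator from langchain_core.tools, where the function's docstring doubles as the description the LLM reads. Here is a minimal sketch that reuses the simulated search_order_status function from the walkthrough above (the order_tracker name is just illustrative):

from langchain_core.tools import tool

@tool
def order_tracker(order_id: str) -> str:
    """Get the real-time delivery status of a customer order. Input must be an order ID, e.g. 'XYZ-123'."""
    # The docstring above is what the Agent sees as the tool description.
    return search_order_status(order_id)  # reuse the simulated lookup from the walkthrough

tools = [order_tracker]  # can be passed to create_react_agent just like the Tool objects above

Whichever style you choose, the name and description remain the Agent's only guide for deciding when to call the tool.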

Pitfalls and How to Avoid Them

While Agents are powerful, there are quite a few "pitfalls" in real-world applications. As a future architect, you must anticipate and prepare for them:

  1. The "Blurry Tool Description" Trap

    • The Pitfall: If your tool's description is vague, inaccurate, or overlaps with other tools, the LLM might misjudge, leading it to select the wrong tool or not know which one to use at all.
    • How to Avoid: The tool description is the Agent's "instruction manual." Treat it like writing API documentation—strive to be concise, clear, and unambiguous. Explicitly state the tool's function, the expected input parameter types, and the meaning of the output. Make the LLM feel that this tool was born to solve a specific problem.
  2. The "Prompt as a Manual" Challenge

    • The Pitfall: An Agent's behavior heavily relies on its main Prompt. A poorly designed Prompt can cause the Agent to get stuck in infinite loops, call tools randomly, or answer directly when a tool is actually needed.
    • How to Avoid: The Agent's Prompt is like an "operations manual" for a new employee. It needs to include:
      • Clear Role Definition: Who are you? (An intelligent support copilot)
      • Clear Task Objectives: What are you supposed to do? (Help users solve problems)
      • Clear Tool Usage Rules: When and how should tools be used?
      • Clear Response Format: Ensure the Agent knows how to structure the final answer.
      • Instructions for Uncertainty: What to do if information is missing? (Proactively ask)
    • Experiment with different Prompt phrasings, observe the Agent's behavior, and iteratively optimize.
  3. The "Fragile Tool" Problem

    • The Pitfall: The tools an Agent calls might fail (network errors, API rate limits, database exceptions). If a tool fails, the Agent might get stuck or return a meaningless error message.
    • How to Avoid:
      • Robust Internal Error Handling: Your func must catch all possible exceptions internally and return a friendly, LLM-comprehensible error message (e.g., "Failed to query the order, please try again later."). A sketch combining this with a simple cache follows after this list.
      • Error Handling Instructions in the Prompt: Instruct the Agent on how to react when it receives an error message from a tool (e.g., explain the error to the user, suggest retrying, or transfer to a human agent).
  4. The "Hidden Cost" of Intelligence

    • The Pitfall: In Agent mode, the LLM might go through multiple "Think-Act" loops. This means a single user request could trigger multiple LLM API calls and tool invocations, significantly increasing API costs and response latency.
    • How to Avoid:
      • Optimize Prompts: Try to guide the Agent to make the right decision in a single thought process, reducing unnecessary loops.
      • Streamline Tools: Only provide the tools the Agent truly needs. Avoid a bloated tool library that makes selection harder for the LLM.
      • Caching Strategies: Consider implementing caching for tools that are queried frequently and have relatively static results.
      • Model Selection: While ensuring performance, prioritize faster and more cost-effective LLM models. For instance, gpt-3.5-turbo can handle many simple tool selections perfectly well.
  5. The "Rogue Agent" Risk

    • The Pitfall: Agents gain the ability to call external systems. If poorly designed, this can lead to security vulnerabilities (e.g., an Agent calling a sensitive data deletion API or accessing unauthorized user information).
    • How to Avoid:
      • Principle of Least Privilege: The backend APIs provided to the Agent should only have the minimum permissions required to complete the task.
      • Strict Input Validation: The tool functions must strictly validate the legality of all input parameters to prevent injection attacks or malicious inputs.
      • Auditing and Monitoring: Implement detailed logging and monitoring for all tool invocations and operations performed by the Agent to quickly detect anomalous behavior.
      • Human-in-the-Loop: For high-risk operations (like modifying critical data), introduce a human review mechanism.
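
To make pitfalls 3 and 4 concrete, here is a minimal sketch of one way to wrap a tool function with internal error handling and a simple in-process cache. It reuses the simulated search_order_status from the walkthrough as the "backend"; the safe_search_order_status and _cached_order_lookup names are purely illustrative:

from functools import lru_cache

@lru_cache(maxsize=256)  # simple cache for queries whose results rarely change (pitfall 4)
def _cached_order_lookup(order_id: str) -> str:
    # Stand-in for your real backend call; here we reuse the simulated lookup from the walkthrough.
    return search_order_status(order_id)

def safe_search_order_status(order_id: str) -> str:
    """Query order status; always returns an LLM-readable string instead of raising."""
    if not order_id or not order_id.strip():
        return "No order ID was provided. Please ask the user for their order ID."
    try:
        return _cached_order_lookup(order_id.strip())
    except Exception:
        # Never let a raw traceback reach the Agent (pitfall 3); return a message it can relay.
        return "Failed to query the order, please try again later."

You would register safe_search_order_status as the Tool's func in place of the raw function; the name and description stay the same.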

📝 Session Summary

Congratulations! In this session, we successfully gave our intelligent support copilot the "power to act," upgrading it from a passive Q&A machine to an Agent capable of proactively solving problems. We deeply explored the core principles of Agents, the ReAct framework, and demonstrated through hands-on code how to define tools, create an Agent, and put it to work in a customer service scenario.

You should now have mastered:

  • How Agents leverage LLM reasoning and external tools to execute complex tasks.
  • How to create custom Tool objects and integrate them into an Agent.
  • The role of the AgentExecutor and the importance of the verbose parameter in debugging.

We also discussed the various "pitfalls" you might encounter when applying Agents in the real world and provided a detailed guide on how to avoid them. Remember, the power of Agents comes with complexity, requiring you to invest more thought into tool design, Prompt engineering, error handling, and security.

Agents are the key to building truly intelligent and useful AI applications. In our next session, we will dive deeper and explore how to add "Memory" to our Agent, allowing it to maintain context across multi-turn conversations and serve users even better! Stay tuned!