Part 01 | LLMs and Prompts: The "Brain" and "Instructions" of the Support Copilot

⏱ Est. reading time: 19 min · Updated on 5/7/2026

🎯 Learning Objectives for This Session

Hey there, future AI masters! Welcome to Part 01 of the LangChain Full-Stack Masterclass. I know you might be thrilled about the future of AI right now, but hold your horses—Rome wasn't built in a day, and neither is an intelligent support copilot. In this session, we'll dive straight into the core of our support system: its "brain" and "instructions." By the end of this lesson, you will:

  1. Thoroughly understand the core role of LLMs in intelligent support: Learn how Large Language Models serve as the wellspring of intelligence for our support copilot.
  2. Master basic LLM and ChatModel invocations in LangChain: Learn to converse with the most powerful models using the most concise methods.
  3. Master the foundational art of Prompt Engineering: Learn how to "give orders" so the LLM acts exactly as intended, becoming a qualified customer service representative.
  4. Proficiently use LangChain's PromptTemplate: Say goodbye to hardcoding and build flexible, reusable prompts using a templated approach.

📖 Concept Breakdown

Alright everyone, buckle up! We're about to dive deep into the "nerve center" of AI.

Imagine our "Intelligent Support Knowledge Base" project. The ultimate goal is to allow users to ask questions in natural language and receive professional, accurate, and fast answers. Who provides these answers? Who understands the users' weird and wonderful questions? The answer is—Large Language Models (LLMs).

LLMs are deep learning models trained on massive amounts of text data. They possess astonishing capabilities in text generation, comprehension, summarization, and translation. In our customer support scenario, the LLM is the "all-knowing" brain that can:

  • Understand user intent: When a user asks, "When will my order arrive?", it knows they are checking shipping status.
  • Generate natural language responses: It won't just throw a string of code at you; it answers in fluent, friendly language.
  • Handle complex scenarios: Even if a question is poorly phrased, it can infer the meaning based on context.

But having a brain isn't enough; you need to know how to "command" it. This brings us to our second protagonist—Prompts.

A Prompt is essentially an instruction or cue. It acts as a command given to the LLM, telling it "who you are," "what you need to do," and "how you should do it." A good prompt can unleash the LLM's full potential; a bad prompt might cause it to spout nonsense or answer completely off-topic.

In the world of LangChain, the interaction between LLMs and Prompts is elegantly abstracted and encapsulated. You don't need to worry about complex underlying model API calls. Through LangChain's interfaces, you can easily switch between different models (OpenAI's GPT series, Anthropic's Claude series, or even various open-source models) and build your prompts in a structured way.
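
To make that interchangeability concrete, here is a minimal sketch: both chat classes expose the same .invoke() interface, so swapping providers is essentially a one-line change. This sketch assumes you have installed langchain-anthropic and set API keys for both providers:

from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic  # pip install langchain-anthropic
from langchain_core.messages import HumanMessage

question = [HumanMessage(content="When will my order arrive?")]

# Same interface, different providers
gpt = ChatOpenAI(model="gpt-3.5-turbo")
claude = ChatAnthropic(model="claude-3-haiku-20240307")

print(gpt.invoke(question).content)
print(claude.invoke(question).content)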

LLMs and ChatModels in LangChain

LangChain primarily provides two interfaces for interacting with language models:

  1. The LLM class: Primarily used for text completion tasks. You give it a piece of text, and it continues writing. For example, if you feed it "A support copilot should...", it will complete the rest of the sentence.
  2. The ChatModel class: This is the workhorse of our intelligent support project. It is better suited for multi-turn conversations and role-playing. It receives a list of "Messages" (which typically include roles like System, Human, and AI) and returns a message. This is exactly what our customer support scenario requires; the short sketch below contrasts the two interfaces.
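
To feel the difference, here is a minimal sketch of both interfaces side by side (it assumes OPENAI_API_KEY is set in your environment; the OpenAI class targets a completion-style model such as gpt-3.5-turbo-instruct):

from langchain_openai import OpenAI, ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

# LLM interface: plain string in, completed string out
llm = OpenAI(model="gpt-3.5-turbo-instruct")
print(llm.invoke("A support copilot should "))

# ChatModel interface: role-tagged messages in, an AI message out
chat = ChatOpenAI(model="gpt-3.5-turbo")
reply = chat.invoke([
    SystemMessage(content="You are a friendly support assistant."),
    HumanMessage(content="Where is my order?"),
])
print(reply.content)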

PromptTemplate: Bringing Prompts to Life

Imagine if we had to manually write a complete prompt for every different user question, like this:

"You are a professional customer service bot. Please help the user resolve their order issue. The user's question is: 'Why hasn't my order #12345 shipped yet?'"

If the order number changes, or the type of question changes, we'd have to modify the entire string. That's incredibly inefficient!

PromptTemplate is LangChain's solution to this. It allows you to define a template with placeholders and then dynamically fill them in. This way, your prompt structure remains intact, and only the variable content changes. This is crucial for building reusable and maintainable prompts for intelligent support.
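
Here is a minimal sketch of the idea, reusing the order example above (the variable name user_question is our own choice):

from langchain_core.prompts import PromptTemplate

# The prompt structure stays fixed; only the variable content changes
order_prompt = PromptTemplate.from_template(
    "You are a professional customer service bot. Please help the user "
    "resolve their order issue. The user's question is: '{user_question}'"
)

print(order_prompt.format(user_question="Why hasn't my order #12345 shipped yet?"))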

Core Workflow Diagram

The Mermaid diagram below clearly illustrates the flow of how a user request is "empowered" by a Prompt Template, ultimately reaching the LLM to get a response. This is essentially the "first heartbeat" of our intelligent support copilot.

graph TD
    A["User Query: 'What happened to my order?'"] --> B{LangChain PromptTemplate}
    B -- Inject Context --> C["Complete Prompt: 'You are a professional... The user query is: What happened to my order?'"]
    C --> D["LangChain ChatModel (e.g., GPT-4)"]
    D -- Generate Response --> E["LLM Response: 'Please provide your order number, and I will check for you.'"]
    E --> F[Intelligent Support Copilot]
    F --> G[Return to User]

    subgraph Inside LangChain
        B
        C
        D
    end

Diagram Explanation:

  1. User Query: The starting point for all input received by our support copilot.
  2. LangChain PromptTemplate: Fills in the user query and other necessary information (e.g., system role settings, conversation history, retrieved knowledge base info) based on our predefined template.
  3. Complete Prompt: A structured prompt containing all necessary instructions and context, ready to be sent to the LLM.
  4. LangChain ChatModel: Invokes the underlying Large Language Model (e.g., OpenAI's GPT-4) via LangChain's interface.
  5. LLM Response: The LLM generates the expected natural language response based on the prompt's instructions.
  6. Intelligent Support Copilot: Our application receives the response from the LLM.
  7. Return to User: Finally, the user sees the copilot's response on the interface.

This workflow might look simple, but every step is critical. The design of the PromptTemplate determines the upper limit of the LLM's performance, while the choice of ChatModel dictates its level of intelligence and cost.
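
As a preview, LangChain can wire the template and the model together with the | (pipe) operator, mirroring the diagram above. A minimal sketch, assuming OPENAI_API_KEY is set in your environment:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a professional e-commerce support assistant."),
    ("human", "{user_query}"),
])
chat_model = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.3)

# The PromptTemplate flows into the ChatModel, exactly as in the diagram
chain = prompt | chat_model
print(chain.invoke({"user_query": "What happened to my order?"}).content)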

💻 Hands-On Code Practice (Application in the Support Project)

Alright, enough theory—let's roll up our sleeves and get to work! We are now going to build the "brain" and "instruction system" for our intelligent support copilot.

We will implement this using Python and LangChain. Make sure you have installed the necessary libraries: pip install langchain langchain-openai python-dotenv.

Step 1: Environment Setup and Model Initialization

For security and convenience, we will store our API Key in a .env file.

Create a .env file with the following content (please replace with your actual API Key):

OPENAI_API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

Next is the Python code:

import os
from datetime import datetime
from dotenv import load_dotenv

# Import core LangChain components
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from langchain_core.messages import HumanMessage, SystemMessage

# Load environment variables
load_dotenv()

# Get OpenAI API Key
openai_api_key = os.getenv("OPENAI_API_KEY")
if not openai_api_key:
    raise ValueError("Please set OPENAI_API_KEY in the .env file")

print("------ Starting Intelligent Support Copilot... ------")

# 1. Initialize ChatModel
# We choose ChatOpenAI because it has the best support for multi-turn conversations and role-playing.
# The temperature parameter controls the randomness of the generated text. 0.0 means more deterministic, 1.0 means more creative.
# For customer support, we usually want accurate and stable responses, so we set a lower value.
chat_model = ChatOpenAI(
    api_key=openai_api_key,
    model="gpt-3.5-turbo", # You can also try "gpt-4o" or other models
    temperature=0.3 # Support scenario: we want stable and accurate answers
)
print(f"ChatModel initialized, using model: {chat_model.model_name}")

# --- Simulating Customer Support Scenarios ---

# Scenario 1: Direct ChatModel Invocation - Simple Question
print("\n--- Scenario 1: Direct ChatModel Invocation ---")
# ChatModel receives a list of messages, each with a role (System, Human, AI)
messages_direct = [
    HumanMessage(content="Why hasn't my order #12345 shipped yet?")
]
response_direct = chat_model.invoke(messages_direct)
print("User: Why hasn't my order #12345 shipped yet?")
print(f"Copilot (Direct Invocation): {response_direct.content}")
# Thought: Is this response professional enough? Without a clear role setting, the model might improvise too much.

# Scenario 2: Introducing SystemMessage - Setting the Role
print("\n--- Scenario 2: Introducing SystemMessage - Setting the Role ---")
# SystemMessage tells the model its role and behavioral guidelines
messages_with_system = [
    SystemMessage(content="You are a professional e-commerce support assistant. Answer user questions in a friendly, patient, and accurate tone. If there is not enough information to answer, politely ask the user for more details."),
    HumanMessage(content="When will I receive my order #67890?")
]
response_with_system = chat_model.invoke(messages_with_system)
print("User: When will I receive my order #67890?")
print(f"Copilot (Role Set): {response_with_system.content}")
# Thought: With a role set, doesn't the response sound more like a real support agent?

# Scenario 3: Using PromptTemplate - Dynamically Generating Prompts
print("\n--- Scenario 3: Using PromptTemplate - Dynamically Generating Prompts ---")

# Define a ChatPromptTemplate
# from_messages receives a list of message templates
# SystemMessagePromptTemplate is used to set the system role
# HumanMessagePromptTemplate receives user input and can contain variables
customer_service_prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessagePromptTemplate.from_template(
            "You are a professional e-commerce support assistant. Answer user questions in a friendly, patient, and accurate tone. "
            "If there is not enough information to answer, politely ask the user for more details. "
            "The current time is {current_time}." # We can inject dynamic system info
        ),
        HumanMessagePromptTemplate.from_template("The user's question is: {user_query}")
    ]
)

# Dynamically fill Prompt variables (datetime is imported at the top)
current_time = datetime.now().strftime("%Y-%m-%d %H:%M")

# Prepare user queries
user_question_1 = "Has my order #ABCDE shipped yet?"
user_question_2 = "The item I received is damaged, what should I do?"

# Use the .invoke() method, passing in a dictionary of variables
formatted_prompt_1 = customer_service_prompt.invoke({"current_time": current_time, "user_query": user_question_1})
response_template_1 = chat_model.invoke(formatted_prompt_1)

print(f"\nUser: {user_question_1}")
print(f"Copilot (PromptTemplate): {response_template_1.content}")

formatted_prompt_2 = customer_service_prompt.invoke({"current_time": current_time, "user_query": user_question_2})
response_template_2 = chat_model.invoke(formatted_prompt_2)

print(f"\nUser: {user_question_2}")
print(f"Copilot (PromptTemplate): {response_template_2.content}")

print("\n------ Intelligent Support Copilot Demo Ended ------")

Code Breakdown:

  1. load_dotenv(): Loads environment variables from the .env file, ensuring the API Key is not exposed directly in the code. This is a best practice for production environments.
  2. ChatOpenAI: Initializes a chat model based on the OpenAI API. The model parameter specifies the model version we are using (e.g., gpt-3.5-turbo or gpt-4o). The temperature parameter is crucial; it controls the randomness of the model's output. For support scenarios requiring rigor and accuracy, we typically set a lower value (like 0.3) to reduce "hallucinations" or uncertainty.
  3. Direct invocation via chat_model.invoke(): The simplest way to call the model, passing a list containing a HumanMessage directly. You'll notice that while the model can answer, it might lack the professional tone of a "support agent."
  4. Introducing SystemMessage: This is a key step in Prompt Engineering. By using SystemMessage(content="..."), we set clear roles and behavioral guidelines for the model. It's like giving a new employee an onboarding manual, telling them "who you are, how you should speak, and what you should do." You will noticeably feel the shift in response style.
  5. ChatPromptTemplate along with SystemMessagePromptTemplate and HumanMessagePromptTemplate:
    • ChatPromptTemplate.from_messages() allows us to build a structured prompt template by accepting a list of message templates.
    • SystemMessagePromptTemplate.from_template() is used to define system-level instruction templates, which can include {} placeholders.
    • HumanMessagePromptTemplate.from_template() is used to define user input message templates, which can also include {} placeholders.
    • By using the .invoke() method, we pass in a dictionary where the keys correspond to the placeholders in the template, and the values are the dynamic data we want to inject. This way, we can easily reuse the same support prompt structure, only changing variables like the user query or current time.

Across these three scenarios, you can see the progression: from a bare model invocation, to role-setting with SystemMessage, to dynamic prompts with PromptTemplate. Each step makes our intelligent support copilot more professional, flexible, and powerful.

📝 Pitfalls and How to Avoid Them

As a seasoned veteran, I've seen too many beginners stumble here. Don't worry, I'll point out the "minefields" ahead:

  1. Prompt Ambiguity Leading the Model "Off-Track":
    • Pitfall: You might think your prompt is clear enough, but an LLM isn't human; it takes your exact words literally. For instance, if you just say "answer the user's question," it might use a very casual tone or even go off-topic.
    • Solution: Be specific, specific, and more specific! Set a clear role ("You are a professional e-commerce support agent"), specify the tone ("friendly, patient, accurate"), define behaviors ("If information is insufficient, politely ask for more details"), and even provide output format requirements. Treat it like writing an operating manual for a newly hired intern—the more detailed, the better.
  2. API Key Security Issues:
    • Pitfall: Hardcoding your API Key directly into your code or uploading it to a public repository (like GitHub). This is equivalent to sticking your bank PIN on your forehead.
    • Solution: Always use environment variables! As shown in this lesson, store OPENAI_API_KEY in a .env file and ensure .env is ignored by .gitignore. When deploying to a server, inject it via system environment variables.
  3. Token Costs and Model Selection:
    • Pitfall: Blindly chasing the newest and most powerful models (like GPT-4o), which cost far more per token, so your bill skyrockets as call volume grows.
    • Solution: Choose the right model for the scenario. For simple Q&A or information extraction, gpt-3.5-turbo is usually sufficient, faster, and cheaper. Only consider more powerful models for highly complex tasks requiring deep reasoning or creativity. Keep an eye on LangChain's support for open-source models; there may be more economical choices in the future.
  4. Context Window Limits (Foreshadowing for the Future):
    • Pitfall: As conversation turns increase, the prompt can become longer and longer, eventually exceeding the LLM's context window limit. The model will either suffer from "amnesia" or throw an error.
    • Solution: Although we haven't delved into multi-turn conversations yet, keep this in mind. In upcoming lessons, we will learn how to intelligently manage context using techniques like Memory Management and Retrieval-Augmented Generation (RAG), feeding only the most relevant information to the LLM. A naive sketch follows after this list.
  5. The "Hallucination" Problem:
    • Pitfall: LLMs might generate information that sounds highly plausible but is actually incorrect or fabricated (what we call "hallucinations"). This is fatal in a customer support scenario.
    • Solution:
      • Lower the temperature: As shown in the code, setting the temperature around 0.3 effectively reduces the model's "creativity," making its output more deterministic.
      • Clear Instructions: Emphasize in the SystemMessage: "Answer only based on the provided information, do not guess."
      • Introduce a Knowledge Base: This is the core of our entire project! By combining the LLM with our own support knowledge base (RAG), we force it to fetch information from trusted sources, drastically reducing hallucinations. This is exactly the focus of our subsequent lessons.
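
As a taste of what's coming, below is a deliberately naive sketch of context-window management; the helper name trim_history is hypothetical, and the real techniques (Memory, RAG) arrive in later lessons:

# Keep the system message plus only the most recent turns.
# A naive placeholder for the Memory/RAG techniques covered later.
def trim_history(messages, max_turns=6):
    system, rest = messages[0], messages[1:]
    return [system] + rest[-max_turns:]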

Remember, these "pitfalls" aren't meant to scare you, but to help you think about and design your AI applications from a more professional perspective.

📝 Session Summary

Congratulations on taking your first step toward becoming a LangChain Master!

In this session, we deeply explored how LLMs act as the "brain" of intelligent support, and how Prompts serve as the "instructions" we use to command this brain. Through hands-on code, we learned how to initialize and invoke a ChatModel in LangChain, set roles via SystemMessage, and ultimately mastered the art of building flexible, reusable dynamic prompts using PromptTemplate.

You should now have a clear understanding of how to make an AI model speak and act according to your intentions. This foundation is crucial and will carry through all the more complex modules we tackle next.

In the next part, we will dive deeper and explore how to give our support copilot "memory," allowing it to remember previous conversation context to achieve more coherent and intelligent multi-turn dialogues. Stay tuned!