Session 01 | LLMs and Prompt Templates: The Two Pillars of LangChain (EN)
🎯 Learning Objectives for This Session
Hey there, future AI masters! Welcome to Part 01 of the LangChain Masterclass. I know you're all eager to roll up your sleeves and start coding, but hold your horses—Rome wasn't built in a day, and neither is an intelligent support copilot. In this session, we'll dive deep into the two foundational pillars of LangChain: LLMs (Large Language Models) and Prompt Templates. By the end of this session, you will:
- Thoroughly understand how LangChain wraps LLMs, enabling you to easily harness various large models and say goodbye to tedious API calls.
- Master the essence of Prompt Templates, learning how to "program" LLMs so they are obedient, efficient, and output exactly what you want.
- Build a foundational intelligent customer support Q&A bot from scratch, experiencing the thrill of going from zero to one and laying a solid foundation for our "Intelligent Support Knowledge Base" project.
- Identify and avoid common early-stage pitfalls, saving you time and frustration on your path to success.
Ready? Let's embark on this magical LangChain journey!
📖 Core Concepts Explained
LLMs: The "Brain" of LangChain
Imagine our "Intelligent Support Knowledge Base" project needs a "brain" that can understand user queries and provide professional answers. This brain is the Large Language Model (LLM). They are marvels of deep learning, trained on massive amounts of text, and possess astonishing capabilities in language understanding and generation.
However, the LLM market is incredibly diverse: OpenAI's GPT series, Google's Gemini, Anthropic's Claude, alongside a plethora of open-source models. Each model has its own API calling conventions and parameter settings, which can be an absolute nightmare for developers.
LangChain's LLM module acts like a "universal adapter." It provides a unified interface, allowing you to call and interact with LLMs from any vendor in almost exactly the same way. This drastically reduces the learning curve and the friction of switching models, letting you focus on business logic rather than low-level API integration.
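To make the "universal adapter" idea concrete, here is a minimal sketch, assuming the langchain-openai and langchain-anthropic packages are installed and the corresponding API keys are set (the Anthropic model name is only an example): swapping vendors changes the constructor, but the .invoke() call stays identical.

```python
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic  # alternative vendor, shown purely for illustration

# Only the constructor differs between vendors; the calling convention is the same.
models = [
    ChatOpenAI(model="gpt-3.5-turbo"),
    ChatAnthropic(model="claude-3-haiku-20240307"),  # example model name
]

for llm in models:
    # .invoke() accepts a plain string (wrapped as a human message) and returns an AI message.
    reply = llm.invoke("请用一句话介绍你自己。")
    print(type(llm).__name__, reply.content)
```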
In LangChain, LLMs are primarily divided into two categories:
- LLM (Traditional Text Completion Models): These models typically take a string as input and generate a string as output. Think of early GPT-3. They act more like intelligent text auto-completers.
- ChatModel (Chat Models): This is the current mainstream model type. They take a sequence of "messages" as input (e.g., user messages, AI assistant messages, system messages) and generate an "AI message" as output. This message-based interaction paradigm is much better suited for conversational scenarios and makes it easier to control the model's behavior and persona. Our intelligent support project will undoubtedly rely heavily on ChatModels.
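As a quick sketch of the difference (a hedged example on my part: model names are illustrative, and the legacy completion model gpt-3.5-turbo-instruct may not be available on every account), an LLM maps a string to a string, while a ChatModel maps a list of messages to an AI message:

```python
from langchain_openai import OpenAI, ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage

# Traditional text-completion model: string in, string out.
completion_llm = OpenAI(model="gpt-3.5-turbo-instruct")
print(completion_llm.invoke("为一个智能客服助手写一句开场白:"))

# Chat model: a list of messages in, an AIMessage out.
chat_llm = ChatOpenAI(model="gpt-3.5-turbo")
reply = chat_llm.invoke([
    SystemMessage(content="你是一个专业、友好的智能客服助手。"),
    HumanMessage(content="你好,请介绍一下你自己。"),
])
print(reply.content)
```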
Prompt Templates: The "Instruction Set" for LLMs
Now that we have a powerful LLM brain, we need a clear "instruction set" to tell it what to do and how to do it. This instruction set is the Prompt Template.
Have you ever asked an LLM a question, only to get a completely irrelevant answer or a response in the wrong tone? That happens when you don't provide a good "prompt." Prompt Engineering is the art of crafting well-designed prompts to guide the LLM to output exactly what we intend.
The core idea behind a Prompt Template is: combining fixed instructions with dynamic user inputs to generate a complete, clear, and context-rich prompt.
For instance, our intelligent support copilot shouldn't just simply answer user questions; it needs to:
- Adopt a customer service persona: The tone must be professional and friendly.
- Focus on knowledge base content: It cannot hallucinate facts or go off-topic.
- Handle specific formats: For example, if a user asks about the "refund policy," it needs to know where to look and how to present the information clearly and concisely.
Prompt Templates allow us to pre-define these "fixed instructions" and then inject "dynamic content" at runtime, such as the user's query or relevant information retrieved from a knowledge base. This not only ensures prompt consistency but also significantly boosts development efficiency and the quality of the model's responses.
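As a minimal sketch of that idea (the variable names policy_snippet and user_question are just illustrative placeholders I chose), a template freezes the fixed instructions and leaves slots for the dynamic parts:

```python
from langchain_core.prompts import PromptTemplate

# Fixed instructions with two dynamic slots: knowledge-base context and the user's question.
template = PromptTemplate.from_template(
    "你是一名专业的客服助手。请仅根据以下资料回答用户的问题。\n"
    "资料:{policy_snippet}\n"
    "用户问题:{user_question}"
)

# At runtime we only fill in the dynamic parts; the instructions stay consistent across every call.
print(template.format(
    policy_snippet="自签收之日起 7 天内,未使用的商品可无理由退款。",
    user_question="请问你们的退款政策是什么?",
))
```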
The Magical Combo of LLMs and Prompt Templates
When the powerful "brain" of an LLM meets the precise "instruction set" of a Prompt Template, a wonderful chemical reaction occurs. They form the most fundamental and core combination in LangChain.
The entire workflow can be summarized as follows:
- The user asks a question.
- We use a Prompt Template to format the user's raw question, the preset customer service persona, and (eventually) the knowledge base context into a clear, structured prompt.
- This formatted prompt is sent to the LLM (or ChatModel).
- The LLM reasons and generates text based on the prompt's content, returning an answer.
- This answer becomes the reply from our intelligent support copilot.
Below is a Mermaid diagram that visually demonstrates this core workflow:
graph TD
A[User Query] --> B{Prompt Template};
B -- Inject Dynamic Variables --> C[Fully Formatted Prompt];
C --> D(LLM / ChatModel);
D -- Generate Response --> E[AI Assistant Reply];
subgraph LangChain Core
B;
D;
end
style A fill:#f9f,stroke:#333,stroke-width:2px;
style B fill:#bbf,stroke:#333,stroke-width:2px;
style C fill:#fcc,stroke:#333,stroke-width:2px;
style D fill:#afa,stroke:#333,stroke-width:2px;
style E fill:#f9f,stroke:#333,stroke-width:2px;

This diagram clearly illustrates how a user's question is "processed" by the Prompt Template, "thought over" by the LLM, and ultimately transformed into a professional reply from the intelligent support bot. By understanding this foundational workflow, you've grasped the "soul" of LangChain.
💻 Hands-on Code Practice
It's time to turn theory into code! We will use OpenAI's gpt-3.5-turbo as our ChatModel, combined with a ChatPromptTemplate, to build a bare-bones intelligent customer support Q&A bot.
First, you need to install the LangChain and OpenAI libraries:
pip install langchain langchain-openai python-dotenv
Next, ensure your .env file contains your OPENAI_API_KEY:
OPENAI_API_KEY="sk-..."
Python Implementation
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
# 1. Load environment variables
load_dotenv()
# 2. Initialize the ChatModel
# Here we choose gpt-3.5-turbo, which offers great value for money
# The temperature parameter controls creativity: 0 is deterministic/conservative, 1 is more imaginative
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7)
# 3. Define the Prompt Template
# For a support bot, we need a system message to set its persona and behavior
# And a human message to receive the user's question
customer_service_template = ChatPromptTemplate.from_messages(
[
# SystemMessagePromptTemplate is used to set the AI's persona and global behavior
SystemMessagePromptTemplate.from_template(
"你是一个友好、专业且乐于助人的智能客服助手。你的目标是清晰、准确地回答用户的问题,并尽可能提供有用的信息。请保持礼貌和耐心。"
),
# HumanMessagePromptTemplate is used to receive user input
HumanMessagePromptTemplate.from_template(
"用户的问题是:{user_question}"
),
]
)
# 4. Combine the Prompt Template and LLM
# LangChain's chaining syntax is very elegant; here we use the | (pipe) operator to connect them
# This approach is the foundation of LangChain Expression Language (LCEL), which is highly powerful and flexible
# It means: first process the input with customer_service_template, then pass the result to the llm
customer_service_chain = customer_service_template | llm
# 5. Simulate user questions and get responses
print("--- 智能客服助手启动 ---")
# Scenario 1: Common question
user_query_1 = "请问你们的退款政策是什么?"
print(f"\n用户: {user_query_1}")
response_1 = customer_service_chain.invoke({"user_question": user_query_1})
print(f"客服助手: {response_1.content}")
# Expected output: A generic answer about the refund policy, in a professional tone.
# Scenario 2: Seeking help
user_query_2 = "我的订单号是 #12345,我该如何查询物流进度?"
print(f"\n用户: {user_query_2}")
response_2 = customer_service_chain.invoke({"user_question": user_query_2})
print(f"客服助手: {response_2.content}")
# Expected output: Guidance on how to check logistics, possibly prompting for more info.
# Scenario 3: Simple greeting
user_query_3 = "你好,请问你是谁?"
print(f"\n用户: {user_query_3}")
response_3 = customer_service_chain.invoke({"user_question": user_query_3})
print(f"客服助手: {response_3.content}")
# Expected output: Self-introduction as an intelligent support assistant, asking how it can help.
print("\n--- 智能客服助手已关闭 ---")
TypeScript Implementation
import 'dotenv/config'; // Load .env file
import { ChatOpenAI } from '@langchain/openai';
import { ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate } from '@langchain/core/prompts';
// 1. Ensure the OPENAI_API_KEY environment variable is set
if (!process.env.OPENAI_API_KEY) {
console.error("OPENAI_API_KEY is not set in environment variables.");
process.exit(1);
}
// 2. Initialize the ChatModel
// The modelName here corresponds to the model parameter in Python
const llm = new ChatOpenAI({
modelName: "gpt-3.5-turbo",
temperature: 0.7, // Controls the model's creativity
});
// 3. Define the Prompt Template
const customerServiceTemplate = ChatPromptTemplate.fromMessages([
// SystemMessagePromptTemplate is used to set the AI's persona and global behavior
SystemMessagePromptTemplate.fromTemplate(
"你是一个友好、专业且乐于助人的智能客服助手。你的目标是清晰、准确地回答用户的问题,并尽可能提供有用的信息。请保持礼貌和耐心。"
),
// HumanMessagePromptTemplate is used to receive user input
HumanMessagePromptTemplate.fromTemplate(
"用户的问题是:{user_question}"
),
]);
// 4. Combine the Prompt Template and LLM
// In TypeScript, we can use the .pipe() method for chaining, similar to Python
// Alternatively, await prompt.formatMessages(input) then await llm.invoke(messages)
const customerServiceChain = customerServiceTemplate.pipe(llm);
// 5. Simulate user questions and get responses
async function runCustomerServiceDemo() {
console.log("--- 智能客服助手启动 ---");
// Scenario 1: Common question
const userQuery1 = "请问你们的退款政策是什么?";
console.log(`\n用户: ${userQuery1}`);
const response1 = await customerServiceChain.invoke({ user_question: userQuery1 });
console.log(`客服助手: ${response1.content}`);
// Expected output: A generic answer about the refund policy, in a professional tone.
// Scenario 2: Seeking help
const userQuery2 = "我的订单号是 #12345,我该如何查询物流进度?";
console.log(`\n用户: ${userQuery2}`);
const response2 = await customerServiceChain.invoke({ user_question: userQuery2 });
console.log(`客服助手: ${response2.content}`);
// Expected output: Guidance on how to check logistics, possibly prompting for more info.
// Scenario 3: Simple greeting
const userQuery3 = "你好,请问你是谁?";
console.log(`\n用户: ${userQuery3}`);
const response3 = await customerServiceChain.invoke({ user_question: userQuery3 });
console.log(`客服助手: ${response3.content}`);
// Expected output: Self-introduction as an intelligent support assistant, asking how it can help.
console.log("\n--- 智能客服助手已关闭 ---");
}
runCustomerServiceDemo();
With the code above, we've successfully created a foundational intelligent customer support Q&A bot. It can:
- Adopt a customer service persona: Setting a friendly and professional tone via the SystemMessagePromptTemplate.
- Receive user questions: Dynamically accepting user input via the HumanMessagePromptTemplate.
- Generate responses using an LLM: Sending the formatted prompt to gpt-3.5-turbo to get an intelligent reply.
This is just the first step of a long journey! You've witnessed the immense power of LLMs and Prompt Templates—they are the cornerstones of building any complex AI application. In upcoming lessons, we will build upon this foundation, gradually integrating advanced features like knowledge base retrieval, memory, and tool usage to make our intelligent support copilot smarter and smarter.
Pitfalls and How to Avoid Them
As a veteran in this field, I've seen too many beginners stumble here. Don't worry, I'm here to help you clear the minefield in advance:
- The Pitfall of Leaking API Keys:
  - Symptom: Your OPENAI_API_KEY is hardcoded in your script or pushed directly to GitHub.
  - Consequence: Your API Key gets stolen, your billing skyrockets, and it might even be used for malicious activities.
  - How to Avoid: Never write sensitive information (like API Keys) directly into your code or commit it to version control. Always use environment variables (a .env file paired with the python-dotenv or dotenv library) to manage them, and make absolutely sure .env is listed in your .gitignore file!
- The Pitfall of Poor Prompt Engineering (Garbage In, Garbage Out):
  - Symptom: You throw the user's raw question directly at the LLM, resulting in an irrelevant answer or an inappropriate tone.
  - Consequence: Poor user experience, underutilization of the model's capabilities, and potentially misleading information.
  - How to Avoid:
    - Define the Persona: Clearly tell the LLM who it is via the SystemMessage (e.g., "You are a professional support assistant").
    - Provide Clear Instructions: Tell the LLM exactly what you want it to do (e.g., "Please answer based on the provided information; if the information is insufficient, politely inform the user").
    - Supply Context: In the future, we'll learn how to inject knowledge base content as context for the LLM.
    - Test and Iterate: Prompt engineering is an iterative process. Keep experimenting with different phrasing and structures until you get satisfactory results.
- The Pitfall of Token Limits:
  - Symptom: Your input or output is too long, and the LLM API returns a context-length (maximum token) error.
  - Consequence: The application crashes, and user requests fail to process.
  - How to Avoid (see the token-counting sketch after this list):
    - Understand Model Limits: Different LLMs have different context window sizes (e.g., gpt-3.5-turbo is typically 4k or 16k tokens).
    - Streamline Prompts: Express instructions as concisely as possible to avoid redundancy.
    - Chunking: For extremely long texts, consider chunking the data or using summarization techniques.
    - Cost Considerations: More tokens mean higher costs. Keep an eye on your usage.
- The Pitfall of Hallucinations:
  - Symptom: The LLM confidently fabricates information, essentially making things up with a straight face.
  - Consequence: For a customer support system, this is absolutely fatal! It misleads users and damages company credibility.
  - How to Avoid:
    - Emphasize Truthfulness in System Messages: Explicitly instruct the LLM in the SystemMessage to "only answer based on the provided information; if unsure, do not guess."
    - Implement Retrieval-Augmented Generation (RAG): This is the focus of our upcoming sessions! By retrieving from an external knowledge base, we restrict the LLM to answering only within the bounds of given facts.
    - Fact-Checking: In critical scenarios, you may need human oversight or automated tools to double-check the LLM's answers.
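For the token-limit pitfall above, here is a minimal pre-flight check, sketched under the assumption that the tiktoken package is installed and that 4,096 tokens is the budget you care about (the base gpt-3.5-turbo context window; adjust for your model):

```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-3.5-turbo") -> int:
    """Count how many tokens the text will consume for the given model."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

budget = 4096  # illustrative budget for the base gpt-3.5-turbo context window
user_question = "请问你们的退款政策是什么?" * 300  # deliberately oversized input for the demo

if count_tokens(user_question) > budget:
    # Too long: truncate, summarize, or split into chunks before invoking the chain.
    print("Input exceeds the token budget; shorten it before calling the LLM.")
```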
These "pitfalls" are lessons learned the hard way. I hope you keep them in mind to save yourself a lot of trouble!
📝 Session Summary
Congratulations! In this session of the LangChain Masterclass, you have mastered the two core components of LangChain: LLMs (or ChatModels) and Prompt Templates. We learned how LangChain wraps different LLMs through a unified interface, and how Prompt Templates serve as the bridge for our communication with the LLM. More importantly, we built a foundational intelligent customer support Q&A bot from scratch, taking our first step toward building production-grade AI applications.
You should now have a solid grasp of:
- The role of LLMs as the AI brain.
- The importance of Prompt Templates as the AI instruction set.
- How to combine the two to build a simple conversational system.
- Key issues you might encounter in practice and their solutions.
With this foundation, you are now standing on the shoulders of giants. In the upcoming lessons, we will progressively unlock more of LangChain's powerful features, transforming our intelligent support copilot from a "junior trainee" into a true "full-stack master"!
Next time, we will dive deep into Output Parsers. We'll learn how to extract structured data from the LLM's free-flowing text responses, ensuring that the AI's output isn't just "spoken," but is also "understandable and actionable"! Stay tuned!