Episode 26 | Local Model Stand-In: Combining Ollama with Llama3 (EN)

⏱ Est. reading time: 20 min | Updated on 5/7/2026

Subtitle: From LangChain Architecture to Your First Intelligent Support Q&A Bot

Welcome, everyone, to the first session of the LangChain Full-Stack Masterclass: Zero to Production AI Applications! I'm your instructor, a "veteran" who has been navigating the AI field for a decade, deeply passionate about both technology and education. Today, we embark on a journey to unveil the mystery of LangChain—the "Swiss Army Knife of AI application development"—and build the very first "brain" for our intelligent support copilot!

🎯 Learning Objectives for this Session

By the end of this session, you will be able to:

  1. Understand LangChain's core value and design philosophy: Grasp why LangChain has become the de facto standard for building LLM applications and what pain points it solves.
  2. Master LangChain's foundational architecture and key modules: Gain a clear understanding of LangChain's components and the roles they play.
  3. Build a simple intelligent support Q&A bot from scratch: Use LangChain to quickly implement an LLM-based Q&A feature from the ground up.
  4. Grasp the initial application of LangChain in a customer support project: Understand how today's knowledge lays the foundation for our "Intelligent Support Knowledge Base" project.

📖 Concept Breakdown

LangChain: The Swiss Army Knife of AI App Development (No Exaggeration!)

You might ask: Since we already have powerful LLMs from OpenAI and Anthropic, why do we need LangChain? It's like having a top-tier engine—you still need a complete chassis, transmission, brakes, and steering wheel to build a real car.

While LLMs are powerful, they are essentially "black-box" text generators. In real-world AI applications, we need much more than simple Q&A. Imagine our "Intelligent Support Knowledge Base" project:

  • It needs to retrieve information from massive amounts of documents.
  • It needs to understand complex user intents.
  • It needs to remember the user's conversation history.
  • It needs to call external tools (like querying order status or a password reset API).
  • It also needs to logically chain these complex steps together.

If you had to build this logic from scratch every time, it would be a nightmare of reinventing the wheel! LangChain emerged to solve this. It provides a standardized, modular, and composable set of tools that drastically simplifies the LLM application development process. Like a Swiss Army Knife, it integrates various practical tools (modules), allowing you to efficiently handle complex tasks.

LangChain's Core Philosophy: Modularity and Composability

LangChain's design philosophy is crystal clear: abstract the capabilities needed to build LLM apps into independent modules, and then flexibly combine them using a "Chain" mechanism to accomplish complex tasks.

Its primary modules include:

  • LLMs (Language Models): The "heart" of LangChain, responsible for interacting with various large models (OpenAI, Google Gemini, Anthropic Claude, etc.). It provides a unified interface, letting you easily switch between different models.
  • Prompt Templates: The "cerebral cortex" of LangChain, responsible for generating structured, task-specific prompts. Good prompts are the key to successful LLM apps, and LangChain makes prompt management and reuse incredibly simple.
  • Chains: The "nervous system" of LangChain, responsible for combining multiple LLM calls or other tools in a sequential or logical manner. It defines the flow of information and processing logic.
  • Retrievers: The "memory bank" of LangChain, responsible for retrieving relevant information from external knowledge sources (like vector databases or documents) to augment the LLM's knowledge.
  • Agents: The "decision-makers" of LangChain, empowering LLMs with the ability to use tools. Agents can autonomously decide which steps to take and which tools to call based on user input to solve a problem.
  • Memory: The "short-term memory" of LangChain, allowing the LLM to remember conversation history, thereby achieving coherence in multi-turn dialogues.
  • Document Loaders: The "librarians" of LangChain, responsible for loading data from various formats (PDF, CSV, HTML, etc.).
  • Vector Stores: The "long-term memory" of LangChain, used to store and retrieve vectorized text data, often used in conjunction with retrievers.

In this session, we will focus on the most core modules: LLMs, Prompt Templates, and Chains, to build the first simple Q&A feature for our intelligent support bot.
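
Before we formalize this in code, here is a minimal sketch of what "composability" means in practice: the same LLM module, instantiated once, can be reused in completely different chains simply by swapping the prompt module. (This sketch assumes the langchain-openai package and an OPENAI_API_KEY environment variable, both set up in the hands-on section below; the prompt texts are purely illustrative.)

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# One LLM module, instantiated once...
llm = ChatOpenAI(model="gpt-3.5-turbo")

# ...recombined into two different chains by swapping only the prompt module
support_prompt = ChatPromptTemplate.from_template(
    "As a support assistant, answer concisely: {question}"
)
summary_prompt = ChatPromptTemplate.from_template(
    "Summarize the following support ticket in one sentence: {ticket}"
)

support_chain = support_prompt | llm | StrOutputParser()
summary_chain = summary_prompt | llm | StrOutputParser()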

The Prototype of a Support Bot's "Brain": Q&A Workflow Analysis

Imagine our intelligent support copilot—what is its most fundamental capability? Isn't it simply providing a reasonable answer when a user asks a question?

In LangChain, this process can be simplified into the following workflow:

  1. User inputs a question: e.g., "How do I reset my password?"
  2. Prompt Template processing: Embed the user's question into a preset template to give the LLM clear instructions, such as: "You are a professional support assistant. Please answer the user's question in a concise and friendly manner: [User Question]".
  3. Call the Large Language Model (LLM): Send the processed prompt to the large model.
  4. Model generates an answer: The LLM generates a response based on the prompt and its internal knowledge.
  5. Output the answer: Present the LLM's response to the user.

This workflow is the prototype of our intelligent support bot's "brain." Although simple, it is the foundation of all complex features.

graph TD
    A[User] --> B{Input Question};
    B --> C[PromptTemplate: Construct Instruction];
    C --> D[LLM: Large Language Model];
    D --> E["Output Parser (optional): Format Response"];
    E --> F[Intelligent Support Copilot];
    F --> G[Reply to User];

    subgraph core["LangChain Core Workflow"]
        C
        D
        E
    end

    style A fill:#f9f,stroke:#333,stroke-width:2px;
    style B fill:#bbf,stroke:#333,stroke-width:2px;
    style C fill:#ccf,stroke:#333,stroke-width:2px;
    style D fill:#ddf,stroke:#333,stroke-width:2px;
    style E fill:#eef,stroke:#333,stroke-width:2px;
    style F fill:#fcf,stroke:#333,stroke-width:2px;
    style G fill:#9f9,stroke:#333,stroke-width:2px;

The diagram above clearly illustrates the simplified workflow from a user's question to the support bot's reply. LangChain plays the core role of connecting each module and chaining the logic together.
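
To make step 2 of the diagram concrete, here is a tiny sketch showing a prompt template injecting a user question into the preset instruction. Note that format_messages renders the template without calling any model, so you can inspect exactly what the LLM would receive (the template wording here is illustrative):

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a professional support assistant. Please answer the user's question in a concise and friendly manner."),
    ("human", "{question}"),
])

# Render the template without invoking an LLM
messages = prompt.format_messages(question="How do I reset my password?")
# -> [SystemMessage("You are a professional support assistant. ..."),
#     HumanMessage("How do I reset my password?")]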

💻 Hands-On Coding (Application in the Support Project)

Alright, the concepts are clear. It's time to roll up our sleeves and get to work! We will use LangChain to implement the simple Q&A workflow described above, enabling our intelligent support copilot to answer general questions.

Prerequisites

First, we need to install the LangChain library and OpenAI integration libraries. We are choosing OpenAI as our LLM provider for this session because it is currently the industry benchmark and easy to integrate.

# Python environment
pip install langchain langchain-openai python-dotenv

# TypeScript / JavaScript environment
npm install langchain @langchain/openai @langchain/core dotenv

Then, to securely manage the API Key, we typically use a .env file. Create a file named .env in your project's root directory and fill in your OpenAI API Key:

OPENAI_API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

Python Implementation

Let's use Python to build our first intelligent support Q&A bot.

import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# 1. Load environment variables
# This step reads the OPENAI_API_KEY from the .env file
load_dotenv()

# Check if the API Key is loaded
if not os.getenv("OPENAI_API_KEY"):
    raise ValueError("OPENAI_API_KEY not found in environment variables. Please set it in a .env file.")

print("--- Starting Intelligent Support Copilot ---")

# 2. Initialize the Large Language Model (LLM)
# We use ChatOpenAI, which wraps OpenAI's chat model interface
# The temperature parameter controls the model's creativity: 0 is more deterministic, 1 is more creative
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7)
print(f"✅ Connected to LLM: {llm.model_name}")

# 3. Define the Prompt Template
# These are our instructions to the LLM, telling it how to roleplay and answer questions
# {question} is a placeholder where the user's question will be injected
prompt_template = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a friendly intelligent support assistant. Your task is to answer user questions concisely and accurately."),
        ("human", "{question}"),
    ]
)
print("✅ Support prompt template defined.")

# 4. Build the Chain
# LangChain Expression Language (LCEL) is the modern way to build chains—concise and powerful
# This chain represents: User Question -> Prompt Template -> LLM -> String Parser
# StrOutputParser converts the LLM's output (usually an AIMessage object) into a plain string
qa_chain = prompt_template | llm | StrOutputParser()
print("✅ Q&A chain built.")

# 5. Simulate Support Q&A
print("\n--- Starting Q&A Simulation ---")

# Scenario 1: User asks how to reset a password (Common support question)
user_question_1 = "I forgot my account password, how can I reset it?"
print(f"\n🙋‍♂️ User: {user_question_1}")
# Invoke the chain to get the answer
response_1 = qa_chain.invoke({"question": user_question_1})
print(f"🤖 Support Copilot: {response_1}")

# Scenario 2: User asks about product features (Common support question)
user_question_2 = "What file formats does your product support for uploading?"
print(f"\n🙋‍♂️ User: {user_question_2}")
response_2 = qa_chain.invoke({"question": user_question_2})
print(f"🤖 Support Copilot: {response_2}")

# Scenario 3: User asks a general knowledge question (LLM's general capability)
user_question_3 = "Can you briefly explain what artificial intelligence is?"
print(f"\n🙋‍♂️ User: {user_question_3}")
response_3 = qa_chain.invoke({"question": user_question_3})
print(f"🤖 Support Copilot: {response_3}")

print("\n--- Q&A Simulation Ended ---")
print("🎉 Congratulations, your first intelligent support Q&A bot is running successfully!")

Code Breakdown:

  1. load_dotenv(): Loads environment variables from the .env file, ensuring your API Key isn't hardcoded in the codebase. This is a fundamental requirement for production-grade applications.
  2. ChatOpenAI: Initializes a chat model instance. We specified the gpt-3.5-turbo model. The temperature parameter is crucial; it controls the "randomness" or "creativity" of the model's responses. For customer support scenarios we usually want stable, deterministic replies, so in production you would typically push this toward 0; we use 0.7 here to keep the demo answers a little more conversational.
  3. ChatPromptTemplate.from_messages(): Defines the "script" for our communication with the LLM.
    • ("system", ...): The system message, used to set the LLM's persona and behavioral guidelines. Here, we ask it to play the role of a "friendly intelligent support assistant." This is a key part of Prompt Engineering.
    • ("human", "{question}"): The user message. {question} is a placeholder that LangChain will automatically replace with the actual question we pass in.
  4. The pipe operator (LCEL): LangChain Expression Language is the recommended way to build chains in LangChain 0.1.0+, and it is highly concise and intuitive.
    • prompt_template | llm | StrOutputParser(): This represents a pipeline operation. The user input is first processed by the prompt_template into a complete prompt, which is then sent to the llm. The result returned by the LLM is finally converted into a plain text string by the StrOutputParser.
  5. qa_chain.invoke(): Calls our constructed chain, passing in a dictionary where the value of question is the user's actual question.

Through this code, our intelligent support copilot gains its initial Q&A capability. It can understand user questions and provide corresponding replies based on the OpenAI model's general knowledge and the "support assistant" persona we set.
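
Incidentally, every stage of an LCEL pipe is itself a Runnable, so you can invoke each one separately to see exactly what flows between them. A quick sketch, reusing the prompt_template and llm objects defined in the script above:

# Stage 1: the template turns the input dict into a ChatPromptValue
prompt_value = prompt_template.invoke({"question": "How do I reset my password?"})

# Stage 2: the chat model turns the prompt value into an AIMessage
ai_message = llm.invoke(prompt_value)

# Stage 3: the parser turns the AIMessage into a plain string
answer = StrOutputParser().invoke(ai_message)

print(type(prompt_value).__name__, type(ai_message).__name__, type(answer).__name__)
# ChatPromptValue AIMessage str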

TypeScript Implementation

For frontend or Node.js developers, we also provide a TypeScript implementation.

import * as dotenv from 'dotenv';
import { ChatOpenAI } from '@langchain/openai';
import { ChatPromptTemplate } from '@langchain/core/prompts';
import { StringOutputParser } from '@langchain/core/output_parsers';
import { RunnableSequence } from '@langchain/core/runnables';

// 1. Load environment variables
// This step reads the OPENAI_API_KEY from the .env file
dotenv.config();

// Check if the API Key is loaded
if (!process.env.OPENAI_API_KEY) {
    throw new Error("OPENAI_API_KEY not found in environment variables. Please set it in a .env file.");
}

console.log("--- Starting Intelligent Support Copilot ---");

// 2. Initialize the Large Language Model (LLM)
// We use ChatOpenAI, which wraps OpenAI's chat model interface
// The temperature parameter controls the model's creativity: 0 is more deterministic, 1 is more creative
const llm = new ChatOpenAI({
    model: "gpt-3.5-turbo",
    temperature: 0.7,
});
console.log(`✅ Connected to LLM: ${llm.modelName}`);

// 3. Define the Prompt Template
// These are our instructions to the LLM, telling it how to roleplay and answer questions
// {question} is a placeholder where the user's question will be injected
const promptTemplate = ChatPromptTemplate.fromMessages([
    ["system", "You are a friendly intelligent support assistant. Your task is to answer user questions concisely and accurately."],
    ["human", "{question}"],
]);
console.log("✅ Support prompt template defined.");

// 4. Build the Chain
// LangChain Expression Language (LCEL) is the modern way to build chains—concise and powerful
// This chain represents: User Question -> Prompt Template -> LLM -> String Parser
// StringOutputParser converts the LLM's output (usually an AIMessage object) into a plain string
const qaChain = RunnableSequence.from([
    promptTemplate,
    llm,
    new StringOutputParser(),
]);
console.log("✅ Q&A chain built.");

// 5. Simulate Support Q&A
async function runDemo() {
    console.log("\n--- Starting Q&A Simulation ---");

    // Scenario 1: User asks how to reset a password (Common support question)
    const userQuestion1 = "I forgot my account password, how can I reset it?";
    console.log(`\n🙋‍♂️ User: ${userQuestion1}`);
    // Invoke the chain to get the answer
    const response1 = await qaChain.invoke({ question: userQuestion1 });
    console.log(`🤖 Support Copilot: ${response1}`);

    // Scenario 2: User asks about product features (Common support question)
    const userQuestion2 = "What file formats does your product support for uploading?";
    console.log(`\n🙋‍♂️ User: ${userQuestion2}`);
    const response2 = await qaChain.invoke({ question: userQuestion2 });
    console.log(`🤖 Support Copilot: ${response2}`);

    // Scenario 3: User asks a general knowledge question (LLM's general capability)
    const userQuestion3 = "Can you briefly explain what artificial intelligence is?";
    console.log(`\n🙋‍♂️ User: ${userQuestion3}`);
    const response3 = await qaChain.invoke({ question: userQuestion3 });
    console.log(`🤖 Support Copilot: ${response3}`);

    console.log("\n--- Q&A Simulation Ended ---");
    console.log("🎉 Congratulations, your first intelligent support Q&A bot is running successfully!");
}

runDemo();

TypeScript Code Breakdown:

The logic is identical to the Python version, with only syntactic differences:

  1. dotenv.config(): Loads the .env file.
  2. new ChatOpenAI(): Initializes the chat model.
  3. ChatPromptTemplate.fromMessages(): Defines the prompt template, also using an array of [string, string] to represent messages.
  4. RunnableSequence.from([...]): Builds the chain, similarly defining the pipeline sequentially via an array.
  5. await qaChain.invoke({ question: userQuestion }): Invoking the chain is asynchronous, so we must use await.

At this point, our intelligent support copilot has acquired basic Q&A capabilities. It can receive user questions and reply using the large model's general knowledge and our predefined persona. While it cannot yet access our company's internal knowledge base, this is a solid first step!

Pitfalls and Best Practices

As a senior developer, I've seen too many pitfalls and summarized some best practices. Today, I'll share a few common mistakes and considerations for beginners:

  1. API Key Security:
    • Pitfall: Hardcoding the OPENAI_API_KEY directly in the code or uploading it to a public repository (like GitHub). This is like taping your bank PIN to your forehead!
    • Best Practice: Always use environment variables (like a .env file paired with python-dotenv or dotenv) to manage sensitive information. When deploying to production, ensure these variables are injected securely (e.g., via Kubernetes Secrets, AWS Secrets Manager, Vault, etc.).
  2. The Importance of Prompt Engineering:
    • Pitfall: Assuming LLMs are omnipotent and will give perfect answers to vague questions, or writing unclear prompts that cause the model to output plausible-sounding nonsense (hallucinations).
    • Best Practice: "Garbage in, garbage out" is especially true in the LLM world. The system message we used today is the most basic form of prompt engineering. Remember, clear, specific instructions with context are often more effective than hours of parameter tuning. Experiment with different phrasings to constrain the model's response format, length, tone, etc.
  3. LLM "Hallucinations":
    • Pitfall: Over-trusting the LLM's answers, believing everything it says is absolute truth.
    • Best Practice: Current LLMs are not rigorous knowledge bases; they can generate information that sounds plausible but is actually incorrect or fabricated. In our support project, this means it might "invent" company policies or product features. This is the biggest limitation of this simple Q&A model. In the future, we will solve this using technologies like RAG (Retrieval-Augmented Generation), allowing the LLM to answer based on real knowledge we provide.
  4. Cost Management and Model Selection:
    • Pitfall: Blindly using the newest and most powerful model (like gpt-4) without thinking, leading to an exploding bill.
    • Best Practice: Different models have different pricing and performance. For simple Q&A or testing, gpt-3.5-turbo is usually the most cost-effective choice. Only consider more expensive models when advanced reasoning capabilities are truly needed. LangChain's unified interface makes switching models a breeze (a one-line model swap is sketched after this list), so make good use of this feature.
  5. Asynchronous Operations (Async) and Concurrency:
    • Pitfall: Synchronously calling the LLM when handling a large volume of requests, causing performance bottlenecks.
    • Best Practice: Many of LangChain's methods offer asynchronous versions (like ainvoke). In production environments, especially in web services, be sure to leverage asynchronous programming to improve throughput and response times. We used synchronous calls today for simplicity, but keep this in mind; a concurrency sketch also follows this list.
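
To ground point 2, here is a sketch of a more tightly constrained system message for our support bot (the wording is only an illustrative starting point, not a battle-tested template; it drops into the ChatPromptTemplate from the script above):

constrained_prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a friendly intelligent support assistant.\n"
     "- Answer in at most three sentences.\n"
     "- Keep a polite, professional tone.\n"
     "- If you are unsure of an answer, say so and suggest contacting human support.\n"
     "- Never invent company policies or product features."),
    ("human", "{question}"),
])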
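
For point 4, the unified interface really does make a model swap a one-argument change. A sketch reusing the objects from the Python script above (model names and pricing change over time, so check your provider's current price list):

# Low-cost default for simple Q&A and testing
cheap_llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# Reserve stronger (and pricier) models for tasks that truly need them
strong_llm = ChatOpenAI(model="gpt-4", temperature=0)

# The rest of the chain is untouched; only the model module is swapped
qa_chain_cheap = prompt_template | cheap_llm | StrOutputParser()
qa_chain_strong = prompt_template | strong_llm | StrOutputParser()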
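
And for point 5, every LCEL chain exposes async twins of its methods, such as ainvoke. A minimal concurrency sketch, reusing qa_chain from the Python script above:

import asyncio

async def answer_all(questions: list[str]) -> list[str]:
    # ainvoke is the async counterpart of invoke; gather runs the calls concurrently
    tasks = [qa_chain.ainvoke({"question": q}) for q in questions]
    return await asyncio.gather(*tasks)

answers = asyncio.run(answer_all([
    "I forgot my account password, how can I reset it?",
    "What file formats does your product support for uploading?",
]))
print(answers)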

📝 Session Summary

Congratulations, everyone! Through today's session, we not only deeply understood LangChain's core value as the "Swiss Army Knife" of AI app development and its modular design philosophy, but we also built the first milestone of our "Intelligent Support Knowledge Base" project—a simple LLM-based Q&A bot.

We learned how to:

  • Initialize and connect to an OpenAI large model.
  • Use ChatPromptTemplate for basic prompt engineering to give the model a persona.
  • Build a concise and efficient Q&A chain using LangChain Expression Language (LCEL).
  • Enable the intelligent support copilot to respond to general user questions in a practical project context.

Although this copilot is still in its "infancy"—its knowledge is limited to the LLM's general training data and it cannot access our company's internal documents—this is exactly the charm of LangChain. It has laid a solid foundation for our future feature expansions.

In upcoming lessons, we will gradually unlock LangChain's more powerful capabilities, such as how to make our support bot "read" company product manuals and FAQ documents, how to remember multi-turn user conversations, and even how to call external APIs to execute specific actions.

Next Time: We will dive deep into LangChain's Document Loaders and Text Splitters, learning how to transform massive amounts of unstructured documents into knowledge fragments that the LLM can understand, building an exclusive "memory bank" for our intelligent support copilot!

Keep it up, AI Architects, our journey has just begun!