Issue 01 | LLM and Prompt Template: The Two Cornerstones of LangChain

Updated on 4/18/2026

🎯 Learning Objectives for this Issue

Hey, future AI masters! Welcome to the first issue of the "LangChain Full Stack Masterclass". I know you're eager to roll up your sleeves and get to work, but don't rush: Rome wasn't built in a day, and a smart customer service assistant doesn't appear out of thin air. In this issue, we will dig into the two cornerstones of LangChain: the LLM (Large Language Model) and the Prompt Template. After completing this issue, you will:

  1. Thoroughly understand how LangChain encapsulates LLMs, allowing you to easily master various large models and bid farewell to cumbersome API calls.
  2. Grasp the essence of Prompt Templates, learning how to "program" LLMs to make them obedient, efficient, and output the results you want.
  3. Build a basic smart customer service Q&A bot with your own hands, experiencing the thrill of going from zero to one, laying a solid foundation for our "Smart Customer Service Knowledge Base" project.
  4. Identify and avoid common early-stage "pitfalls", taking fewer detours and heading straight to success.

Ready? Let's start the magical journey of LangChain together!

📖 Concept Explanation

LLM: The "Brain" of LangChain

Imagine that our "Smart Customer Service Knowledge Base" project needs a "brain" that can understand user questions and provide professional answers. That brain is a Large Language Model (LLM). These models are products of deep learning: trained on massive amounts of text, they possess remarkable language understanding and generation capabilities.

However, there is a wide variety of LLMs on the market: OpenAI's GPT series, Google's Gemini, Anthropic's Claude, and various open-source models. The API calling methods and parameter settings of each model may be different, which is a nightmare for developers.

LangChain's LLM module is like a "universal adapter". It provides a unified set of interfaces, allowing you to call and interact with LLMs from any vendor in almost the exact same way. This greatly reduces learning costs and the difficulty of switching models, allowing you to focus on business logic rather than underlying API adaptation.

In LangChain, LLMs are mainly divided into two categories:

  • LLM (Traditional text completion models): These models typically receive a string as input and generate a string as output. For example, the early GPT-3. They are more like smart text completers.
  • ChatModel (Chat models): This is the current mainstream model type. They receive a series of "messages" as input (e.g., user messages, AI assistant messages, system messages) and generate an "AI message" as output. This message-based interaction method is more suitable for conversational scenarios and makes it easier to control the model's behavior and role. Our smart customer service project will undoubtedly use ChatModel extensively.
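To make the distinction between the two interface shapes concrete, here is a minimal plain-Python sketch (no LangChain required). The class and function names are illustrative stand-ins, not LangChain's actual API:

```python
from dataclasses import dataclass

# LLM-style interface: a string in, a string out.
def completion_model(prompt: str) -> str:
    # A real model would generate a continuation; we just echo for illustration.
    return f"[completion for: {prompt}]"

# ChatModel-style interface: a list of role-tagged messages in, one AI message out.
@dataclass
class Message:
    role: str      # "system", "human", or "ai"
    content: str

def chat_model(messages: list[Message]) -> Message:
    # A real chat model conditions on the whole conversation; here we only
    # look at the last message to illustrate the shape of the interface.
    last = messages[-1].content
    return Message(role="ai", content=f"[reply to: {last}]")

reply = chat_model([
    Message("system", "You are a customer service assistant."),
    Message("human", "What is your refund policy?"),
])
print(reply.role)  # ai
```

The role-tagged message list is what makes ChatModels easier to steer: the system message carries standing instructions that the model treats differently from user input.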

Prompt Template: The "Instruction Set" of LLM

With a powerful LLM brain, we also need a clear "instruction set" to tell it what to do and how to do it. This instruction set is the Prompt Template.

Have you ever asked an LLM a question and gotten an answer that was completely irrelevant, or in entirely the wrong tone? That usually means it wasn't given a good "prompt". Prompt Engineering is the art of guiding the LLM to output according to our intentions through carefully designed prompts.

The core idea of a Prompt Template is: combining fixed instructions with dynamic user input to generate a complete, clear, and context-rich prompt.

For example, our smart customer service assistant cannot just simply answer user questions; it needs to:

  • Play the role of customer service: The tone must be professional and friendly.
  • Focus on knowledge base content: It cannot make things up or go off-topic.
  • Handle specific formats: For example, if a user asks about the "refund policy", it needs to know where to find it and present it in a concise and clear manner.

Prompt Templates allow us to pre-define these "fixed instructions" and then fill in "dynamic content" at runtime, such as the user's question and relevant information retrieved from the knowledge base. This not only ensures the consistency of prompts but also greatly improves development efficiency and the quality of the model's response.
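The "fixed instructions + dynamic input" idea can be illustrated with nothing more than Python's built-in str.format. This is a simplified stand-in for what LangChain's templates do, not LangChain's actual implementation, and the template text is just an example:

```python
# A fixed instruction skeleton with placeholders for dynamic content.
CUSTOMER_SERVICE_TEMPLATE = (
    "You are a professional, friendly customer service assistant. "
    "Answer only based on the provided knowledge base content.\n"
    "Knowledge base: {context}\n"
    "User question: {user_question}"
)

def build_prompt(context: str, user_question: str) -> str:
    # Fill the dynamic slots at runtime; the fixed instructions stay intact.
    return CUSTOMER_SERVICE_TEMPLATE.format(
        context=context, user_question=user_question
    )

prompt = build_prompt(
    context="Refunds are accepted within 7 days of purchase.",
    user_question="What is your refund policy?",
)
print(prompt)
```

Every call produces the same instruction framing with fresh dynamic content, which is exactly the consistency guarantee described above. LangChain's templates add input validation, message roles, and composability on top of this basic idea.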

The Magical Combination of LLM and Prompt Template

When the powerful "brain" of the LLM meets the precise "instruction set" of the Prompt Template, a wonderful chemical reaction occurs. They are the most basic and core combination in LangChain.

The entire process can be summarized as:

  1. The user asks a question.
  2. We use a Prompt Template to format the user's original question, the preset customer service role, and (to be added in the future) knowledge base context into a clear, structured prompt.
  3. This formatted prompt is sent to the LLM (or ChatModel).
  4. The LLM reasons and generates based on the content of the prompt, returning an answer.
  5. This answer is the reply from our smart customer service assistant.

Below is a Mermaid diagram to visually demonstrate this core workflow:

graph TD
    A[User Question] --> B{Prompt Template};
    B -- Fill dynamic variables --> C[Complete Formatted Prompt];
    C --> D(LLM / ChatModel);
    D -- Generate Reply --> E[AI Assistant Reply];

    subgraph LangChain Core
        B;
        D;
    end

    style A fill:#f9f,stroke:#333,stroke-width:2px;
    style B fill:#bbf,stroke:#333,stroke-width:2px;
    style C fill:#fcc,stroke:#333,stroke-width:2px;
    style D fill:#afa,stroke:#333,stroke-width:2px;
    style E fill:#f9f,stroke:#333,stroke-width:2px;

This diagram clearly shows how the user's question is "processed" by the Prompt Template, then "thought over" by the LLM, and finally becomes the professional reply of the smart customer service. Understanding this basic workflow means you have grasped the "soul" of LangChain.

💻 Hands-on Coding Practice

It's time to turn theory into code! We will use OpenAI's gpt-3.5-turbo as our ChatModel and combine it with ChatPromptTemplate to build the simplest smart customer service Q&A bot.

First, you need to install the LangChain and OpenAI libraries:

pip install langchain langchain-openai python-dotenv

Then, ensure your .env file contains OPENAI_API_KEY:

OPENAI_API_KEY="sk-..."

Python Implementation

import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate

# 1. Load environment variables
load_dotenv()

# 2. Initialize ChatModel
# Here we choose gpt-3.5-turbo, which is a highly cost-effective model currently
# The temperature parameter controls the model's creativity, 0 means more deterministic and conservative, 1 means more imaginative
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7)

# 3. Define Prompt Template
# For smart customer service, we need a system message to set its role and behavior
# and a human message to receive the user's question
customer_service_template = ChatPromptTemplate.from_messages(
    [
        # SystemMessagePromptTemplate is used to set the AI's role and global behavior
        SystemMessagePromptTemplate.from_template(
            "You are a friendly, professional, and helpful smart customer service assistant. Your goal is to answer user questions clearly and accurately, and provide useful information as much as possible. Please remain polite and patient."
        ),
        # HumanMessagePromptTemplate is used to receive user input
        HumanMessagePromptTemplate.from_template(
            "The user's question is: {user_question}"
        ),
    ]
)

# 4. Combine Prompt Template and LLM
# LangChain's chain invocation is very elegant: here we use the | (pipe) operator to connect them
# This operator is the foundation of LangChain Expression Language (LCEL), which is very powerful and flexible
# It means: first process the input with customer_service_template, then pass the result to llm
customer_service_chain = customer_service_template | llm

# 5. Simulate user questions and get replies
print("--- Smart Customer Service Assistant Started ---")

# Scenario 1: Common question
user_query_1 = "What is your refund policy?"
print(f"\nUser: {user_query_1}")
response_1 = customer_service_chain.invoke({"user_question": user_query_1})
print(f"Customer Service Assistant: {response_1.content}")
# Expected output: A general answer about the refund policy, with a professional tone.

# Scenario 2: Seeking help
user_query_2 = "My order number is #12345, how can I check the shipping progress?"
print(f"\nUser: {user_query_2}")
response_2 = customer_service_chain.invoke({"user_question": user_query_2})
print(f"Customer Service Assistant: {response_2.content}")
# Expected output: Guides the user on how to check shipping, and may prompt for more information.

# Scenario 3: Simple greeting
user_query_3 = "Hello, who are you?"
print(f"\nUser: {user_query_3}")
response_3 = customer_service_chain.invoke({"user_question": user_query_3})
print(f"Customer Service Assistant: {response_3.content}")
# Expected output: Introduces itself as a smart customer service assistant and asks what the user needs.

print("\n--- Smart Customer Service Assistant Closed ---")

TypeScript Implementation

import 'dotenv/config'; // Load .env file
import { ChatOpenAI } from '@langchain/openai';
import { ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate } from '@langchain/core/prompts';

// 1. Ensure OPENAI_API_KEY environment variable is set
if (!process.env.OPENAI_API_KEY) {
  console.error("OPENAI_API_KEY is not set in environment variables.");
  process.exit(1);
}

// 2. Initialize ChatModel
// The modelName here corresponds to the model parameter in Python
const llm = new ChatOpenAI({
  modelName: "gpt-3.5-turbo",
  temperature: 0.7, // Controls the model's creativity
});

// 3. Define Prompt Template
const customerServiceTemplate = ChatPromptTemplate.fromMessages([
  // SystemMessagePromptTemplate is used to set the AI's role and global behavior
  SystemMessagePromptTemplate.fromTemplate(
    "You are a friendly, professional, and helpful smart customer service assistant. Your goal is to answer user questions clearly and accurately, and provide useful information as much as possible. Please remain polite and patient."
  ),
  // HumanMessagePromptTemplate is used to receive user input
  HumanMessagePromptTemplate.fromTemplate(
    "The user's question is: {user_question}"
  ),
]);

// 4. Combine Prompt Template and LLM
// In TypeScript, we can use the .pipe() method for chain invocation, similar to Python
// Or directly await prompt.formatMessages(input) and then await llm.invoke(messages)
const customerServiceChain = customerServiceTemplate.pipe(llm);

// 5. Simulate user questions and get replies
async function runCustomerServiceDemo() {
  console.log("--- Smart Customer Service Assistant Started ---");

  // Scenario 1: Common question
  const userQuery1 = "What is your refund policy?";
  console.log(`\nUser: ${userQuery1}`);
  const response1 = await customerServiceChain.invoke({ user_question: userQuery1 });
  console.log(`Customer Service Assistant: ${response1.content}`);
  // Expected output: A general answer about the refund policy, with a professional tone.

  // Scenario 2: Seeking help
  const userQuery2 = "My order number is #12345, how can I check the shipping progress?";
  console.log(`\nUser: ${userQuery2}`);
  const response2 = await customerServiceChain.invoke({ user_question: userQuery2 });
  console.log(`Customer Service Assistant: ${response2.content}`);
  // Expected output: Guides the user on how to check shipping, and may prompt for more information.

  // Scenario 3: Simple greeting
  const userQuery3 = "Hello, who are you?";
  console.log(`\nUser: ${userQuery3}`);
  const response3 = await customerServiceChain.invoke({ user_question: userQuery3 });
  console.log(`Customer Service Assistant: ${response3.content}`);
  // Expected output: Introduces itself as a smart customer service assistant and asks what the user needs.

  console.log("\n--- Smart Customer Service Assistant Closed ---");
}

runCustomerServiceDemo();

Through the code above, we have successfully created a basic smart customer service Q&A bot. It can:

  • Play the role of customer service: Set its friendly and professional tone through SystemMessagePromptTemplate.
  • Receive user questions: Dynamically receive user input through HumanMessagePromptTemplate.
  • Use LLM to generate replies: Send the formatted prompt to gpt-3.5-turbo to get a smart reply.

This is just the first step of a long journey! You have seen the immense power of LLM and Prompt Template; they are the cornerstones of building any complex AI application. In subsequent courses, we will build upon this foundation and gradually add advanced features such as knowledge base retrieval, memory, and tool usage, making our smart customer service assistant smarter and smarter.

Pitfalls and Avoidance Guide

As a veteran, I have seen too many beginners stumble here. Don't worry, I'm here to clear the mines for you in advance:

  1. The Pitfall of API Key Leakage:
    • Symptom: Your OPENAI_API_KEY is hardcoded in the code or directly uploaded to GitHub.
    • Harm: Your API Key is stolen, your bill skyrockets, and it may even be used for illegal activities.
    • Avoidance Guide: Never write sensitive information (like API Keys) directly into code or commit it to a version control system. Be sure to use environment variables (a .env file combined with the python-dotenv or dotenv library) to manage them. Your .gitignore file must include .env!
  2. The Pitfall of Insufficient Prompt Engineering (Garbage In, Garbage Out):
    • Symptom: You throw the user's question directly to the LLM, and the LLM's answer is completely off the mark, or the tone is inconsistent.
    • Harm: Poor user experience, the model's capabilities are not fully utilized, and it may even generate misleading information.
    • Avoidance Guide:
      • Clarify the role: Clearly tell the LLM its identity through SystemMessage (e.g., "You are a professional customer service assistant").
      • Clarify instructions: Tell the LLM what you want it to do (e.g., "Please answer the question based on the provided information. If the information is insufficient, please politely inform the user").
      • Provide context: In the future, we will learn how to provide knowledge base content as context to the LLM.
      • Test and iterate: Prompt Engineering is an iterative process. Keep trying different wordings and structures until you get satisfactory results.
  3. The Pitfall of Token Limits:
    • Symptom: Your input or output exceeds the model's context window, and the API returns a context-length error (for OpenAI, the context_length_exceeded error code).
    • Harm: The program crashes, and user requests cannot be processed.
    • Avoidance Guide:
      • Understand model limits: Different LLMs have different context window sizes (e.g., gpt-3.5-turbo is usually 4k or 16k tokens).
      • Streamline Prompts: Try to express instructions in concise language and avoid redundancy.
      • Chunk processing: For extremely long texts, consider chunk processing or using summarization techniques.
      • Cost considerations: The more tokens, the higher the cost.
  4. The Pitfall of Hallucination:
    • Symptom: The LLM fabricates information and spouts nonsense with a straight face.
    • Harm: For a customer service system, this is absolutely fatal! It will mislead users and damage the company's reputation.
    • Avoidance Guide:
      • System message emphasizes authenticity: Explicitly require the LLM in the SystemMessage to "only answer based on the provided information. If you are unsure, please do not guess."
      • Introduce Retrieval-Augmented Generation (RAG): This is the focus of the next few issues! By retrieving from an external knowledge base, restrict the LLM to only answer within the given factual scope.
      • Fact-checking: In critical scenarios, manual or automated tools may be needed to double-check the LLM's answers.
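For the token-limit pitfall above, a crude guard before calling the model can prevent a crash. The rule of thumb used here (roughly 4 characters per token for English text) is only an approximation; for accurate counts you would use a real tokenizer such as tiktoken. The budget numbers below are illustrative:

```python
def rough_token_estimate(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return len(text) // 4

def chunk_text(text: str, max_tokens: int = 3000) -> list[str]:
    """Split text into chunks that each fit under a rough token budget."""
    max_chars = max_tokens * 4
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

# Pretend this is a long knowledge-base article.
long_doc = "refund policy details " * 2000
if rough_token_estimate(long_doc) > 3000:
    chunks = chunk_text(long_doc)
    print(f"Split into {len(chunks)} chunks")
```

Naive fixed-width splitting like this can cut a sentence in half; later issues on knowledge-base retrieval will use smarter text splitters that respect paragraph and sentence boundaries.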

These "pitfalls" are hard-won lessons. Keep them in mind and you'll take far fewer detours!

📝 Summary of this Issue

Congratulations! In this issue of the "LangChain Full Stack Masterclass", you have mastered the two core components of LangChain: LLM (or ChatModel) and Prompt Template. We learned how LangChain encapsulates different LLMs through a unified interface, and how Prompt Templates serve as a bridge for our communication with LLMs. More importantly, we built a basic smart customer service Q&A bot with our own hands, taking the first step towards building production-grade AI applications.

You should now have a clear understanding of:

  • The role of LLM as the AI brain.
  • The importance of Prompt Template as the AI instruction set.
  • How to combine the two to build a simple conversational system.
  • And the key problems and solutions you might encounter in practice.

With this foundation, you are already standing on the shoulders of giants. In the upcoming courses, we will gradually unlock more powerful features of LangChain, allowing our smart customer service assistant to grow from a "junior trainee" into a true "full-stack master"!

In the next issue, we will delve into Output Parsers, learning how to extract structured data from the LLM's free-flowing text replies, making the AI's output not just "spoken", but "understandable and actionable"! Stay tuned!