lesson-19

20 min read | Updated: 2026-05-07

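Welcome back, AI architects, to our LangGraph Multi-Agent Expert Course. It's your old friend again.

Over the previous 18 lessons, our AI Content Agency has started to take shape. The Planner maps out strategy, the Researcher scours the web for data, the Writer drafts at full speed, and the Editor reviews without mercy. Watching a screen full of passing logs, do you feel ready to pop the champagne and start taking paid orders?

Not so fast. Yesterday a student posted a 100,000-token error log in the group chat in the middle of the night and asked me in despair: "Why did my Writer, halfway through an article, suddenly output a chunk about HTTP 404 Not Found and a BeautifulSoup parsing failure?"

I looked at his architecture and slapped my thigh: "My friend, you let the Writer see the trash can in the Researcher's kitchen!"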

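Most introductory LangGraph tutorials pass a single global State (usually containing a messages list) through the whole graph from start to finish. It is like a company where everyone shares one WeChat group: every network timeout, garbled parse, and self-correction the Researcher goes through while looking things up gets posted to that group. When the Writer finally wants to write from the research, it has to dig through hundreds of junk messages to find anything useful. This inflates Token cost, and it also makes LLM hallucination more likely, because the model may treat error text as source material.

Today we tackle this core pain point of multi-agent architecture: State Isolation. We will introduce a "private state wall" into the Agency and restructure the State so that the Writer never sees the Researcher's intermediate error logs and dirty data stays out of the global state.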

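🎯 Learning objectives

By the end of this hands-on lesson, you will be able to:

Break the global-State habit: understand why "share everything" is a disaster for complex Multi-Agent systems.
Build a Subgraph private state wall: use LangGraph's subgraph feature to give the Researcher a "black-box workroom".
Implement state pass-through and mapping: control precisely which data crosses the wall, so that only the Researcher's distilled research_summary reaches the global state for the Writer to use.
Cut Token consumption and improve stability: use architecture to keep dirty data out of downstream context and raise the quality of downstream Agents' output.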

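📖 How it works

In software engineering we value "high cohesion, low coupling" and the "principle of least privilege". The same applies to Agent architecture.

In a traditional single-graph structure, every Node is attached to the same StateGraph and shares the same TypedDict. If the Researcher needs 3 web searches and 2 anti-scraping retries, all of that intermediate state (for example raw_html and search_errors) piles up in the global state.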
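
To make the accumulation concrete, here is a minimal sketch in plain Python (no LangGraph involved). The message contents and the build_writer_prompt helper are invented for illustration. It only shows how a prompt assembled from a shared log carries every intermediate error along with it, while a prompt assembled from one distilled field does not.

```python
# Illustrative only: the messages and helper name below are made up.
shared_messages = [
    "Planner: write about AI agents",
    "Researcher: searching 'AI agents'",
    "Researcher: HTTP 404 Not Found",
    "Researcher: retrying with a different query",
    "Researcher: BeautifulSoup failed to parse the page",
    "Researcher: summary - agents are moving toward multi-agent designs",
]


def build_writer_prompt(context_lines):
    """Join whatever context the Writer is given into one prompt string."""
    return "Write an article using this context:\n" + "\n".join(context_lines)


# Shared-everything: the Writer sees every intermediate message.
polluted_prompt = build_writer_prompt(shared_messages)

# Isolated: the Writer only sees the distilled summary line.
clean_prompt = build_writer_prompt([shared_messages[-1]])

print(len(polluted_prompt), len(clean_prompt))
print("404" in polluted_prompt, "404" in clean_prompt)
```

The size gap grows with every extra search or retry, which is why the fix has to be structural rather than a matter of prompt wording.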

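Our way out: introduce a Subgraph.

We promote the Researcher to an independent subgraph with its own ResearcherState. Inside it, the Researcher is free to fail, retry, and wrestle with garbled text. Once the dirty work is done and it has produced a clean research brief, it returns only that brief to the parent graph (the global Agency) through the "state channel", that is, a key both states share.

Take a look at the architecture diagram below. Once you understand it, you understand today's core idea: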

graph TD
    subgraph Global_Agency_State [Global State]
        direction TB
        G_Topic[Task topic: topic]
        G_Summary[Research brief: research_summary]
        G_Draft[Draft: draft]
    end

    Planner("Planner node<br/>assigns tasks") --> Researcher_Subgraph

    subgraph Researcher_Subgraph [Researcher private workspace Subgraph]
        direction TB
        R_State[(Private state ResearcherState)]
        R_State -.contains.-> R_Raw[Raw page data raw_html]
        R_State -.contains.-> R_Err[Retry errors error_logs]
        R_State -.contains.-> R_Steps[Intermediate reasoning scratchpad]

        Search(Search node) --> Scrape(Scraper node)
        Scrape --"retry on error"--> Search
        Scrape --> Summarize(Distill node)
    end

    Researcher_Subgraph --"Pass-through: only the clean brief is returned"--> G_Summary
    G_Summary --> Writer("Writer node<br/>writes from the brief")

    style Global_Agency_State fill:#f9f9f9,stroke:#333,stroke-width:2px
    style Researcher_Subgraph fill:#e6f7ff,stroke:#1890ff,stroke-width:2px,stroke-dasharray: 5 5
    style G_Summary fill:#d9f7be,stroke:#52c41a
    style R_Err fill:#ffccc7,stroke:#f5222d

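Reading the diagram:

The dashed box is the Researcher's private workspace (Subgraph). The raw_html and error_logs inside it are invisible from the outside (a black box).
The green node G_Summary is the only piece of information that passes through the private wall.
While the Writer node works, the Researcher's error_logs never appear in its context, which keeps the writing free of contamination.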

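💻 Hands-on code walkthrough

Enough talk. Show me the code. We will implement this refactor in Python with the LangGraph StateGraph API, using a recent release that exports START and END from langgraph.graph. Read the comments in the code carefully; they carry the practical details.

Step 1: Define two States (global and private)

First, we have to separate the "public square" from the "private booth" at the code level.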

from typing import TypedDict, List, Annotated
import operator
from langgraph.graph import StateGraph, START, END

# ==========================================
# 1. Global state (Global Agency State)
# The clean context shared by Planner, Writer, and Editor
# ==========================================
class AgencyState(TypedDict):
    topic: str
    # Core: only store the refined summary here, no intermediate garbage
    research_summary: str
    draft: str
    final_article: str

# ==========================================
# 2. The Researcher's private state (Private State)
# Its job is to turn topic into research_summary
# ==========================================
class ResearcherState(TypedDict):
    # Input inherited from the global state
    topic: str

    # --- Dirty/private data below ---
    # Annotated + operator.add accumulate the intermediate logs,
    # which are never leaked to the global state
    search_queries: Annotated[List[str], operator.add]
    raw_html_snippets: Annotated[List[str], operator.add]
    error_logs: Annotated[List[str], operator.add]
    retry_count: int

    # Final output
    research_summary: str
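
If the Annotated[..., operator.add] syntax is new to you, the sketch below models what a reducer does, using only the standard library. It is a simplified model and not LangGraph's actual implementation; DemoState and apply_update are hypothetical names introduced for this illustration. Keys annotated with a reducer are merged through it, and plain keys are overwritten.

```python
import operator
from typing import Annotated, List, TypedDict, get_type_hints


class DemoState(TypedDict):
    error_logs: Annotated[List[str], operator.add]  # has a reducer
    retry_count: int                                # no reducer


def apply_update(state: dict, update: dict, schema=DemoState) -> dict:
    """Simplified model: merge `update` into `state`, using the reducer
    found in the Annotated metadata when there is one, else overwrite."""
    hints = get_type_hints(schema, include_extras=True)
    new_state = dict(state)
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        reducer = metadata[0] if metadata else None
        if reducer is not None and key in new_state:
            new_state[key] = reducer(new_state[key], value)
        else:
            new_state[key] = value
    return new_state


state = {"error_logs": [], "retry_count": 0}
state = apply_update(state, {"error_logs": ["HTTP 404"], "retry_count": 1})
state = apply_update(state, {"error_logs": ["timeout"], "retry_count": 2})
print(state)
# {'error_logs': ['HTTP 404', 'timeout'], 'retry_count': 2}
```

This is why the three list fields in ResearcherState keep growing across retries while retry_count simply holds the latest value.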

Step 2: Build the Researcher Subgraph

Next, we wrap the Researcher in its own Graph. It can do whatever it wants internally, as long as it ends up producing research_summary.

# Mock: a search-and-scrape node that produces errors and dirty data
def search_and_scrape(state: ResearcherState):
    print("  [Researcher] Scraping the web for raw, messy data...")
    topic = state["topic"]

    # Simulate a pile of garbage logs and intermediate waste
    mock_html = f"<html><body>Lots of messy data about {topic}...</body></html>"
    mock_error = "HTTP 404: Image not found during scraping."

    return {
        "search_queries": [f"Deep dive {topic}"],
        "raw_html_snippets": [mock_html],
        "error_logs": [mock_error],
        "retry_count": state.get("retry_count", 0) + 1
    }

# Mock: distill a clean brief from the dirty data
def distill_information(state: ResearcherState):
    print("  [Researcher] Filtering the messy data and distilling a brief...")
    # Only in this node does the LLM read the messy raw_html_snippets

    # len() of a str counts characters, not bytes
    dirty_data_size = len(str(state.get("raw_html_snippets", [])))
    error_count = len(state.get("error_logs", []))

    # Simulate the LLM's distillation step
    clean_summary = f"[Clean research brief] Key points on {state['topic']}: XYZ. Discarded {dirty_data_size} characters of raw data and {error_count} error log(s)."

    return {"research_summary": clean_summary}

# Assemble the Researcher Subgraph
researcher_builder = StateGraph(ResearcherState)
researcher_builder.add_node("search_and_scrape", search_and_scrape)
researcher_builder.add_node("distill_information", distill_information)

researcher_builder.add_edge(START, "search_and_scrape")
researcher_builder.add_edge("search_and_scrape", "distill_information")
researcher_builder.add_edge("distill_information", END)

# Compile the subgraph
researcher_graph = researcher_builder.compile()

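Step 3: Build the global graph and embed the subgraph

Now for the moment of truth. In the global Agency Graph, we mount the compiled researcher_graph as an ordinary Node.

Key point: when LangGraph runs a subgraph added this way, it passes the parent graph's State into the subgraph's State by matching key names (topic, for example, is passed in). When the subgraph finishes (reaches END), its final State is handed back to the parent graph, again matched by key name: keys the parent declares are overwritten or, if they have a reducer, merged through it, and keys the parent does not declare are not written to the parent.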

# Mock: Planner node
def planner_node(state: AgencyState):
    print(f"\n[Planner] Received topic: {state['topic']}")
    return {"topic": state["topic"]}

# Mock: Writer node
def writer_node(state: AgencyState):
    # Key observation: can the Writer see the error_logs?
    print("\n[Writer] Getting ready to write...")

    # Deliberately try to reach the dirty data and see whether it is there
    if "error_logs" in state: # type: ignore
        print("  [Writer panics] Uh-oh, I can see the error logs. My prompt is polluted!")
    else:
        print("  [Writer rejoices] My context is clean, with no junk data at all!")

    summary = state.get("research_summary", "")
    print(f"  [Writer] Reference material received: {summary}")

    draft = f"An excellent first draft based on: {summary}"
    return {"draft": draft}

# Assemble the Global Agency Graph
agency_builder = StateGraph(AgencyState)

agency_builder.add_node("planner", planner_node)
# Add the compiled subgraph directly as a node
agency_builder.add_node("researcher_team", researcher_graph)
agency_builder.add_node("writer", writer_node)

agency_builder.add_edge(START, "planner")
agency_builder.add_edge("planner", "researcher_team")
agency_builder.add_edge("researcher_team", "writer")
agency_builder.add_edge("writer", END)

agency_graph = agency_builder.compile()
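
The key-matching rule from this step can be modelled in a few lines of plain Python. This is a simplified sketch under stated assumptions, not LangGraph's implementation: ParentState, ChildState, and project are hypothetical names, and reducers are left out to keep the focus on which keys cross the boundary in each direction.

```python
from typing import List, TypedDict


class ParentState(TypedDict):
    topic: str
    research_summary: str
    draft: str


class ChildState(TypedDict):
    topic: str
    error_logs: List[str]
    research_summary: str


def project(values: dict, schema) -> dict:
    """Keep only the keys that the target schema declares."""
    return {k: v for k, v in values.items() if k in schema.__annotations__}


parent = {"topic": "AI agents", "draft": ""}

# Parent -> child: only keys the child declares get in ("draft" does not).
child_input = project(parent, ChildState)

# ... the child does its messy work on its own state ...
child_output = {
    **child_input,
    "error_logs": ["HTTP 404"],
    "research_summary": "clean brief",
}

# Child -> parent: only keys the parent declares get out.
parent_update = project(child_output, ParentState)
parent = {**parent, **parent_update}

print(child_input)  # {'topic': 'AI agents'}
print(parent)       # {'topic': 'AI agents', 'draft': '', 'research_summary': 'clean brief'}
```

Note that error_logs never reaches the parent dictionary, which is the effect the Writer node checks for with its "error_logs" in state test.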

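Step 4: Run and verify

Let's run it and see state isolation at work.

if __name__ == "__main__":
    print("=== 🚀 AI Content Agency starting (Episode 19: state isolation) ===\n")

    initial_state = {"topic": "AI Agent trends in 2024"}

    # Run the global graph
    final_state = agency_graph.invoke(initial_state)

    print("\n=== 🏁 Run finished, inspecting the final global state ===")
    for key, value in final_state.items():
        print(f"-> {key}: {value}")

Console output:

=== 🚀 AI Content Agency starting (Episode 19: state isolation) ===


[Planner] Received topic: AI Agent trends in 2024
  [Researcher] Scraping the web for raw, messy data...
  [Researcher] Filtering the messy data and distilling a brief...

[Writer] Getting ready to write...
  [Writer rejoices] My context is clean, with no junk data at all!
  [Writer] Reference material received: [Clean research brief] Key points on AI Agent trends in 2024: XYZ. Discarded 81 characters of raw data and 1 error log(s).

=== 🏁 Run finished, inspecting the final global state ===
-> topic: AI Agent trends in 2024
-> research_summary: [Clean research brief] Key points on AI Agent trends in 2024: XYZ. Discarded 81 characters of raw data and 1 error log(s).
-> draft: An excellent first draft based on: [Clean research brief] Key points on AI Agent trends in 2024: XYZ. Discarded 81 characters of raw data and 1 error log(s).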

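See that? The global final_state contains neither raw_html_snippets nor error_logs. The junk the Researcher produced inside the subgraph stays sealed within the subgraph's lifetime and is gone when it ends. What the Writer receives is just the distilled research_summary.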

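Pitfalls and how to avoid them

As your instructor, I need to teach you not only how to write the code but also how to debug it. When building a "state wall", beginners most often fall into these three pits: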

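💣 Pitfall 1: Mismatched key names cause a "Silent Drop"

Symptom: The Researcher subgraph clearly ran successfully, yet the research_summary the Writer receives is empty.
Cause: When a subgraph returns to its parent graph, LangGraph updates the parent strictly by key name. If the subgraph's output calls the field summary while the global state calls it research_summary, the unmatched key is dropped without raising an error.
Fix: Make sure every key in ResearcherState that needs to pass through has exactly the same name as the corresponding key in AgencyState.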
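
Because the drop is silent, one option is to check the key names yourself when the graph is built. The helper below is a hypothetical sketch, not part of LangGraph: missing_shared_keys, ParentSchema, and ChildSchema are names made up for this example, and the child schema deliberately uses the wrong name so the check has something to report.

```python
from typing import List, TypedDict


class ParentSchema(TypedDict):
    topic: str
    research_summary: str
    draft: str


class ChildSchema(TypedDict):
    topic: str
    error_logs: List[str]
    summary: str  # deliberately misnamed to trigger the check


def missing_shared_keys(child, parent, required):
    """Return the required keys that are not declared in both schemas."""
    shared = set(child.__annotations__) & set(parent.__annotations__)
    return sorted(set(required) - shared)


missing = missing_shared_keys(
    ChildSchema, ParentSchema, required={"topic", "research_summary"}
)
print(missing)  # ['research_summary']
```

Running a check like this in a unit test or right before compile() turns a silent runtime surprise into an explicit failure you can act on.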

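💣 Pitfall 2: Blindly appending to a global message list

Symptom: Many people put messages: Annotated[list, add] in the global state, and then have every LLM call inside the subgraph append to the same messages. After one full run, the global messages has ballooned to 50,000 Tokens.
Cause: Even with a subgraph, if the subgraph writes its internal messages to a messages key that has the same name as the global one, the dirty data still passes through.
Fix: Rename the subgraph's conversation-history key. For example, call the global one agency_messages and the subgraph's researcher_internal_messages. The distill node then produces a single clean AIMessage at the end and returns it as {"agency_messages": [clean_msg]}.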
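
The arithmetic behind this pitfall can be reproduced with operator.add alone. The sketch below is a plain-Python model, with plain strings standing in for message objects. It assumes that a subgraph sharing an accumulating key hands back the full value of that key, including what the parent passed in; verify that assumption against the LangGraph version you use.

```python
import operator

# The parent's accumulated history before the subgraph runs.
parent_messages = ["planner: topic assigned"]

# Case 1: the subgraph shares the accumulating key and hands back its
# whole list (the parent's input plus everything it added internally).
subgraph_full_list = parent_messages + ["404 error", "retry", "clean brief"]
leaky = operator.add(parent_messages, subgraph_full_list)

# Case 2: internal chatter lives under a private key, and only one
# clean message is returned for the shared key.
private_log = ["404 error", "retry"]  # never leaves the subgraph
isolated = operator.add(parent_messages, ["clean brief"])

print(leaky)
# ['planner: topic assigned', 'planner: topic assigned', '404 error', 'retry', 'clean brief']
print(isolated)
# ['planner: topic assigned', 'clean brief']
```

In the leaky case the parent ends up with both the internal noise and a duplicate of its own history, so the shared list grows faster than the number of messages actually produced.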

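💣 Pitfall 3: Nesting so deep that debugging becomes hell

Symptom: In pursuit of perfect isolation, the graph is nested 5 levels deep: Agency -> ResearcherTeam -> WebScraper -> ErrorHandler... When something finally fails, the Traceback unrolls like the scroll painting Along the River During the Qingming Festival.
Cause: Over-engineering.
Fix: Know when to stop. A Global Graph plus 1 level of Subgraph is usually enough for roughly 90% of business scenarios. If you need finer isolation than that, prefer handling it inside an ordinary Python function rather than reaching for yet another LangGraph node.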
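
As a sketch of the "ordinary Python function" option, the node below keeps its retries and error messages in local variables, so nothing needs a deeper subgraph. Everything here is hypothetical: flaky_fetch stands in for a real HTTP call, and the url, page_html, and scrape_failed keys are invented for the example.

```python
def flaky_fetch(url, attempt):
    """Hypothetical stand-in for a real HTTP call: fails twice, then works."""
    if attempt < 3:
        raise ConnectionError(f"attempt {attempt} failed for {url}")
    return "<html>useful content</html>"


def scrape_node(state: dict, max_attempts: int = 5) -> dict:
    """A single node: retries and error messages stay in local variables."""
    errors = []  # private to this call, never written to graph state
    for attempt in range(1, max_attempts + 1):
        try:
            html = flaky_fetch(state["url"], attempt)
            return {"page_html": html}
        except ConnectionError as exc:
            errors.append(str(exc))
    # Every attempt failed: report the outcome, not the noisy details.
    return {"page_html": "", "scrape_failed": True}


update = scrape_node({"url": "https://example.com"})
print(update)  # {'page_html': '<html>useful content</html>'}
```

Local scope gives the same isolation as a nested graph for this kind of retry loop, and a failure produces an ordinary Python traceback through one function.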

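📝 Lesson summary

Today we made an architecture-level upgrade in how we think.

In a multi-agent system, "what Agents can see of each other" matters as much as "what an Agent can do". Without state isolation, your system is like a ragtag outfit with no departments, where everyone shouts across one big hall. Today we used LangGraph's Subgraph feature to build a "private state wall" around the Researcher. The dirty work is digested behind the wall, and only the most refined value (the Summary) passes through to the global state.

This cuts Token spend, reduces the chance of hallucinations caused by polluted context, and makes the code easier to maintain.

Coming up next: our Writer can now get clean data to write from, but what if its output is still bland filler that reeks of "AI flavor"? In Lesson 20, we will add a Human-in-the-loop mechanism for the Editor. I'll show you how to make LangGraph "pause" at a key node and wait for the boss (you) to sign off before continuing.

After class, be sure to type out today's code yourself. See you next time. Class dismissed!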
