第 24 期 | 数据库直连：SQL 客户端检索与分析 MCP — Claude Code 终极大师课

🎯 学习目标

理解 AI 代理直接连接数据库的架构优势与安全考量，特别是针对 PostgreSQL。
掌握配置 PostgreSQL 数据库以供 AI 代理（如 Claude）安全访问的关键步骤，包括用户权限管理。
熟练运用 psql 等 SQL 客户端工具连接、探索和初步分析数据库中的 MCP（Minimum Complete Product）数据。
学习如何利用自然语言提示，引导 Claude 生成高效的 SQL 查询，并进行查询优化与性能分析。

📖 核心概念讲解

嗨，技术伙伴们！欢迎来到 Claude Code 教程系列的第 24 期。今天我们将深入探讨一个在现代 AI 应用中日益重要的主题：AI 代理如何直接与关系型数据库交互。具体来说，我们将聚焦于 PostgreSQL，并演示 Claude 如何利用 SQL 客户端能力，直接检索和分析我们的 MCP（Minimum Complete Product）数据。

24.1 AI 代理直连数据库的架构与考量

在许多企业级应用中，核心业务数据通常存储在关系型数据库（如 PostgreSQL, MySQL, Oracle, SQL Server）中。传统上，AI 应用需要通过 API 或数据湖等中间层来获取数据。然而，直接连接数据库为 AI 代理带来了显著优势：

实时性与数据新鲜度：直接访问确保 AI 代理始终获取到最新、最准确的数据，无需等待数据同步或 ETL 过程。
查询灵活性与复杂性：AI 代理可以直接构建和执行复杂的 SQL 查询，包括多表联接、聚合、子查询等，从而实现更深层次的数据洞察和分析。
减少中间层开销：简化了数据管道，降低了系统复杂性和维护成本。

然而，直接连接也带来了新的挑战，尤其是安全性：

最小权限原则 (Principle of Least Privilege)：AI 代理绝不应使用拥有完全权限的数据库用户。它应该拥有仅满足其任务所需的最小权限，通常是只读权限 (SELECT)。
安全连接：始终使用 SSL/TLS 加密连接数据库，防止数据在传输过程中被窃听。
凭证管理：数据库连接凭证（用户名、密码）必须安全存储，例如使用环境变量、密钥管理服务（KMS）或专门的秘密管理工具，而不是硬编码在代码中。

架构示意图：

+----------------+       +-------------------+       +--------------------+
|  Claude Agent  | <---> | SQL Client/Driver | <---> | PostgreSQL Database|
| (Natural Lang) |       | (psycopg2, psql)  |       | (MCP Data)         |
+----------------+       +-------------------+       +--------------------+
        ^                        |                         ^
        |                        |                         |
        |                        V                         |
        |               +-----------------+                |
        |               |   SQL Query     |                |
        |               | (SELECT, JOIN)  |                |
        |               +-----------------+                |
        |                                                  |
        +--------------------------------------------------+
          (Data Retrieval & Analysis)

在我们的场景中，Claude 将扮演 SQL 客户端的角色，通过其内置的工具集，将自然语言请求转换为具体的 SQL 语句，并执行这些语句以与 PostgreSQL 数据库交互。

24.2 配置 PostgreSQL 以供 Claude 访问

为了让 Claude 能够安全高效地访问我们的 MCP 数据库，我们需要进行一些 PostgreSQL 端的配置。我们将创建一个专用的数据库用户，并为其授予只读权限。

步骤概述：

创建数据库：如果还没有 MCP 数据库，先创建一个。
创建专用用户：为 Claude 创建一个仅用于数据查询的数据库用户。
授予权限：将 SELECT 权限授予该用户，限定其只能读取所需表的数据。
配置连接：确保数据库接受来自 Claude 所在环境的连接。

我们将以一个假设的 MCP 数据库 mcp_db 为例，其中包含 products, customers, orders, order_items 等表。

-- 1. 连接到 PostgreSQL 默认数据库 (例如 postgres) 作为超级用户
-- psql -U postgres

-- 2. 创建或切换到你的 MCP 数据库
-- 如果数据库不存在，则创建
CREATE DATABASE mcp_db;

-- 切换到新创建的数据库
\c mcp_db;

-- 3. 创建一个低权限用户 'claude_read_only'，并设置一个强密码
CREATE USER claude_read_only WITH PASSWORD 'YourStrongAndSecretPasswordHere';

-- 4. 授予该用户连接到 mcp_db 的权限
GRANT CONNECT ON DATABASE mcp_db TO claude_read_only;

-- 5. 授予该用户对所有现有表和未来表的 SELECT 权限
-- 这一步很重要，因为 Claude 可能需要查询任何表
GRANT SELECT ON ALL TABLES IN SCHEMA public TO claude_read_only;
ALTER DEFAULT PRIVILEGES FOR USER postgres IN SCHEMA public
GRANT SELECT ON TABLES TO claude_read_only;

-- 6. （可选但推荐）授予对现有序列和未来序列的 USAGE 权限，以防需要查看序列状态
GRANT USAGE ON ALL SEQUENCES IN SCHEMA public TO claude_read_only;
ALTER DEFAULT PRIVILEGES FOR USER postgres IN SCHEMA public
GRANT USAGE ON SEQUENCES TO claude_read_only;

-- 7. 验证用户权限 (以 claude_read_only 身份尝试连接和查询)
-- 退出当前 psql 会话
-- psql -U claude_read_only -d mcp_db
-- SELECT * FROM products LIMIT 1; -- 应该成功
-- INSERT INTO products (name) VALUES ('test'); -- 应该失败 (权限拒绝)

连接字符串示例： AI 代理通常使用连接字符串来连接数据库。对于 PostgreSQL，典型的连接字符串格式如下： postgresql://claude_read_only:YourStrongAndSecretPasswordHere@localhost:5432/mcp_db

在实际应用中，YourStrongAndSecretPasswordHere 应该通过环境变量或其他安全机制提供。

24.3 SQL 客户端检索与查询优化基础

掌握基本的 SQL 客户端操作和查询优化技巧对于与数据库交互至关重要，即使是 AI 代理，也需要这些基础知识来构建和优化查询。

使用 psql 客户端：

psql 是 PostgreSQL 官方提供的命令行客户端，功能强大，是数据库管理员和开发者的瑞士军刀。

# 连接到数据库
psql -U claude_read_only -d mcp_db -h localhost -p 5432

# 连接后的一些常用命令：
\l             # 列出所有数据库
\c <database>  # 连接到指定数据库
\dt            # 列出当前数据库中的表
\d <table_name> # 描述表的结构，包括列、类型、索引等
\timing        # 开启/关闭查询执行时间显示
\q             # 退出 psql

MCP 数据模型示例： 为了演示，我们假设 mcp_db 数据库中有以下表结构：

products
- product_id (PK)
- name
- description
- price
- stock_quantity
- category
- created_at
customers
- customer_id (PK)
- first_name
- last_name
- email
- registration_date
orders
- order_id (PK)
- customer_id (FK to customers)
- order_date
- total_amount
- status (pending, completed, cancelled)
order_items
- order_item_id (PK)
- order_id (FK to orders)
- product_id (FK to products)
- quantity
- unit_price

基本查询示例：

-- 查询所有库存大于100的电子产品
SELECT product_id, name, price, stock_quantity
FROM products
WHERE category = 'Electronics' AND stock_quantity > 100
ORDER BY price DESC;

-- 查找所有已完成订单的客户姓名和订单总金额
SELECT c.first_name, c.last_name, o.total_amount, o.order_date
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.status = 'completed'
ORDER BY o.order_date DESC
LIMIT 10;

-- 计算每个产品类别的平均价格和总库存
SELECT category, AVG(price) AS average_price, SUM(stock_quantity) AS total_stock
FROM products
GROUP BY category
ORDER BY total_stock DESC;

查询优化与 EXPLAIN ANALYZE：

当查询执行缓慢时，我们需要分析其执行计划。EXPLAIN ANALYZE 是一个强大的工具，它不仅显示查询的执行计划，还会实际执行查询并报告每个步骤的实际时间和行数。

-- 示例：分析一个查询的性能
EXPLAIN ANALYZE
SELECT c.first_name, c.last_name, SUM(oi.quantity * oi.unit_price) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.order_date >= '2023-01-01' AND o.order_date < '2024-01-01'
GROUP BY c.customer_id, c.first_name, c.last_name
ORDER BY total_spent DESC
LIMIT 5;

EXPLAIN ANALYZE 的输出会告诉你数据库是如何执行查询的，包括使用了哪些索引、扫描了多少行、连接方式等。通过分析这些信息，你可以识别性能瓶颈，例如：

全表扫描 (Full Table Scan)：如果在一个大表上没有使用索引而进行了全表扫描，性能会很差。
不必要的连接 (Unnecessary Joins)：连接了过多不必要的表会增加查询开销。
低效的索引使用：索引可能存在，但查询优化器没有选择使用，或者索引不够 B-tree 友好。

优化策略：

创建合适的索引：在 WHERE 子句、JOIN 条件和 ORDER BY 子句中经常使用的列上创建索引。例如，对于上面的查询，orders.order_date、customers.customer_id、orders.order_id、order_items.product_id 都是潜在的索引候选。
优化 JOIN 顺序：数据库优化器通常会选择最佳 JOIN 顺序，但在某些复杂查询中，手动调整可能会有帮助。
避免 SELECT *：只选择你需要的列，减少数据传输量。
合理使用 LIMIT 和 OFFSET：尤其是在分页查询中。
分区 (Partitioning)：对于超大表，可以考虑分区以提高查询效率和管理性。

通过本期学习，你将能够指导 Claude 不仅生成 SQL，还能理解并提出优化建议。

💻 模拟 Claude TUI 交互

以下是 Claude 在终端中与数据库交互的模拟场景。我们将看到 Claude 如何连接数据库、查询数据，并对查询进行优化。

场景一：连接数据库并发现 MCP 模式

用户要求 Claude 连接到 MCP 数据库并了解其结构。

> Claude，请连接到我们的 PostgreSQL MCP 数据库，连接字符串是 "postgresql://claude_read_only:YourStrongAndSecretPasswordHere@localhost:5432/mcp_db"。连接成功后，请列出所有可用的表及其主要列。

╭─ Claude ──────────────────────────────╮
│  好的，我正在尝试连接到 PostgreSQL 数据库并获取模式信息。               │
│                                                                        │
│  **Tool Call:** `db_connect`                                           │
│  **Parameters:**                                                       │
│  ```json                                                               │
│  {                                                                     │
│    "connection_string": "postgresql://claude_read_only:YourStrongAndSecretPasswordHere@localhost:5432/mcp_db" │
│  }                                                                     │
│  ```                                                                   │
│                                                                        │
│  **Tool Output:**                                                      │
│  ```json                                                               │
│  {                                                                     │
│    "status": "success",                                                │
│    "message": "Successfully connected to mcp_db."                      │
│  }                                                                     │
│  ```                                                                   │
│                                                                        │
│  数据库连接成功。现在我将获取数据库模式信息。                         │
│                                                                        │
│  **Tool Call:** `db_schema_info`                                       │
│  **Parameters:** {}                                                    │
│  ```                                                                   │
│                                                                        │
│  **Tool Output:**                                                      │
│  ```json                                                               │
│  {                                                                     │
│    "status": "success",                                                │
│    "schema": {                                                         │
│      "products": {                                                     │
│        "columns": [                                                    │
│          {"name": "product_id", "type": "INTEGER", "is_pk": true},    │
│          {"name": "name", "type": "VARCHAR"},                          │
│          {"name": "description", "type": "TEXT"},                      │
│          {"name": "price", "type": "NUMERIC"},                         │
│          {"name": "stock_quantity", "type": "INTEGER"},                │
│          {"name": "category", "type": "VARCHAR"},                      │
│          {"name": "created_at", "type": "TIMESTAMP"}                   │
│        ]                                                               │
│      },                                                                │
│      "customers": {                                                    │
│        "columns": [                                                    │
│          {"name": "customer_id", "type": "INTEGER", "is_pk": true},    │
│          {"name": "first_name", "type": "VARCHAR"},                    │
│          {"name": "last_name", "type": "VARCHAR"},                     │
│          {"name": "email", "type": "VARCHAR"},                         │
│          {"name": "registration_date", "type": "DATE"}                 │
│        ]                                                               │
│      },                                                                │
│      "orders": {                                                       │
│        "columns": [                                                    │
│          {"name": "order_id", "type": "INTEGER", "is_pk": true},      │
│          {"name": "customer_id", "type": "INTEGER", "is_fk": true, "references": "customers"}, │
│          {"name": "order_date", "type": "TIMESTAMP"},                  │
│          {"name": "total_amount", "type": "NUMERIC"},                  │
│          {"name": "status", "type": "VARCHAR"}                         │
│        ]                                                               │
│      },                                                                │
│      "order_items": {                                                  │
│        "columns": [                                                    │
│          {"name": "order_item_id", "type": "INTEGER", "is_pk": true}, │
│          {"name": "order_id", "type": "INTEGER", "is_fk": true, "references": "orders"}, │
│          {"name": "product_id", "type": "INTEGER", "is_fk": true, "references": "products"}, │
│          {"name": "quantity", "type": "INTEGER"},                      │
│          {"name": "unit_price", "type": "NUMERIC"}                     │
│        ]                                                               │
│      }                                                                 │
│    }                                                                   │
│  ```                                                                   │
│                                                                        │
│  我已经成功连接到数据库 `mcp_db`。以下是找到的表及其主要列：           │
│                                                                        │
│  *   **products**: `product_id` (PK), `name`, `description`, `price`, `stock_quantity`, `category`, `created_at` │
│  *   **customers**: `customer_id` (PK), `first_name`, `last_name`, `email`, `registration_date` │
│  *   **orders**: `order_id` (PK), `customer_id` (FK), `order_date`, `total_amount`, `status` │
│  *   **order_items**: `order_item_id` (PK), `order_id` (FK), `product_id` (FK), `quantity`, `unit_price` │
│                                                                        │
│  请问您现在想查询什么数据？                                            │
╰───────────────────────────────────────╯

场景二：自然语言查询与数据检索

用户要求查询特定数据，Claude 将其转换为 SQL 并执行。

> 好的，Claude。请帮我找出在过去 30 天内，销量最高的前 5 个产品（按销售数量计算），并显示它们的产品名称和总销量。

╭─ Claude ──────────────────────────────╮
│  好的，我将查询过去 30 天内销量最高的前 5 个产品。                     │
│                                                                        │
│  **Tool Call:** `db_query`                                             │
│  **Parameters:**                                                       │
│  ```json                                                               │
│  {                                                                     │
│    "sql_query": "SELECT p.name AS product_name, SUM(oi.quantity) AS total_quantity_sold\nFROM products p\nJOIN order_items oi ON p.product_id = oi.product_id\nJOIN orders o ON oi.order_id = o.order_id\nWHERE o.order_date >= NOW() - INTERVAL '30 days'\nGROUP BY p.product_id, p.name\nORDER BY total_quantity_sold DESC\nLIMIT 5;" │
│  }                                                                     │
│  ```                                                                   │
│                                                                        │
│  **Tool Output:**                                                      │
│  ```json                                                               │
│  {                                                                     │
│    "status": "success",                                                │
│    "results": [                                                        │
│      {"product_name": "Premium Wireless Headset", "total_quantity_sold": 150}, │
│      {"product_name": "Smart Home Assistant", "total_quantity_sold": 120}, │
│      {"product_name": "Ergonomic Office Chair", "total_quantity_sold": 95}, │
│      {"product_name": "Portable Power Bank 10000mAh", "total_quantity_sold": 88}, │
│      {"product_name": "Ultra HD 4K Monitor", "total_quantity_sold": 72}   │
│    ],                                                                  │
│    "columns": ["product_name", "total_quantity_sold"],                 │
│    "row_count": 5,                                                      │
│    "execution_time_ms": 15.7                                           │
│  }                                                                     │
│  ```                                                                   │
│                                                                        │
│  在过去 30 天内，销量最高的前 5 个产品及其总销量如下：                 │
│                                                                        │
│  1.  **Premium Wireless Headset**: 150 单位                           │
│  2.  **Smart Home Assistant**: 120 单位                               │
│  3.  **Ergonomic Office Chair**: 95 单位                              │
│  4.  **Portable Power Bank 10000mAh**: 88 单位                        │
│  5.  **Ultra HD 4K Monitor**: 72 单位                                 │
│                                                                        │
│  该查询耗时 15.7 毫秒。您对结果满意吗？或者想进行进一步的分析或优化？ │
╰───────────────────────────────────────╯

场景三：查询优化与性能分析

用户要求 Claude 优化之前的查询并解释其执行计划。

> 这个查询的速度还不错，但我想知道它是否有优化空间。请帮我分析一下刚才那个查询的执行计划，并建议如何优化。

╭─ Claude ──────────────────────────────╮
│  好的，我将使用 EXPLAIN ANALYZE 来分析之前的查询，并评估其性能瓶颈。   │
│                                                                        │
│  **Tool Call:** `db_explain_query`                                     │
│  **Parameters:**                                                       │
│  ```json                                                               │
│  {                                                                     │
│    "sql_query": "SELECT p.name AS product_name, SUM(oi.quantity) AS total_quantity_sold\nFROM products p\nJOIN order_items oi ON p.product_id = oi.product_id\nJOIN orders o ON oi.order_id = o.order_id\nWHERE o.order_date >= NOW() - INTERVAL '30 days'\nGROUP BY p.product_id, p.name\nORDER BY total_quantity_sold DESC\nLIMIT 5;" │
│  }                                                                     │
│  ```                                                                   │
│                                                                        │
│  **Tool Output:**                                                      │
│  ```text                                                               │
│                                  QUERY PLAN                                   │
│  -----------------------------------------------------------------------------│
│   Limit  (cost=125.75..125.76 rows=5 width=44) (actual time=14.582..14.584 rows=5 loops=1) │
│     ->  Sort  (cost=125.75..126.96 rows=485 width=44) (actual time=14.581..14.582 rows=5 loops=1) │
│           Sort Key: (sum(oi.quantity)) DESC                                  │
│           Sort Method: top-N heapsort  Memory: 25kB                          │
│           ->  HashAggregate  (cost=100.86..105.71 rows=485 width=44) (actual time=14.471..14.545 rows=485 loops=1) │
│                 Group Key: p.product_id, p.name                              │
│                 Batches: 1  Memory Usage: 89kB                               │
│                 ->  Hash Join  (cost=26.00..88.60 rows=2452 width=36) (actual time=1.864..10.595 rows=2452 loops=1) │
│                       Hash Cond: (oi.order_id = o.order_id)                  │
│                       ->  Hash Join  (cost=1.00..60.00 rows=2452 width=16) (actual time=0.045..5.600 rows=2452 loops=1) │
│                             Hash Cond: (oi.product_id = p.product_id)        │
│                             ->  Seq Scan on order_items oi  (cost=0.00..30.00 rows=10000 width=8) (actual time=0.002..1.500 rows=10000 loops=1) │
│                             ->  Hash  (cost=1.00..20.00 rows=500 width=12) (actual time=0.040..0.040 rows=500 loops=1) │
│                                   Buckets: 1024  Batches: 1  Memory Usage: 24kB │
│                                   ->  Seq Scan on products p  (cost=0.00..20.00 rows=500 width=12) (actual time=0.001..0.025 rows=500 loops=1) │
│                       ->  Hash  (cost=20.00..20.00 rows=500 width=20) (actual time=1.800..1.800 rows=500 loops=1) │
│                             Buckets: 1024  Batches: 1  Memory Usage: 28kB    │
│                             ->  Seq Scan on orders o  (cost=0.00..20.00 rows=500 width=20) (actual time=0.001..1.750 rows=500 loops=1) │
│                                   Filter: (order_date >= (now() - '30 days'::interval)) │
│                                   Rows Removed by Filter: 9500                 │
│  Planning Time: 0.850 ms                                                       │
│  Execution Time: 14.700 ms                                                     │
│  ```                                                                   │
│                                                                        │
│  我已分析了查询的执行计划。从 `EXPLAIN ANALYZE` 的输出看，主要瓶颈可能在于： │
│                                                                        │
│  1.  **`Seq Scan on orders o`**: `orders` 表上进行了全表扫描 (`Seq Scan`)，并且通过 `Filter: (order_date >= (now() - '30 days'::interval))` 过滤掉了大量数据 (`Rows Removed by Filter: 9500`)。这意味着在 `order_date` 列上没有有效索引被使用，导致数据库不得不检查每一条订单记录。 │
│  2.  **`Hash Join` 操作**: 多个 `Hash Join` 发生在 `order_items`、`products` 和 `orders` 表之间，虽然对于中等规模的数据集哈希连接通常效率不错，但如果底层扫描效率低下，整体性能也会受影响。 │
│                                                                        │
│  **优化建议：**                                                        │
│                                                                        │
│  我强烈建议在 `orders` 表的 `order_date` 列上创建一个 B-tree 索引。这将使数据库能够快速定位到过去 30 天内的订单，显著减少 `Seq Scan` 的开销，并可能改变后续的连接策略。 │
│                                                                        │
│  ```sql                                                               │
│  CREATE INDEX idx_orders_order_date ON orders (order_date);            │
│  ```                                                                   │
│                                                                        │
│  您可以尝试创建此索引，然后重新运行