Knowledge Base: RAG & Document Management
title: "Lesson 13 | Knowledge Base: RAG & Document Management"
summary: "Integrate vector databases and use RAG (Retrieval-Augmented Generation) to build private knowledge bases, enhancing the factual accuracy of AI responses."
sortOrder: 130
status: "published"
Subtitle: Integrate vector databases and use RAG (Retrieval-Augmented Generation) to build private knowledge bases, enhancing the factual accuracy of AI responses.
Learning Objectives
In this lesson, you will dive deep into a core advanced feature of Hermes Agent: knowledge base management. Upon completing this chapter, you will be able to:
- Understand How RAG Works: Grasp the core concept of Retrieval-Augmented Generation (RAG) and understand why it is a key technology for overcoming the knowledge limitations of Large Language Models (LLMs).
- Configure a Vector Database: Learn how to configure and integrate a Vector Database, such as ChromaDB, in Hermes Agent to serve as persistent storage for knowledge.
- Create and Manage Knowledge Bases: Use the Hermes Agent CLI (Command-Line Interface) to create, index, and manage your private document collections, building one or more specialized knowledge bases.
- Empower an Agent: Associate the created knowledge bases with a specific Agent instance, enabling it to answer questions based on your private data, significantly improving the accuracy and relevance of its responses.
- Validate the Effectiveness of RAG: Through practical conversation, compare the Agent's responses before and after enabling the knowledge base to directly observe the capability leap brought by RAG.
Core Concepts Explained
Before diving into the hands-on practice, we must understand the core technology driving the knowledge base feature—RAG—and its place within the Hermes Agent architecture.
1. The LLM Dilemma: "Static" and "Forgotten" Knowledge
Standard Large Language Models, like GPT-4 or Llama 3, despite their power, have three inherent limitations:
- Knowledge Cutoff: The model's knowledge is "frozen" at the point in time when its training data was collected. It is unaware of news, product releases, or updated internal documents that occurred afterward.
- Hallucination: When asked about information that is uncertain or absent from its training data, an LLM may "fabricate" answers that sound plausible but are entirely incorrect.
- Lack of Private Context: The model knows nothing about your company's internal processes, project documents, personal notes, or other private information. Asking it directly about these topics will not yield useful answers.
To overcome these issues, the industry has proposed various solutions, and RAG (Retrieval-Augmented Generation) has proven to be one of the most efficient and flexible.
2. Analyzing the RAG (Retrieval-Augmented Generation) Workflow
The core idea of RAG is very intuitive: Instead of relying solely on the model's "memory," let it "consult reference materials" before answering a question. These "materials" are the private knowledge base we prepare for it.
The entire RAG process can be divided into two stages: Indexing and Retrieval & Generation.
Stage One: Data Indexing (Offline Processing)
The goal of this stage is to convert your documents (e.g., .txt, .md, .pdf files) into a format that can be efficiently retrieved by a machine.
- Load: The system reads the document sources you specify.
- Chunk/Split: Long documents are split into smaller, meaningful text blocks (Chunks). Why split?
  - LLMs have a limited Context Window and cannot process an entire book at once.
  - Smaller chunks provide more precise retrieval results, avoiding interference from irrelevant information.
- Embed: An Embedding model is used to convert each text chunk into a high-dimensional vector. This vector can be seen as the mathematical coordinate of the text chunk in a "semantic space." Text chunks that are semantically similar will have vectors that are closer in this space.
- Store: All text chunks and their corresponding vectors are stored in a specialized database—a Vector Database. This database is highly optimized for performing extremely fast vector similarity searches.
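To make the indexing stage concrete, here is a toy chunker in plain Python. This is an illustrative sketch only, not Hermes Agent's internal implementation; real chunkers typically also add overlap between chunks and respect sentence boundaries.

```python
def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text into paragraph-based chunks no longer than max_chars."""
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Start a new chunk if adding this paragraph would exceed the cap.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

doc = ("Project Phoenix refactors the core trading system.\n\n"
       "The project lead is Alice Johnson.\n\n"
       "Expected launch: Q4 2024.")
chunks = chunk_text(doc, max_chars=60)
```

Each chunk would then be embedded and stored; keeping chunks small is what lets retrieval later return just the relevant passage instead of a whole document.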
Stage Two: Retrieval & Generation (Online Processing)
When a user asks the Agent a question, the system performs the following steps:
- Embed Query: The same Embedding model used in the indexing stage converts the user's query into a vector.
- Retrieve: The system performs a similarity search in the vector database using the user's query vector as a reference, finding the Top-K text chunks that are most semantically relevant to the question (e.g., the 3 most relevant).
- Augment: These retrieved text chunks are combined with the user's original question as "Context" to form a new, more comprehensive prompt.
- Original Prompt:
  "Who is the project lead for Project Phoenix?"
- Augmented Prompt:
  Please answer the question based on the following context. If the information is insufficient, say you don't know.
  Context:
  - "Document A Snippet: ...The project manager for Project Phoenix is Alice Johnson, and the project's goal is to complete it by Q4 2024..."
  - "Document B Snippet: ...For questions about the project budget, please contact Alice Johnson..."
  Question: Who is the project lead for Project Phoenix?
- Generate: This augmented prompt is sent to the LLM. With clear and relevant context, the LLM can now act as if it's taking an "open-book exam" and accurately answer, "The project lead for Project Phoenix is Alice Johnson."
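The retrieve and augment steps can be sketched in a few lines of plain Python. The `embed` function below is a stand-in bag-of-words counter, not a real embedding model, and the whole snippet is illustrative rather than Hermes Agent's actual code.

```python
import math

def embed(text: str) -> dict[str, float]:
    """Toy bag-of-words 'embedding' standing in for a real embedding model."""
    vec: dict[str, float] = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0.0) + 1.0
    return vec

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the Top-K chunks most similar to the query (the 'Retrieve' step)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def augment(query: str, context: list[str]) -> str:
    """Build the augmented prompt sent to the LLM (the 'Augment' step)."""
    ctx = "\n".join(f"- {c}" for c in context)
    return ("Please answer the question based on the following context. "
            "If the information is insufficient, say you don't know.\n"
            f"Context:\n{ctx}\nQuestion: {query}")

chunks = [
    "The project manager for Project Phoenix is Alice Johnson.",
    "The cafeteria serves lunch at noon.",
    "For questions about the project budget, contact Alice Johnson.",
]
question = "Who is the project lead for Project Phoenix?"
prompt = augment(question, retrieve(question, chunks))
```

With a real embedding model the similarity is semantic rather than lexical, but the control flow is the same: embed the query, rank the stored chunks, and paste the winners into the prompt.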
3. The RAG Architecture in Hermes Agent
Hermes Agent elegantly encapsulates the complex RAG process, allowing developers to easily build and use knowledge bases through the config.yml file and CLI tools.
- Knowledge Component: Hermes Agent has a built-in knowledge base management module responsible for handling document loading, chunking, embedding, and storage.
- Vector DB Provider: Through the configuration file, you can specify the type of vector database to use. Hermes Agent supports various implementations, such as the local file-based ChromaDB (ideal for rapid prototyping and personal use) and the server-based Qdrant (suitable for production environments).
- Embedding Provider: Also specified in the configuration, you can choose OpenAI's `text-embedding-ada-002` or other locally deployed Embedding models to generate vectors.
- CLI Tools: The `hermes knowledge` command set provides convenient operations for creating, adding documents to, listing, and deleting knowledge bases.
Now, let's experience all of this through a hands-on demonstration.
💻 Practical Demo
We will build an internal knowledge base about "Project Phoenix" and turn our Agent into an expert on this project.
Step 1: Environment Setup
First, ensure you have Hermes Agent installed (refer to Lesson 01). For this demonstration, we will use ChromaDB as our vector database because it doesn't require a separate service installation and can be used directly as a pip package.
Open your terminal, activate your Hermes Agent Python environment, and install the chromadb dependency.
```bash
# Activate your Python environment (e.g., venv or conda)
# source .venv/bin/activate
pip install chromadb
```
Step 2: Configure the Vector Database and Embedding Model
Open the core configuration file of Hermes Agent, config.yml. We need to configure two sections: embedding and vector_db.
We will use OpenAI's Embedding model and specify ChromaDB as the vector database, with data stored locally in the ./data/chroma_db directory.
```yaml
# config.yml
# ... other configurations ...

# Embedding Provider Configuration
embedding:
  provider: openai
  # Ensure your OPENAI_API_KEY environment variable is set
  # model: text-embedding-3-small  # or another embedding model

# Vector Database Configuration
vector_db:
  provider: chromadb
  # Path for ChromaDB data persistence
  path: ./data/chroma_db

# ... agents and other configurations ...
```
Explanation:
- `embedding.provider: openai`: Tells Hermes Agent to use the OpenAI API to generate text vectors. Please ensure your `OPENAI_API_KEY` environment variable is set correctly.
- `vector_db.provider: chromadb`: Specifies the use of ChromaDB.
- `vector_db.path`: Specifies the storage location for the ChromaDB database files. Hermes Agent will automatically create this directory upon startup.
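Before starting the service, it can be useful to sanity-check these two sections. The helper below is a hypothetical pre-flight check written for this tutorial (it is not part of Hermes Agent); it assumes the parsed config.yml is available as a Python dict, e.g. via `yaml.safe_load`.

```python
import os

def check_rag_config(config: dict) -> list[str]:
    """Return a list of problems found in the embedding/vector_db sections.

    Illustrative check for this tutorial's setup; not a Hermes Agent API.
    """
    problems: list[str] = []
    embedding = config.get("embedding", {})
    vector_db = config.get("vector_db", {})
    if embedding.get("provider") != "openai":
        problems.append("embedding.provider should be 'openai' for this tutorial")
    elif not os.environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY environment variable is not set")
    if vector_db.get("provider") != "chromadb":
        problems.append("vector_db.provider should be 'chromadb' for this tutorial")
    if not vector_db.get("path"):
        problems.append("vector_db.path is missing")
    return problems

config = {
    "embedding": {"provider": "openai"},
    "vector_db": {"provider": "chromadb", "path": "./data/chroma_db"},
}
```

An empty list means the two sections match what this lesson expects.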
Step 3: Prepare the Knowledge Base Documents
Create a directory to store the source files for your knowledge base.
```bash
mkdir -p my_knowledge/project_phoenix
```
Now, create two markdown files in this directory to simulate project documentation.
File 1: my_knowledge/project_phoenix/overview.md
```markdown
# Project Phoenix Overview

Project Phoenix is an internal initiative to refactor the company's core trading system.

- **Project Code**: Phoenix
- **Project Lead**: Alice Johnson
- **Tech Stack**: Python, Go, Kubernetes
- **Expected Launch Date**: Q4 2024
```
File 2: my_knowledge/project_phoenix/qa.md
```markdown
# Project Phoenix FAQ

**Q: What are the main risks of the project?**
A: The main risks are the complexity of data migration and compatibility issues with legacy systems. The team is developing a detailed rollback plan.

**Q: How can I get project status updates?**
A: Alice Johnson posts a weekly report every Friday at 3 PM in the #project-phoenix channel.
```
These files contain information that a public LLM would not know, making them perfect test cases to verify if our knowledge base is working.
Step 4: Create and Index the Knowledge Base
Now, let's use the Hermes Agent CLI to process these documents.
Create a new knowledge base

Run the following command in your terminal to create a knowledge base named `project_phoenix_kb`:

```bash
hermes knowledge create --name project_phoenix_kb --description "All internal documents related to Project Phoenix."
```

You will see a success message. `--name` is the unique identifier for the knowledge base, and `--description` is a human-readable description.

Add documents to the knowledge base

Next, we will add all the documents from the `my_knowledge/project_phoenix` directory we just created to the knowledge base:

```bash
hermes knowledge add --name project_phoenix_kb --path ./my_knowledge/project_phoenix --recursive
```

Command Breakdown:

- `--name project_phoenix_kb`: Specifies which knowledge base to add documents to.
- `--path ./my_knowledge/project_phoenix`: Specifies the path to the documents.
- `--recursive`: If the directory contains subdirectories, this option will recursively add all files.
After running this command, Hermes Agent will execute the data indexing process we discussed earlier in the background:
- Load the `overview.md` and `qa.md` files.
- Split the file content into suitable text chunks.
- Call the OpenAI API to convert each text chunk into an Embedding vector.
- Store the text chunks and vectors in ChromaDB.
(Optional) View existing knowledge bases
You can use the `list` command at any time to see all knowledge bases in the system:

```bash
hermes knowledge list
```
Step 5: Enable the Knowledge Base in an Agent
The knowledge base is ready, but by default, the Agent won't use it. We need to explicitly associate the knowledge base with an Agent.
Edit config.yml again. We'll create a new Agent named phoenix_expert and configure the knowledge_bases field for it.
```yaml
# config.yml
# ... other configurations ...

agents:
  - name: default
    # ... configuration for the default agent ...

  - name: phoenix_expert
    # Use the same LLM provider as default, or customize
    provider: openai_chat
    # Enable RAG functionality
    enable_rag: true
    # Associate the knowledge base we just created
    knowledge_bases:
      - project_phoenix_kb
    # You can set a specific system prompt for this Agent
    system_prompt: "You are an expert on Project Phoenix. Please answer questions accurately based on the provided context."
```
Key Configurations:
- `enable_rag: true`: Explicitly enables the RAG feature for this Agent.
- `knowledge_bases: ["project_phoenix_kb"]`: This list binds the `project_phoenix_kb` knowledge base to the `phoenix_expert` Agent. An Agent can be associated with multiple knowledge bases simultaneously.
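Conceptually, `enable_rag: true` makes the Agent wrap every incoming message in the retrieve-augment-generate loop before the model is called. The sketch below uses a stubbed LLM and naive keyword retrieval purely to illustrate this control flow; the actual Hermes Agent internals differ.

```python
def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; reports whether context was supplied."""
    return "answered with context" if "Context:" in prompt else "answered without context"

def answer(query: str, knowledge_bases: list[list[str]], enable_rag: bool) -> str:
    """Route a query through RAG when enabled, mirroring the agent config."""
    if not enable_rag:
        # Without RAG the raw question goes straight to the model.
        return fake_llm(query)
    # Gather matching chunks across all bound knowledge bases
    # (a real agent would use vector similarity, not keyword matching).
    context = [chunk for kb in knowledge_bases for chunk in kb
               if any(word in chunk.lower() for word in query.lower().split())]
    prompt = ("Please answer based on the context.\n"
              "Context:\n" + "\n".join(f"- {c}" for c in context) +
              f"\nQuestion: {query}")
    return fake_llm(prompt)
```

Because `knowledge_bases` is a list of collections, an agent bound to several knowledge bases draws context from all of them in one pass.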
Step 6: Test and Validate
Everything is set! Now, start Hermes Agent and witness the magic.
```bash
hermes start
```
Once the service is running, you can test it through any configured message gateway (like the Web UI, Telegram, etc.).
Test Scenario 1: Control Experiment (Ask the default Agent)
First, ask a question to the default Agent, which is not configured with the knowledge base.
**You:** Who is the project lead for Project Phoenix?

**default Agent:** I'm sorry, but I don't have access to real-time or private project information like the specific lead for "Project Phoenix." This information is likely internal to an organization.
This response is exactly as expected. The default Agent has no private knowledge about this project.
Test Scenario 2: Validate the Knowledge Base (Ask the expert Agent)
Now, switch to our newly configured phoenix_expert Agent and ask the same question.
**You:** @phoenix_expert Who is the project lead for Project Phoenix?

**phoenix_expert Agent:** The project lead for Project Phoenix is Alice Johnson.
Success! The Agent accurately retrieved the information from our overview.md file and provided the correct answer.
Let's try a more complex question.
**You:** @phoenix_expert When is the project expected to launch and how can I get status updates?

**phoenix_expert Agent:** Project Phoenix is expected to launch in Q4 2024. You can get status updates from the weekly report published by Alice Johnson every Friday at 3 PM in the #project-phoenix channel.
This answer perfectly combines information from two different documents: overview.md (launch date) and qa.md (how to get updates). This fully demonstrates the effectiveness of the RAG process: the system retrieved multiple text chunks most relevant to the question and provided them all to the LLM as context, resulting in a comprehensive and accurate response.
Commands Used
Here is a summary of the core commands used in this tutorial:
- Install dependency: `pip install chromadb`
- Create a knowledge base: `hermes knowledge create --name <knowledge_base_name> --description "<description>"`
- Add documents to a knowledge base: `hermes knowledge add --name <knowledge_base_name> --path <path_to_docs> [--recursive]`
- List all knowledge bases: `hermes knowledge list`
- Delete a knowledge base (use with caution): `hermes knowledge delete --name <knowledge_base_name>`
- Start the Hermes Agent service: `hermes start`
Key Takeaways
- RAG is Key: RAG provides LLMs with external, dynamic, and private knowledge sources through a "retrieve-augment-generate" model, effectively solving the problems of knowledge cutoff and hallucination.
- Two-Stage Process: The RAG workflow is divided into an offline data indexing phase (load, chunk, embed, store) and an online retrieval and generation phase (embed query, retrieve, augment, generate).
- Configuration-Driven: In Hermes Agent, the knowledge base feature is highly configurable. You only need to declare the providers for `embedding` and `vector_db` in `config.yml`.
- CLI Management: Hermes Agent provides a powerful `hermes knowledge` command-line tool, making the creation, document addition, and lifecycle management of knowledge bases simple and intuitive.
- Agent Binding: A knowledge base must be associated with a specific Agent instance via the `knowledge_bases` field to be used by that Agent, offering great flexibility.
- Significant Impact: Through hands-on comparison, we can clearly see that an Agent with RAG enabled far surpasses a generic LLM in factual accuracy and informational relevance when handling domain-specific or private questions.
By mastering the knowledge base feature of Hermes Agent, you have unlocked the core capability to build truly useful, expert-level AI assistants. Whether you are creating an intelligent Q&A bot for your enterprise, a personal knowledge management assistant, or a domain-specific customer service agent, RAG will be one of the most powerful tools in your arsenal.
References
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks - The original RAG paper
- Hermes Agent Official Documentation - (Hypothetical) Hermes Agent official knowledge base documentation
- ChromaDB Documentation - ChromaDB official documentation
- What Are Embeddings? - An excellent introductory article on Embeddings