Security Model & Permission Controls

Updated on 4/15/2026


This is the tenth article in the Hermes Agent tutorial series.


Lesson 10 | Hermes Agent's Security Model & Permission Controls

Subtitle: Understand trust boundaries, pre-execution confirmation mechanisms, and sandbox isolation strategies to strike a balance between open capabilities and security protection.

Learning Objectives

In this lesson, you will delve into the security core of Hermes Agent. After completing this chapter, you will be able to:

  • Understand the Necessity of Agent Security: Recognize the double-edged sword effect of empowering an Agent with execution capabilities and understand why security is a top priority.
  • Master Core Security Concepts: Clearly define and explain the roles of Trust Boundaries, Pre-execution Confirmation, and Sandbox Isolation in an Agent's architecture.
  • Configure Security Policies: Learn to enable or disable key security features by modifying the Hermes Agent's configuration file and adjust them according to actual needs.
  • Practice Sandbox Environment Setup: Understand how to use Docker to create an isolated, controlled sandbox environment for the Agent's code execution, following the Principle of Least Privilege.
  • Make Informed Trade-offs: Make reasonable decisions and strike a balance between the Agent's autonomy, capability openness, and system security.

Core Concepts Explained

As we have continuously unlocked the powerful capabilities of Hermes Agent in previous lessons—from calling external tools (Skills) to possessing long-term memory (Memory)—a crucial question emerges: How do we ensure that this increasingly powerful AI entity is secure, controllable, and trustworthy?

When an Agent can generate code, execute system commands, access the file system, and call external APIs, it is no longer just a chatbot. It becomes an executor with potential real-world agency. This capability brings enormous efficiency gains, but it also introduces risks of the same magnitude. An unintentional incorrect command or a malicious Prompt Injection could lead to data leaks, system damage, or even more severe consequences.

The design philosophy of Hermes Agent is to provide powerful functionality while incorporating a layered, configurable security model. This model is primarily composed of the following core concepts.

1. Trust Boundary

A trust boundary is a conceptual dividing line that separates trusted components from untrusted ones within a system. In the world of Hermes Agent, this boundary is paramount.

  • Inside the Boundary (Trusted):

    • Hermes Agent Core Framework: The main Agent program code that we write and deploy is trusted.
    • Configuration Files: Files like config.yml and skills.yml, which we define ourselves, are trusted.
    • System Environment: The server or local machine running the Agent is trusted.
  • Outside the Boundary (Untrusted):

    • Large Language Model (LLM) Output: This is the most critical untrusted source. Any content generated by the LLM, whether it's a natural language Chain of Thought or a code snippet (like Python or Shell) ready for execution, must absolutely not be trusted unconditionally. It could produce buggy, inefficient, or even malicious code.
    • User Input: A user's prompt might contain unintentional errors or malicious instructions attempting to bypass security constraints (i.e., a Prompt Injection attack).
    • External Tool/API Return Values: Data returned from external APIs or tools called by the Agent should also be considered untrusted and requires proper validation and sanitization.

All security mechanisms in Hermes Agent are built on the fundamental principle of "never blindly trust any input from outside the boundary."
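One concrete consequence of this principle is that data crossing the boundary should be validated before the Agent acts on it. The sketch below is purely illustrative (the function, the `result` field, and the size cap are assumptions, not part of Hermes Agent): it checks an external tool's JSON reply and reduces it to exactly the fields the Agent needs.

```python
import json

# Illustrative sketch, not a Hermes Agent API: validate an untrusted JSON
# payload returned by an external tool before the Agent acts on it.
MAX_PAYLOAD_BYTES = 64_000  # assumed cap; tune for your deployment

def sanitize_tool_output(raw: str) -> dict:
    """Reject oversized or malformed payloads; keep only the expected field."""
    if len(raw.encode("utf-8")) > MAX_PAYLOAD_BYTES:
        raise ValueError("payload too large")
    data = json.loads(raw)  # raises a ValueError subclass on malformed JSON
    if not isinstance(data, dict) or "result" not in data:
        raise ValueError("unexpected payload shape")
    # Coerce to the single field we need; silently drop everything else.
    return {"result": str(data["result"])}

clean = sanitize_tool_output('{"result": "ok", "extra": "ignored"}')
```

The key design choice is allow-listing: rather than trying to detect every malicious shape, the sanitizer keeps only what it explicitly expects and discards the rest.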

2. Pre-execution Confirmation

This is the first and most intuitive line of defense. It is a "Human-in-the-Loop" security mechanism.

When the Agent's Planner decides to perform an action with potential side effects (e.g., executing a piece of code, calling an API that modifies data, running a shell command), it does not execute it immediately. Instead, it presents the complete execution plan and the specific code/command to the user and pauses, awaiting explicit authorization.

The workflow is as follows:

  1. The Agent receives a task, thinks, and plans.
  2. The Agent generates one or more specific operational steps, such as a Python code snippet.
  3. Before executing this code, the Agent outputs the following in the console or messaging gateway:
    • Intent: "I am going to download the webpage content and analyze the title."
    • Code/Command: requests.get('...'), os.makedirs(...), etc.
    • Confirmation Prompt: [y/n]
  4. Only when the user types y (yes) and presses Enter does the Agent proceed with execution. If the user types n (no) or anything else, the operation is aborted.

This mechanism is simple yet extremely effective. It returns the final execution authority to a human supervisor, preventing the Agent from "going rogue," especially during development and debugging.
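The four-step workflow above can be condensed into a few lines of code. The following is a minimal, illustrative sketch of such a gate (the function name and signature are my own, not the Hermes Agent API); note that it fails closed: anything other than an explicit `y` aborts.

```python
# Minimal human-in-the-loop gate (illustrative; not the Hermes Agent API).
def confirm_action(intent: str, code: str, ask=input) -> bool:
    """Show the intent and the exact code, then require an explicit 'y'."""
    print(f"Intent: {intent}")
    print(f"Code/Command:\n{code}")
    answer = ask("Do you approve this action? [y/n]: ")
    # Fail closed: anything other than an explicit 'y' aborts the operation.
    return answer.strip().lower() == "y"

# In a real session `ask` is the interactive prompt; here we stub it out.
approved = confirm_action(
    "Download the webpage content and analyze the title",
    "requests.get('https://example.com')",
    ask=lambda prompt: "y",
)
```

Injecting `ask` as a parameter keeps the gate testable while defaulting to a real interactive prompt in production.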

3. Sandbox Isolation

If pre-execution confirmation is a policy-level defense, then sandbox isolation is a hard, technical-level isolation. Even if we approve the execution of a piece of code, we don't want it to have free rein on our main system.

A sandbox is a restricted, isolated execution environment. A program running inside a sandbox has its capabilities strictly controlled. It cannot access the file system, network, or processes outside the sandbox. It's like testing a potentially dangerous device in a locked, bulletproof room.

For a system like Hermes Agent that needs to execute LLM-generated code, a sandbox is essential. One of the most common and powerful sandboxing technologies is Docker containers.

Advantages of using Docker as a sandbox:

  • File System Isolation: The container has its own independent file system. We can precisely control which host directories are mounted to which locations inside the container, preventing the Agent from accessing or modifying sensitive files on the host (like /etc/passwd or ~/.ssh).
  • Network Isolation: Containers can be configured with independent network modes. For example, network access can be completely disabled or restricted to specific IPs and ports, effectively preventing code from leaking data or downloading malware from unknown sources.
  • Process Isolation: Processes inside the container are isolated from host processes, unable to interfere with or snoop on other applications on the host.
  • Resource Limiting: You can limit the amount of CPU and memory a container can use, preventing a "Code Bomb" from exhausting host resources.
  • Environmental Consistency: It provides a clean, reproducible runtime environment for the Agent's code execution, complete with all necessary dependencies (like requests, pandas).

By delegating all code execution tasks to a temporary, disposable Docker container, we significantly reduce potential risks. Even if the LLM generates malicious code like rm -rf / and the user accidentally confirms it, the command can only wreck the container's own ephemeral filesystem, causing no harm to the host system.
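The container-per-execution pattern can be sketched as follows. This is an illustrative sketch rather than Hermes Agent's actual internals; the image name matches the one built later in this lesson, while the flags and resource limits are assumptions you would tune per deployment.

```python
# Illustrative sketch (not Hermes Agent internals): assemble the
# `docker run` invocation for a disposable, locked-down container.
def build_sandbox_command(code: str, workspace: str = "./workspace") -> list[str]:
    return [
        "docker", "run",
        "--rm",                          # dispose of the container afterwards
        "--network=none",                # no network unless the task needs it
        "--memory=256m", "--cpus=0.5",   # cap resources against "code bombs"
        "-v", f"{workspace}:/app/workspace",  # share only the workspace dir
        "-w", "/app/workspace",
        "hermes/python-sandbox:latest",
        "python", "-c", code,
    ]

# A caller would hand this to subprocess.run(cmd, capture_output=True,
# timeout=60) and treat stdout/stderr as untrusted output.
cmd = build_sandbox_command("print('hello from the sandbox')")
```

Because `--rm` discards the container after each run, every execution starts from the same clean image, and nothing malicious code writes outside the mounted workspace survives.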

4. Principle of Least Privilege (POLP)

This is a golden rule that runs through all security design. It requires that any component (be it the Agent itself or the sandbox executing its code) should only be granted the minimum permissions necessary to complete its task.

  • For the Agent Process: If the Agent doesn't need to write files, don't run it with write permissions.
  • For API Keys: The cloud service API keys (e.g., for AWS, GCP) provided to the Agent should be created with strictly scoped permissions using tools like IAM (Identity and Access Management), rather than using a root key with full access.
  • For the Sandbox Container:
    • Non-Root User: In the Dockerfile, a non-root user should be created, and the USER instruction should be used to switch to that user. This prevents processes inside the container from having excessive privileges.
    • Precise Mounting: Only mount a dedicated working directory (e.g., ./workspace), not the entire project directory or the user's home directory.
    • Restricted Network: If the task does not require internet access, run the container with --network=none.
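The "precise mounting" rule can also be enforced mechanically rather than by convention. The helper below is a hypothetical illustration (not a Hermes Agent function): it resolves a requested mount source and rejects anything that escapes the dedicated workspace directory, including `../` traversal.

```python
from pathlib import Path

# Hypothetical helper illustrating "precise mounting": allow a bind mount
# only if its source resolves to the workspace dir or a path inside it.
def is_allowed_mount(source: str, workspace: str = "workspace") -> bool:
    root = Path(workspace).resolve()
    candidate = Path(source).resolve()  # normalizes any '..' segments
    return candidate == root or root in candidate.parents

# '../' tricks are neutralized by resolving before comparing:
# is_allowed_mount("workspace/../secrets") -> False
```

Resolving both paths before comparison is the important step; a naive string-prefix check would accept `workspace/../.ssh`.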

By combining these four concepts, Hermes Agent builds a defense-in-depth system: on top of a clear trust boundary, it uses pre-execution confirmation for decision-making gatekeeping, then leverages sandbox isolation to limit the blast radius, and consistently follows the principle of least privilege to tighten permissions at every stage.

💻 Hands-on Demo

Next, we will modify the Hermes Agent's configuration file to experience firsthand how these security mechanisms work.

Let's assume our task is: "Please fetch the content of the Hermes Agent's README.md file from GitHub, count how many times the word 'Agent' appears, and save the result to a file named analysis_result.txt."

This task involves:

  1. A network request (accessing GitHub)
  2. File system writing (saving the result)

This is a typical operation that requires security review.

Scenario 1: Fully Open Mode (Dangerous! For demonstration only)

First, let's simulate a completely trusting and highly insecure configuration.

  1. Modify Configuration File Open config/config.yml, find the security section (or add it if it doesn't exist), and configure it as follows:

    # config/config.yml
    
    # ... other configurations ...
    
    security:
      # Whether to require user confirmation before executing code or commands
      require_confirmation: false
      # Execution environment configuration
      execution_environment:
        # 'local' means executing directly on the host machine, which is very dangerous!
        type: "local"
    
  2. Start the Agent and Give the Command

    ./hermes-agent run
    

    In the Agent's interactive interface, enter our task:

    User: Please fetch the content of the Hermes Agent's README.md file from GitHub, count how many times the word 'Agent' appears, and save the result to a file named analysis_result.txt.

  3. Observe the Behavior You will see the Agent think and then start executing without asking for any confirmation. Its logs might show output similar to this:

    [INFO] Planner: I need to use Python to perform this task. I will fetch the URL, count the word, and write to a file.
    [INFO] Executor: Executing the following code directly on the local machine:
    
    import requests
    
    url = "https://raw.githubusercontent.com/hermes-project/hermes-agent/main/README.md"
    try:
        response = requests.get(url)
        response.raise_for_status()
        content = response.text
        count = content.lower().count('agent')
        result_message = f"The word 'Agent' appears {count} times."
        
        with open("analysis_result.txt", "w") as f:
            f.write(result_message)
            
        print(result_message)
        print("Result saved to analysis_result.txt")
        
    except requests.exceptions.RequestException as e:
        print(f"Error fetching URL: {e}")
    
    [INFO] Execution finished.
    The word 'Agent' appears 12 times.
    Result saved to analysis_result.txt
    

    At this point, a new file named analysis_result.txt will appear in your project's root directory. This process is very smooth but also extremely dangerous. If the LLM had generated code like import os; os.system('rm -rf ~'), your user's home directory could have been wiped out.

Scenario 2: Enable Pre-execution Confirmation (Recommended basic security configuration)

Now, let's enable the first line of defense.

  1. Modify Configuration File Change require_confirmation to true.

    # config/config.yml
    security:
      require_confirmation: true # <--- Change to true
      execution_environment:
        type: "local"
    
  2. Execute the Task Again Restart the Agent and enter the same command.

  3. Observe the Behavior This time, the Agent's behavior is completely different. After generating the code, it will stop and wait for your permission:

    [INFO] Planner: I need to use Python to perform this task. I will fetch the URL, count the word, and write to a file.
    
    [CONFIRMATION REQUIRED] The agent plans to execute the following Python code.
    Please review it carefully.
    
    Actions to be taken:
    - Make a network request to https://raw.githubusercontent.com
    - Write to the local file: analysis_result.txt
    
    Code:
    ----------------------------------------------------------------------
    import requests
    
    url = "https://raw.githubusercontent.com/hermes-project/hermes-agent/main/README.md"
    try:
        response = requests.get(url)
        response.raise_for_status()
        content = response.text
        count = content.lower().count('agent')
        result_message = f"The word 'Agent' appears {count} times."
        
        with open("analysis_result.txt", "w") as f:
            f.write(result_message)
            
        print(result_message)
        print("Result saved to analysis_result.txt")
        
    except requests.exceptions.RequestException as e:
        print(f"Error fetching URL: {e}")
    ----------------------------------------------------------------------
    
    Do you approve this action? [y/n]: 
    

    Now you have the opportunity to review the code. You can see that it is indeed performing the task you requested, with no malicious behavior. You then type y and press Enter, and only then will the Agent proceed with execution and complete the task. If you type n, it will abort the operation and inform you that the user denied execution.

Scenario 3: Enable Docker Sandbox Isolation (Highest security level)

Finally, let's enable the ultimate protection: the Docker sandbox.

  1. Prepare the Docker Environment First, ensure you have Docker installed. Then, we need to create a Docker image for code execution.

    In your project's root directory, create a folder named sandbox and create a Dockerfile inside it:

    mkdir sandbox
    cd sandbox
    touch Dockerfile
    

    Edit the Dockerfile with the following content:

    # Dockerfile
    
    # Use a lightweight Python image as the base
    FROM python:3.10-slim
    
    # Install necessary libraries, e.g., our task needs requests
    RUN pip install --no-cache-dir requests
    
    # --- Principle of Least Privilege in Practice ---
    # Create an unprivileged, normal user named hermesuser with its home at /app
    RUN useradd -m -d /app hermesuser
    
    # Create the working directory and hand it over to hermesuser,
    # so the non-root user can actually write files there
    WORKDIR /app/workspace
    RUN chown -R hermesuser:hermesuser /app
    
    # Switch to this non-root user
    USER hermesuser
    
    # By default, the container does nothing upon startup, waiting for the Agent to pass a command
    CMD ["/bin/bash"]
    
  2. Build the Docker Image In the sandbox directory, execute the build command:

    # Execute inside the sandbox folder
    docker build -t hermes/python-sandbox:latest .
    

    This will create an image named hermes/python-sandbox, which we will reference in our configuration.

  3. Modify Configuration File Now, let's configure Hermes Agent to use this Docker sandbox.

    # config/config.yml
    security:
      require_confirmation: true
      execution_environment:
        type: "docker" # <--- Change to docker
        docker:
          # The image we just built
          image: "hermes/python-sandbox:latest"
          
          # The working directory inside the container
          working_dir: "/app/workspace"
          
          # Mount the host's ./workspace directory to the container's working directory
          # This allows the Agent to exchange files with the host, but only within this controlled directory
          volume_mounts:
            - type: bind
              source: "./workspace" # Relative path on the host
              target: "/app/workspace" # Absolute path in the container
              
          # Network mode. 'bridge' is the default, allowing external network access.
          # Can be set to 'none' if the task doesn't need networking.
          network_mode: "bridge"
          
          # Resource limits (optional)
          # mem_limit: "256m"
          # cpus: "0.5"
    

    Note: You need to manually create a workspace folder in your project's root directory to share files with the container.

    # In the project root directory
    mkdir workspace
    
  4. Execute the Task Again Restart the Agent and enter the same command.

  5. Observe the Behavior This time, the process is as follows:

    • The Agent will still ask for your confirmation (because require_confirmation is true).
    • After you type y, the logs will show different information than before:
    [INFO] User approved the action.
    [INFO] Execution Environment: Docker. Preparing container...
    [INFO] Starting Docker container from image 'hermes/python-sandbox:latest'.
    [INFO] Mounting local './workspace' to '/app/workspace' in container.
    [INFO] Executing code inside the Docker sandbox...
    
    [DOCKER_STDOUT] The word 'Agent' appears 12 times.
    [DOCKER_STDOUT] Result saved to analysis_result.txt
    
    [INFO] Execution finished. Container stopped and removed.
    

    Now, the analysis_result.txt file will appear in your host's ./workspace directory, not the project root.

    We have successfully confined the code execution within a Docker container. The code's file system write operations were redirected to the controlled workspace directory. It ran inside the container as the non-root user hermesuser. Even if this code had vulnerabilities or malicious intent, the damage it could cause would be tightly contained within this temporary, isolated container.

Commands Involved

  • mkdir sandbox workspace: Create directories for the sandbox configuration and workspace.
  • touch sandbox/Dockerfile: Create the Dockerfile.
  • docker build -t hermes/python-sandbox:latest .: Build the Docker image for the sandbox.
  • vim config/config.yml: (or use your favorite editor) Modify the Agent's security configuration.
  • ./hermes-agent run: Start the Hermes Agent.

Key Takeaways

  • Security is Not Optional: For Agents capable of executing code, security is a cornerstone of the design.
  • Layered Defense: The Hermes Agent security model is layered, including a policy layer (confirmation mechanism) and a technical layer (sandbox isolation).
  • Never Trust the LLM: LLM output must be treated as untrusted external input, subject to review and isolation.
  • Human-in-the-Loop is Key: require_confirmation: true is the simplest and most effective security switch, ensuring the final decision-making power remains in your hands.
  • Sandboxing is the Ultimate Safeguard: Using technologies like Docker to create a sandbox environment can fundamentally limit the damage potential of malicious code and is a recommended practice for production deployments.
  • Principle of Least Privilege: Whether for the Agent process, API keys, or sandbox configuration, always follow the principle of least privilege, granting only the necessary permissions.
  • The Trade-off Between Security and Convenience: The local mode is the most convenient but most dangerous, while the docker mode is the most secure but requires more complex configuration. You need to choose the appropriate security level based on the use case (development, personal use, production deployment).

Through this lesson, you have mastered the core knowledge and skills to protect your Hermes Agent. In your future explorations, always prioritize security awareness and build and use powerful AI Agents responsibly.
