---
title: "Lesson 18 | Production Deployment: VPS to GPU Clusters"
summary: "Industrial-grade deployment solutions: from cloud server selection and cluster orchestration to enterprise intranet deployment and high-availability strategy design."
sortOrder: 180
status: "published"
---

Lesson 18 | Production Deployment: VPS to GPU Clusters
Subtitle: Industrial-grade deployment solutions, covering cloud server selection, cluster orchestration, enterprise intranet deployment, and the design of high-availability and disaster recovery strategies.
Welcome to the 18th lesson in the Hermes Agent tutorial series. In previous lessons, we explored the Hermes Agent's core functionality in depth, from model configuration and skill extension to the memory system. Now it's time to push our agent from the development environment into the wider world of production. This lesson focuses on the industrial-grade deployment of the Hermes Agent, ensuring it runs stably and efficiently in high-concurrency, high-availability production environments.
Learning Objectives
Upon completing this lesson, you will be able to:
- Evaluate and select suitable production deployment environments: From simple VPS to complex GPU clusters, understand the pros, cons, and applicable scenarios for different solutions.
- Master containerized deployment: Use Docker and Docker Compose to package the Hermes Agent and its dependencies, achieving environmental consistency and rapid deployment.
- Implement Kubernetes cluster orchestration: Deploy the Hermes Agent to a Kubernetes (K8s) cluster to achieve auto-scaling, self-healing, and rolling updates.
- Address the challenges of enterprise intranet deployment: Understand the special considerations for deploying an Agent in an isolated network, such as proxy configuration and private dependency management.
- Design and implement High Availability (HA) strategies: Ensure the robustness of the Agent service, eliminate single points of failure, and create a basic disaster recovery plan.
Core Concepts Explained
Before diving into the hands-on section, we must understand several key DevOps and MLOps concepts. These are the cornerstones of building stable, scalable production systems.
1. Choosing a Deployment Environment
- VPS (Virtual Private Server): The most basic cloud server. It provides an isolated virtual environment with limited resources.
- Pros: Low cost, simple configuration, suitable for personal projects, prototype validation, or low-traffic applications.
- Cons: Single Point of Failure (SPOF)—if the server goes down, the service is interrupted; scaling usually requires manual intervention and may cause downtime; limited resources (CPU/RAM/GPU).
- Dedicated GPU Server: A physical server or a high-performance cloud instance equipped with powerful GPUs.
- Pros: Provides ultimate performance for local large model inference, with no virtualization overhead.
- Cons: High cost, complex maintenance, and potentially low resource utilization (you pay even when it's idle).
- Cloud GPU Instances: Such as AWS EC2 P/G series, Google Cloud N1/A2 series, Azure NC series.
- Pros: Pay-as-you-go, elastic scaling, and deep integration with the cloud ecosystem (storage, networking, databases).
- Cons: Configuration and management have a learning curve; long-term running costs may be higher than a dedicated server.
- Kubernetes (K8s) Cluster: The industry standard for container orchestration. It combines a group of physical or virtual machines into a unified resource pool to automatically deploy, scale, and manage containerized applications.
- Pros: High availability, auto-scaling, self-healing, rolling updates, declarative configuration. It is the top choice for building large-scale, resilient microservice applications.
- Cons: Steep learning curve; the cost of setting up and maintaining the cluster itself is relatively high.
2. Containerization & Orchestration
- Docker: An open-source application container engine. It allows developers to package an application and all its dependencies (libraries, runtimes, system tools) into a lightweight, portable container.
- Core Value: "Build once, run anywhere." It ensures consistency from development to testing to production, completely solving the classic "it works on my machine" problem.
- Kubernetes (K8s): Originating from Google's Borg system, K8s is responsible for automating the lifecycle management of hundreds or thousands of containers.
- Core Components:
  - `Pod`: The smallest deployable unit in K8s, usually containing one or more tightly coupled containers.
  - `Deployment`: Defines the desired state of Pods (e.g., number of replicas, image version) and is responsible for maintaining that state.
  - `Service`: Provides a stable network endpoint (IP address and DNS name) for a set of Pods, enabling service discovery and load balancing.
  - `PersistentVolume` (PV) and `PersistentVolumeClaim` (PVC): Used to manage persistent storage, ensuring data is not lost when a Pod restarts or migrates, which is crucial for the Hermes Agent's Memory system.
  - `ConfigMap` & `Secret`: Used to decouple configuration and sensitive information (like API Keys) from the application image.
3. High Availability (HA)
The core goal of HA is to eliminate single points of failure, ensuring the system continues to provide service even when some components fail. Common strategies include:
- Redundancy: Running multiple instances (replicas) of an application, distributed across different physical nodes or availability zones.
- Load Balancing: Distributing traffic across all healthy instances to prevent any single instance from being overloaded.
- Health Checks: Periodically checking the health status of application instances and automatically removing or restarting problematic ones.
- Data Backup & Recovery: Regularly backing up persistent data (like user profiles, memory database) and having a recovery process in place.
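These strategies are easiest to see working together. The toy sketch below is plain Python, not part of Hermes Agent: a `healthy` flag stands in for a real HTTP health probe, and a round-robin balancer routes traffic around an instance that fails its check.

```python
import itertools

class Instance:
    """A stand-in for one Agent replica; `healthy` would normally
    be the result of an HTTP probe against the instance."""
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

class RoundRobinBalancer:
    """Distributes requests across healthy instances only, mirroring
    what a real load balancer does after health checks remove a node."""
    def __init__(self, instances):
        self.instances = instances
        self._cycle = itertools.cycle(range(len(instances)))

    def pick(self):
        # Try each slot at most once per call; skip unhealthy replicas.
        for _ in range(len(self.instances)):
            inst = self.instances[next(self._cycle)]
            if inst.healthy:
                return inst
        raise RuntimeError("no healthy instances available")

replicas = [Instance("agent-1"), Instance("agent-2"), Instance("agent-3")]
lb = RoundRobinBalancer(replicas)

# Normal operation: traffic rotates across all three replicas.
print([lb.pick().name for _ in range(3)])  # ['agent-1', 'agent-2', 'agent-3']

# agent-2 fails its health check; the balancer routes around it.
replicas[1].healthy = False
assert all(lb.pick().name != "agent-2" for _ in range(6))
```

Kubernetes gives you exactly this behavior for free: a failing readiness probe removes a Pod from the Service's endpoint list until it recovers.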
💻 Hands-on Demo
We will proceed step-by-step, starting with the simplest VPS deployment and gradually upgrading to a complex K8s cluster deployment.
Step 1: Basic Deployment - Running on a VPS with Docker and Docker Compose
This is the quickest and most direct way to productionize, suitable for personal projects or small-scale internal use.
Prerequisites:
- A VPS with Docker and Docker Compose installed.
- The Hermes Agent project code.
1. Create the Dockerfile
In the root directory of your Hermes Agent project, create a Dockerfile to build the application image.
```dockerfile
# Dockerfile
# --- Build stage: install dependencies ---
FROM python:3.11-slim AS builder

# Set the working directory
WORKDIR /app

# Install Poetry (Python dependency management tool)
RUN pip install poetry

# Copy the dependency definition files
COPY poetry.lock pyproject.toml ./

# Install dependencies directly into the system site-packages.
# --no-root: do not install the project itself, only its dependencies.
# --without dev: skip development dependencies to reduce the image size
# (this replaces the deprecated --no-dev flag).
RUN poetry config virtualenvs.create false && \
    poetry install --no-root --without dev --no-interaction --no-ansi

# --- Final image ---
FROM python:3.11-slim
WORKDIR /app

# Copy the installed dependencies from the builder stage
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin

# Copy the application code
COPY . .

# Expose the Agent service's port (e.g., 8000)
EXPOSE 8000

# Command to start the application
CMD ["python", "main.py"]
```
- Note: We are using a multi-stage build. The `builder` stage is responsible for installing all dependencies; the final production image only copies the necessary dependencies and application code from the `builder` stage. This significantly reduces the final image size and improves security.
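A useful companion to the Dockerfile is a `.dockerignore` file, which keeps the build context (and therefore the `COPY . .` layer) small. The entries below are typical for a Python project; the `data/`, `tests/`, and `docs/` directory names are assumptions, so adjust them to your actual layout:

```
# .dockerignore — keep the build context and the final image small
.git
.venv
__pycache__/
*.pyc
.env        # never bake secrets into the image
data/
tests/
docs/
```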
2. Create docker-compose.yml
To manage the Hermes Agent and its potential dependencies (like Redis as a Memory backend), we use Docker Compose.
```yaml
# docker-compose.yml
version: '3.8'

services:
  hermes-agent:
    build: .
    container_name: hermes_agent_prod
    restart: always
    ports:
      - "8000:8000"  # Map port 8000 of the host to port 8000 of the container
    volumes:
      - ./config.prod.yaml:/app/config.yaml  # Mount the production configuration file
      - agent_data:/app/data                 # Mount a data volume for persistence
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}  # Read sensitive information from environment variables
      - MEMORY_BACKEND=redis
      - REDIS_HOST=hermes_redis
    depends_on:
      - hermes_redis

  hermes_redis:
    image: redis:7-alpine
    container_name: hermes_redis_prod
    restart: always
    volumes:
      - redis_data:/data

volumes:
  agent_data:
  redis_data:
```
- Note:
  - `restart: always` ensures the container restarts automatically after an unexpected exit or server reboot.
  - `volumes` are used for data persistence: `config.prod.yaml` mounts the production config file, while `agent_data` and `redis_data` are named volumes that save the Agent's working data and Redis data.
  - `environment` passes configuration, especially sensitive values like `OPENAI_API_KEY`. It's recommended to supply these via environment variables or a `.env` file rather than hardcoding them in the configuration file.
3. Deploy
On your VPS, execute the following commands:
```bash
# Create a .env file and add your API Key
echo "OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxx" > .env

# Build and start the services (in detached mode)
docker-compose up --build -d

# Check the running status
docker-compose ps

# View the logs
docker-compose logs -f hermes-agent
```
At this point, your Hermes Agent is running stably on the VPS. However, it is still a single-point service.
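Even before moving to K8s, the Compose setup can detect a hung Agent. The fragment below is a sketch: it assumes the Agent exposes a `/health` endpoint on port 8000 and that `curl` is installed in the image (slim Python images may need it added). It extends the `hermes-agent` service with a Docker healthcheck:

```yaml
services:
  hermes-agent:
    # ...existing configuration from above...
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s      # probe every 30 seconds
      timeout: 5s        # fail the probe if no answer within 5 seconds
      retries: 3         # mark unhealthy after 3 consecutive failures
      start_period: 20s  # grace period while the Agent starts up
```

The health status then appears in `docker ps`, and dependent services can wait on it via `depends_on` with `condition: service_healthy`. Note that `restart: always` by itself does not restart a running-but-unhealthy container.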
Step 2: Advanced Deployment - Migrating to a Kubernetes (K8s) Cluster
Now, we will deploy the Agent to K8s to gain high availability and scalability.
Prerequisites:
- An available K8s cluster (e.g., GKE, EKS, AKS provided by cloud vendors, or a self-hosted cluster).
- The `kubectl` command-line tool, configured and connected to your cluster.
- A container image registry (e.g., Docker Hub, GCR, ECR) to store the built Hermes Agent image.
1. Build and Push the Image
```bash
# Log in to your image registry
docker login your-registry.io

# Build the image and tag it
docker build -t your-registry.io/hermes-agent:v1.0.0 .

# Push the image
docker push your-registry.io/hermes-agent:v1.0.0
```
2. Write K8s Manifests (YAML files)
We will create several YAML files to define our deployment.
a. hermes-configmap.yaml - Configuration
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: hermes-agent-config
data:
  config.yaml: |
    # Your non-sensitive configuration goes here
    server:
      host: "0.0.0.0"
      port: 8000
    memory:
      backend: "redis"
      redis_host: "hermes-redis-service"  # Use the K8s Service name
```
b. hermes-secret.yaml - Sensitive Information
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: hermes-agent-secret
type: Opaque
stringData:
  openai-api-key: "sk-xxxxxxxxxxxxxxxxxxxx"  # In actual production, use a more secure way to manage secrets
```
c. hermes-redis-deployment.yaml - Redis Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hermes-redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hermes-redis
  template:
    metadata:
      labels:
        app: hermes-redis
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          ports:
            - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: hermes-redis-service
spec:
  selector:
    app: hermes-redis
  ports:
    - protocol: TCP
      port: 6379
      targetPort: 6379
```
d. hermes-agent-deployment.yaml - Core Hermes Agent Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hermes-agent-deployment
spec:
  replicas: 3  # <-- The starting point for high availability: run 3 replicas
  selector:
    matchLabels:
      app: hermes-agent
  template:
    metadata:
      labels:
        app: hermes-agent
    spec:
      containers:
        - name: hermes-agent
          image: your-registry.io/hermes-agent:v1.0.0  # Use your own image
          ports:
            - containerPort: 8000
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: hermes-agent-secret
                  key: openai-api-key
          volumeMounts:
            - name: config-volume
              mountPath: /app/config.yaml
              subPath: config.yaml
          # --- Health checks ---
          livenessProbe:
            httpGet:
              path: /health  # Assuming your application has a /health endpoint
              port: 8000
            initialDelaySeconds: 15
            periodSeconds: 20
          readinessProbe:
            httpGet:
              path: /ready
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
      volumes:
        - name: config-volume
          configMap:
            name: hermes-agent-config
```
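The probes above assume the Agent actually serves `/health` and `/ready`. The Hermes Agent's web framework isn't shown in this lesson, so here is a minimal stdlib sketch of the contract the probes rely on: respond 200 when alive/ready, 503 otherwise.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

READY = threading.Event()  # set once startup work (model load, etc.) finishes

class ProbeHandler(BaseHTTPRequestHandler):
    """Serves the two probe endpoints the K8s manifest points at."""

    def do_GET(self):
        if self.path == "/health":
            self._respond(200)  # liveness: the process can answer at all
        elif self.path == "/ready":
            # readiness: only accept traffic once initialization is done
            self._respond(200 if READY.is_set() else 503)
        else:
            self._respond(404)

    def _respond(self, code):
        self.send_response(code)
        self.end_headers()

    def log_message(self, *args):  # silence per-request logging for the demo
        pass

# Port 0 lets the OS pick a free port for this demo; the real Agent
# would bind the container port (8000) from its config.
server = HTTPServer(("127.0.0.1", 0), ProbeHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

READY.set()  # pretend initialization has completed
status = urllib.request.urlopen(f"http://127.0.0.1:{port}/health").getcode()
print(status)  # 200
server.shutdown()
```

The kubelet treats any 2xx/3xx response as success. A failing readiness probe removes the Pod from the Service's endpoints without restarting it; a failing liveness probe triggers a container restart.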
e. hermes-agent-service.yaml - Exposing the Service
```yaml
apiVersion: v1
kind: Service
metadata:
  name: hermes-agent-service
spec:
  type: LoadBalancer  # <-- The cloud provider will automatically create a load balancer
  selector:
    app: hermes-agent
  ports:
    - protocol: TCP
      port: 80          # The port for external access
      targetPort: 8000  # The port exposed by the container
```
3. Deploy to the Cluster
```bash
# Apply all configurations sequentially
kubectl apply -f hermes-configmap.yaml
kubectl apply -f hermes-secret.yaml
kubectl apply -f hermes-redis-deployment.yaml
kubectl apply -f hermes-agent-deployment.yaml
kubectl apply -f hermes-agent-service.yaml

# Check the deployment status
kubectl get deployments
kubectl get pods -o wide
kubectl get service hermes-agent-service  # Get the external IP
```
Now, your Hermes Agent is running in a high-availability mode on the K8s cluster. K8s will automatically handle Pod scheduling, failure recovery, and traffic load balancing.
Step 3: Enterprise-Level Considerations - Intranet Deployment and High Availability Strategies
1. Enterprise Intranet Deployment
When deploying in an enterprise environment without public internet access, special handling is required:
- Network Proxy: If you need to access external APIs (like OpenAI) through an HTTP/HTTPS proxy, set the proxy environment variables in the `env` section of your `Deployment`:

  ```yaml
  env:
    - name: HTTP_PROXY
      value: "http://your-proxy.com:port"
    - name: HTTPS_PROXY
      value: "http://your-proxy.com:port"
  ```

- Private Image Registry: Enterprises often use private registries like Harbor, Nexus, or those provided by cloud vendors. You'll need to create an `imagePullSecrets` entry first and then reference it in your `Deployment` under `spec.template.spec` so that K8s has permission to pull the image.
- Local Models and Dependencies: If the Agent uses a local large model, mount the model files into all Pods using a `PersistentVolume`. For Python dependencies, you can set up a private PyPI mirror (like `devpi`) to manage them.
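One practical note on the proxy variables: most HTTP clients pick them up from the environment automatically, so no code change is needed in the Agent itself. A quick stdlib sketch (the proxy host and port are placeholders, not a real proxy):

```python
import os
import urllib.request

# Simulate the proxy variables that the Deployment's env section injects.
os.environ.pop("http_proxy", None)   # clear any lowercase variants first;
os.environ.pop("https_proxy", None)  # lowercase takes precedence in the lookup
os.environ["HTTP_PROXY"] = "http://your-proxy.com:3128"
os.environ["HTTPS_PROXY"] = "http://your-proxy.com:3128"

# Most Python HTTP clients (urllib, requests, httpx) honor these variables;
# urllib exposes the lookup directly:
proxies = urllib.request.getproxies()
print(proxies["http"])  # http://your-proxy.com:3128
```

If a client ignores the environment, pass the proxy explicitly through its own configuration instead.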
2. Enhancing High Availability (HA) Strategies
- Horizontal Pod Autoscaler (HPA): Automatically increase or decrease the number of Pods based on CPU or memory utilization.
  ```bash
  # Create an HPA that auto-scales up to 10 Pods when CPU utilization exceeds 80%
  kubectl autoscale deployment hermes-agent-deployment --cpu-percent=80 --min=3 --max=10
  ```

- Multi-AZ Deployment: At the K8s cluster level, ensure your worker nodes are distributed across multiple Availability Zones of your cloud provider. This prevents a single data center failure from taking down your entire service.
- Data Backup and Disaster Recovery (DR):
- Database Backup: Set up regular snapshot backups for your Redis or other persistent storage.
- Configuration Backup: Keep all your K8s YAML files under Git version control.
- Tools: Use tools like Velero to back up the entire state of your K8s cluster, including PV snapshots, enabling one-click, cluster-level disaster recovery.
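As a small illustration of the backup side, here is a sketch of a snapshot-retention helper. The `dump-*.rdb` naming is a hypothetical convention for timestamped Redis RDB snapshots, not something Hermes or Redis mandates:

```python
import pathlib
import tempfile

def prune_snapshots(backup_dir, keep=7):
    """Keep only the `keep` most recent snapshot files (sorted by name,
    assuming sortable timestamped names like dump-2024-05-01.rdb) and
    delete the rest. Returns the deleted file names."""
    snapshots = sorted(pathlib.Path(backup_dir).glob("dump-*.rdb"))
    stale = snapshots[:-keep] if keep else snapshots
    for f in stale:
        f.unlink()
    return [f.name for f in stale]

# Demo with throwaway files standing in for Redis RDB snapshots.
with tempfile.TemporaryDirectory() as d:
    for day in range(1, 11):
        (pathlib.Path(d) / f"dump-2024-05-{day:02d}.rdb").touch()
    deleted = prune_snapshots(d, keep=7)
    print(deleted)  # the 3 oldest: dump-2024-05-01..03
```

In a real setup this would run from a cron job alongside the snapshot step, with surviving files shipped to object storage.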
Commands Used
Docker & Docker Compose:
- `docker-compose up --build -d`: Build images and start all services in the background.
- `docker-compose ps`: Check the status of currently running services.
- `docker-compose logs -f <service_name>`: Tail the logs of a service in real time.
- `docker build -t <tag> .`: Build a Docker image.
- `docker push <tag>`: Push an image to a registry.
Kubernetes (kubectl):
- `kubectl apply -f <filename.yaml>`: Apply or update a resource configuration.
- `kubectl get pods`: List the status of all Pods.
- `kubectl get deployments`: List the status of Deployments.
- `kubectl get services`: List the status of Services and their external IPs.
- `kubectl describe pod <pod_name>`: Show a Pod's details and events for troubleshooting.
- `kubectl logs -f <pod_name>`: Tail the logs of a Pod in real time.
- `kubectl scale deployment <name> --replicas=<num>`: Manually scale a Deployment's replica count.
- `kubectl autoscale deployment ...`: Create a Horizontal Pod Autoscaler.
Key Takeaways
- Deployment is an evolutionary process: Start with a simple VPS, and as business complexity and traffic grow, gradually migrate to Docker Compose, and finally adopt Kubernetes for industrial-grade deployment.
- Containerization is the cornerstone of modern deployment: Docker ensures environmental consistency and delivery portability.
- Kubernetes provides ultimate elasticity and reliability: Through mechanisms like `Deployment`, `Service`, and probes, K8s achieves application self-healing, auto-scaling, and high availability.
- Separate configuration from code: Always use `ConfigMap`s, `Secret`s, or environment variables to manage configuration. Never hardcode sensitive information in your code or image.
- High availability is a systems engineering challenge: It involves not just running multiple replicas but also health checks, load balancing, multi-zone deployment, and a comprehensive data backup and recovery strategy.
- Enterprise intranet deployment has its own unique challenges: It requires a focus on networking, security, and private dependency management.
Transforming the Hermes Agent from a local script into an intelligent agent that provides 24/7 uninterrupted service globally is an exciting engineering challenge. We hope this lesson paves the way for you on this journey from development to production.