Issue 01 | What is Caveman β€” The Philosophy of Token Compression and the Ecosystem Landscape

⏱ Est. reading time: 11 min Updated on 5/7/2026

🎯 Learning Objectives

By the end of this issue, you will understand:

  1. Why AI Agent output token redundancy is a serious engineering problem
  2. Caveman's core mechanism: Prompt rule injection, not an MCP service
  3. The positioning and collaborative relationship of the Caveman ecosystem's three-piece set
  4. The scientific basis for saving tokens – empirical evidence from academic papers

πŸ“– Core Concepts Explained

1.1 Redundant Output: The Silent Killer of AI Agents

When you use an AI coding agent, there is a fact you may never have noticed: roughly 75% of what the Agent says is superfluous.

πŸ—£οΈ Normal Claude (69 tokens):
"The reason your React component is re-rendering is likely because
you're creating a new object reference on each render cycle. When you
pass an inline object as a prop, React's shallow comparison sees it
as a different object every time, which triggers a re-render. I'd
recommend using useMemo to memoize the object."

πŸͺ¨ Caveman Claude (19 tokens):
"New object ref each render. Inline object prop = new ref = re-render.
Wrap in useMemo."

The same technical conclusion, compressed from 69 tokens to 19 tokens – a 72% saving.

This isn't cutting corners. What's removed is only:

  • ❌ Pleasantries ("The reason your...", "I'd recommend...")
  • ❌ Filler words ("likely", "actually", "basically")
  • ❌ Redundant explanations (the same concept explained three ways)

What's retained is:

  • βœ… Root cause diagnosis (new object ref)
  • βœ… Causal chain (inline prop β†’ new ref β†’ re-render)
  • βœ… Solution (useMemo)

1.2 The True Arithmetic of Token Costs

Taking Claude Sonnet 4 as an example (2026 pricing):

| Metric | Normal Mode | Caveman Mode | Savings |
|---|---|---|---|
| Avg. tokens per response | ~300 | ~80 | 73% |
| Interactions per hour | ~40 | ~40 | — |
| Tokens consumed per hour | 12,000 | 3,200 | 73% |
| 8-hour daily cost (est.) | ~$2.88 | ~$0.77 | $2.11/day |
| 22-day monthly cost | ~$63 | ~$17 | $46/month |
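The table's figures can be reproduced with simple arithmetic. The per-token rate below is the one implied by the table's dollar amounts (about $30 per million output tokens), an assumption for the sketch rather than an official price list:

```python
# Reproduce the cost table's arithmetic. RATE_PER_M is the output-token
# rate implied by the table's dollar figures, not an official price.
RATE_PER_M = 30.00            # assumed $/1M output tokens
INTERACTIONS_PER_HOUR = 40

def daily_cost(tokens_per_response: int, hours: float = 8) -> float:
    """Estimated output-token cost for one working day."""
    tokens = tokens_per_response * INTERACTIONS_PER_HOUR * hours
    return tokens * RATE_PER_M / 1_000_000

normal = daily_cost(300)      # ~$2.88/day
caveman = daily_cost(80)      # ~$0.77/day
print(f"normal ${normal:.2f}/day, caveman ${caveman:.2f}/day, "
      f"save ${(normal - caveman) * 22:.0f}/month")
```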

πŸ’‘ Key Insight: Caveman only compresses output tokens (the Agent's responses). Thinking/reasoning tokens are completely unaffected. Caveman doesn't shrink the brain, it just shrinks the mouth.

1.3 The Essence of Caveman: A Behavioral Rule Injector

Many people, upon hearing "install Caveman," assume it requires configuring an MCP server. It does not.

```mermaid
graph TB
    subgraph Wrong["❌ Common Misconception"]
        A1["Caveman = MCP Server?"] --> A2["Requires starting a background service?"]
        A2 --> A3["Communicates via JSON-RPC?"]
    end

    subgraph Right["✅ Actual Mechanism"]
        B1["Caveman = Prompt Skill"] --> B2["SessionStart Hook injects rules"]
        B2 --> B3["Agent compresses output according to rules"]
        B3 --> B4["Flag File tracks mode status"]
    end

    style Wrong fill:#ff000020,stroke:#ff0000
    style Right fill:#00ff0020,stroke:#00ff00
```

Caveman works in a very simple way:

  1. When installed, a set of Prompt rules and Hook scripts are registered in your Agent environment
  2. When a session starts, the Hook automatically injects the rules into the Agent's system context
  3. When the Agent responds, it compresses the output according to the injected rules
  4. That's it. No background service, no network requests, no extra dependencies
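A minimal sketch of step 2, assuming a SessionStart hook whose stdout is added to the session context; the file paths and rule text here are illustrative, not Caveman's actual implementation:

```python
# Sketch of a SessionStart hook: if the flag file exists, emit the
# compression rules on stdout so the host agent can add them to the
# session context. Paths and rule text are illustrative assumptions.
from pathlib import Path

FLAG_FILE = Path.home() / ".claude" / ".caveman-active"
RULES = (
    "Respond in compressed style: drop pleasantries and filler; "
    "keep root cause, causal chain, and fix."
)

def session_start_hook() -> str:
    """Return the rule text to inject, or '' when the mode is off."""
    return RULES if FLAG_FILE.exists() else ""

if __name__ == "__main__":
    print(session_start_hook())
```

Note there is no server and no network call anywhere in this flow: the hook is an ordinary script that runs once at session start.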

1.4 Caveman Ecosystem Overview

Caveman is not an isolated tool; it's a three-piece ecosystem:

```mermaid
graph LR
    subgraph Ecosystem["🪨 Caveman Ecosystem"]
        direction TB
        A["🪨 caveman<br/>Compresses Agent output<br/>Saves ~75% output tokens"]
        B["🧠 cavemem<br/>Compresses Agent memory<br/>Saves ~46% input tokens"]
        C["🔧 cavekit<br/>Build toolchain optimization<br/>Boosts development efficiency"]
    end
    A -->|"Shorter output"| D["💰 Reduced Costs"]
    B -->|"More concise context"| D
    C -->|"Faster workflow"| D
    D --> E["🚀 3x overall Agent efficiency boost"]
```

| Tool | Object of Compression | Compression Rate | One-sentence Positioning |
|---|---|---|---|
| caveman | Agent's output (response) | ~75% | Makes the Agent speak shorter |
| cavemem | Agent's input (e.g., CLAUDE.md) | ~46% | Makes the Agent read faster |
| cavekit | Build workflow | — | Makes development smoother |

The three tools can be used independently or in combination. This tutorial focuses on caveman itself.
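To see how the pieces could compound, here is an illustrative back-of-envelope combining the two compression rates; the 70/30 input/output token split is an assumption for the sketch, not a measured value:

```python
# Illustrative combined-savings estimate when caveman and cavemem are
# used together. The 70/30 input/output split is an assumed token mix.
INPUT_SHARE, OUTPUT_SHARE = 0.7, 0.3   # assumed token mix per session
INPUT_CUT = 0.46                       # cavemem: ~46% fewer input tokens
OUTPUT_CUT = 0.75                      # caveman: ~75% fewer output tokens

remaining = INPUT_SHARE * (1 - INPUT_CUT) + OUTPUT_SHARE * (1 - OUTPUT_CUT)
print(f"tokens remaining: {remaining:.0%} -> "
      f"~{1 / remaining:.1f}x more work per token budget")
```

Under these assumptions, token compression alone stretches the same budget about 2.2x; the rest of the "3x" figure would have to come from cavekit's workflow gains, which are not token savings.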

1.5 Scientific Background: Brevity β‰  Crudeness

You might worry: with so much compression, will information be lost?

A March 2026 paper, "Brevity Constraints Reverse Performance Hierarchies in Language Models", revealed a surprising finding:

On certain benchmarks, forcing models to respond briefly actually improved accuracy by 26 percentage points and completely reversed the performance rankings among different models.

This means verbosity not only wastes tokens but can also reduce accuracy. Caveman's compression essentially guides the model to focus on core reasoning, rather than expending computational power on rhetoric.


πŸ’» Intuitive Feel: Before vs After

Scenario: Auth Middleware Bug Troubleshooting

πŸ—£οΈ Normal Claude:

Sure! I'd be happy to help you with that. The issue you're experiencing
is most likely caused by your authentication middleware not properly
validating the token expiry. Let me take a look and suggest a fix.

The problem is in your `verifyToken` function. You're using a strict
less-than comparison (`<`) when checking the token's expiration
timestamp against the current time. This means that if the token
expires at exactly the current second, it won't be caught. You should
use a less-than-or-equal comparison (`<=`) instead.

Here's the fix:

πŸͺ¨ Caveman Claude:

Bug in auth middleware. Token expiry check use < not <=. Fix:

The same diagnostic conclusion. 1 line replaces 7 lines.


πŸ”§ Quick Concept Check

| Concept | Description |
|---|---|
| Output Token | Tokens consumed by the Agent's response (caveman's compression target) |
| Input Token | Tokens consumed by the Agent reading context (cavemem's compression target) |
| Thinking Token | Tokens consumed by the Agent's internal reasoning (caveman does not affect) |
| Skill | The Agent's extensible instruction set, essentially a Markdown file |
| Hook | Deterministic callback script in the Agent's lifecycle |
| Flag File | `~/.claude/.caveman-active`, tracks current mode |
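The Flag File concept can be made concrete with a small sketch: the file's presence means the mode is on. The toggle logic below is an assumption about how such a flag could work, written against a temp directory so it never touches a real home directory:

```python
# Sketch of flag-file mode tracking: the file existing == mode on.
# Uses a temp directory so the sketch never touches ~/.claude.
from pathlib import Path
import tempfile

def set_mode(flag: Path, active: bool) -> None:
    """Create or remove the flag file to toggle the mode."""
    if active:
        flag.touch()
    elif flag.exists():
        flag.unlink()

with tempfile.TemporaryDirectory() as tmp:
    flag = Path(tmp) / ".caveman-active"
    set_mode(flag, True)
    print("mode on:", flag.exists())
    set_mode(flag, False)
    print("mode on:", flag.exists())
```

Because state is just a file, any Hook (or the user, with `touch`/`rm`) can read or flip it with no running service involved.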

πŸ“ Key Takeaways from this Issue

  1. Approximately 75% of AI Agent output consists of redundant rhetoric; Caveman precisely removes this "superfluous talk"
  2. Caveman is not an MCP server, but a Skill that injects Prompt rules via Hooks
  3. The ecosystem's three-piece set: caveman (compresses output) + cavemem (compresses memory) + cavekit (optimizes builds)
  4. Academic papers confirm: brevity constraints can actually improve model accuracy
  5. Token savings directly translate into cost savings and improved response speed

πŸ”— References