Issue 01 | What is Caveman β€” The Philosophy of Token Compression and the Ecosystem Landscape

⏱ Est. reading time: 11 min Updated on 5/7/2026

🎯 Learning Objectives

By the end of this issue, you will understand:

  1. Why AI Agent output token redundancy is a serious engineering problem
  2. Caveman's core mechanism: Prompt rule injection, not an MCP service
  3. The positioning and collaborative relationship of the Caveman ecosystem's three-piece set
  4. The scientific basis for saving tokens – empirical evidence from academic papers

πŸ“– Core Concepts Explained

1.1 Redundant Output: The Silent Killer of AI Agents

When you use an AI coding agent, there is a fact you may never have noticed: roughly 75% of what the Agent says is superfluous.

πŸ—£οΈ Normal Claude (69 tokens):
"The reason your React component is re-rendering is likely because
you're creating a new object reference on each render cycle. When you
pass an inline object as a prop, React's shallow comparison sees it
as a different object every time, which triggers a re-render. I'd
recommend using useMemo to memoize the object."

πŸͺ¨ Caveman Claude (19 tokens):
"New object ref each render. Inline object prop = new ref = re-render.
Wrap in useMemo."

The same technical conclusion, compressed from 69 tokens to 19 tokens – a 72% saving.

This isn't cutting corners. What's removed is only:

  • ❌ Pleasantries ("The reason your...", "I'd recommend...")
  • ❌ Filler words ("likely", "actually", "basically")
  • ❌ Redundant explanations (the same concept explained three ways)

What's retained is:

  • βœ… Root cause diagnosis (new object ref)
  • βœ… Causal chain (inline prop β†’ new ref β†’ re-render)
  • βœ… Solution (useMemo)

1.2 The True Arithmetic of Token Costs

Taking Claude Sonnet 4 as an example (2026 pricing):

| Metric | Normal Mode | Caveman Mode | Savings |
|---|---|---|---|
| Avg. tokens per response | ~300 | ~80 | 73% |
| Interactions per hour | ~40 | ~40 | — |
| Tokens consumed per hour | 12,000 | 3,200 | 73% |
| 8-hour daily cost (est.) | ~$2.88 | ~$0.77 | $2.11/day |
| 22-day monthly cost | ~$63 | ~$17 | $46/month |
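The table's figures can be reproduced with simple arithmetic. The per-token rate below is the one implied by the table's dollar amounts (about $30 per million output tokens), an assumption for the sketch rather than an official price list:

```python
# Reproduce the cost table's arithmetic. RATE_PER_M is the output-token
# rate implied by the table's dollar figures, not an official price.
RATE_PER_M = 30.00            # assumed $/1M output tokens
INTERACTIONS_PER_HOUR = 40

def daily_cost(tokens_per_response: int, hours: float = 8) -> float:
    """Estimated output-token cost for one working day."""
    tokens = tokens_per_response * INTERACTIONS_PER_HOUR * hours
    return tokens * RATE_PER_M / 1_000_000

normal = daily_cost(300)      # ~$2.88/day
caveman = daily_cost(80)      # ~$0.77/day
print(f"normal ${normal:.2f}/day, caveman ${caveman:.2f}/day, "
      f"save ${(normal - caveman) * 22:.0f}/month")
```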

πŸ’‘ Key Insight: Caveman only compresses output tokens (the Agent's responses). Thinking/reasoning tokens are completely unaffected. Caveman doesn't shrink the brain, it just shrinks the mouth.

1.3 The Essence of Caveman: A Behavioral Rule Injector

Many people, upon hearing "install Caveman," assume it requires configuring an MCP server. It does not.

```mermaid
graph TB
    subgraph Wrong["❌ Common Misconception"]
        A1["Caveman = MCP Server?"] --> A2["Requires starting a background service?"]
        A2 --> A3["Communicates via JSON-RPC?"]
    end

    subgraph Right["✅ Actual Mechanism"]
        B1["Caveman = Prompt Skill"] --> B2["SessionStart Hook injects rules"]
        B2 --> B3["Agent compresses output according to rules"]
        B3 --> B4["Flag File tracks mode status"]
    end

    style Wrong fill:#ff000020,stroke:#ff0000
    style Right fill:#00ff0020,stroke:#00ff00
```

Caveman works in a very simple way:

  1. When installed, a set of Prompt rules and Hook scripts are registered in your Agent environment
  2. When a session starts, the Hook automatically injects the rules into the Agent's system context
  3. When the Agent responds, it compresses the output according to the injected rules
  4. That's it. No background service, no network requests, no extra dependencies
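A minimal sketch of step 2, assuming a SessionStart hook whose stdout is added to the session context; the file paths and rule text here are illustrative, not Caveman's actual implementation:

```python
# Sketch of a SessionStart hook: if the flag file exists, emit the
# compression rules on stdout so the host agent can add them to the
# session context. Paths and rule text are illustrative assumptions.
from pathlib import Path

FLAG_FILE = Path.home() / ".claude" / ".caveman-active"
RULES = (
    "Respond in compressed style: drop pleasantries and filler; "
    "keep root cause, causal chain, and fix."
)

def session_start_hook() -> str:
    """Return the rule text to inject, or '' when the mode is off."""
    return RULES if FLAG_FILE.exists() else ""

if __name__ == "__main__":
    print(session_start_hook())
```

Note there is no server and no network call anywhere in this flow: the hook is an ordinary script that runs once at session start.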

1.4 Caveman Ecosystem Overview

Caveman is not an isolated tool; it's a three-piece ecosystem:

```mermaid
graph LR
    subgraph Ecosystem["🪨 Caveman Ecosystem"]
        direction TB
        A["🪨 caveman<br/>Compresses Agent output<br/>Saves ~75% output tokens"]
        B["🧠 cavemem<br/>Compresses Agent memory<br/>Saves ~46% input tokens"]
        C["🔧 cavekit<br/>Build toolchain optimization<br/>Boosts development efficiency"]
    end
    A -->|"Shorter output"| D["💰 Reduced Costs"]
    B -->|"More concise context"| D
    C -->|"Faster workflow"| D
    D --> E["🚀 3x overall Agent efficiency boost"]
```

| Tool | Object of Compression | Compression Rate | One-sentence Positioning |
|---|---|---|---|
| caveman | Agent's output (response) | ~75% | Makes the Agent speak shorter |
| cavemem | Agent's input (e.g., CLAUDE.md) | ~46% | Makes the Agent read faster |
| cavekit | Build workflow | — | Makes development smoother |

The three tools can be used independently or in combination. This tutorial focuses on caveman itself.
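To see how the pieces could compound, here is an illustrative back-of-envelope combining the two compression rates; the 70/30 input/output token split is an assumption for the sketch, not a measured value:

```python
# Illustrative combined-savings estimate when caveman and cavemem are
# used together. The 70/30 input/output split is an assumed token mix.
INPUT_SHARE, OUTPUT_SHARE = 0.7, 0.3   # assumed token mix per session
INPUT_CUT = 0.46                       # cavemem: ~46% fewer input tokens
OUTPUT_CUT = 0.75                      # caveman: ~75% fewer output tokens

remaining = INPUT_SHARE * (1 - INPUT_CUT) + OUTPUT_SHARE * (1 - OUTPUT_CUT)
print(f"tokens remaining: {remaining:.0%} -> "
      f"~{1 / remaining:.1f}x more work per token budget")
```

Under these assumptions, token compression alone stretches the same budget about 2.2x; the rest of the "3x" figure would have to come from cavekit's workflow gains, which are not token savings.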

1.5 Scientific Background: Brevity β‰  Crudeness

You might worry: with so much compression, will information be lost?

A March 2026 paper, "Brevity Constraints Reverse Performance Hierarchies in Language Models", revealed a surprising finding:

On certain benchmarks, forcing models to respond briefly actually improved accuracy by 26 percentage points and completely reversed the performance rankings among different models.

This means verbosity not only wastes tokens but can also reduce accuracy. Caveman's compression essentially guides the model to focus on core reasoning, rather than expending computational power on rhetoric.


πŸ’» Intuitive Feel: Before vs After

Scenario: Auth Middleware Bug Troubleshooting

πŸ—£οΈ Normal Claude:

Sure! I'd be happy to help you with that. The issue you're experiencing
is most likely caused by your authentication middleware not properly
validating the token expiry. Let me take a look and suggest a fix.

The problem is in your `verifyToken` function. You're using a strict
less-than comparison (`<`) when checking the token's expiration
timestamp against the current time. This means that if the token
expires at exactly the current second, it won't be caught. You should
use a less-than-or-equal comparison (`<=`) instead.

Here's the fix:

πŸͺ¨ Caveman Claude:

Bug in auth middleware. Token expiry check use < not <=. Fix:

The same diagnostic conclusion. 1 line replaces 7 lines.


πŸ”§ Quick Concept Check

| Concept | Description |
|---|---|
| Output Token | Tokens consumed by the Agent's response (caveman's compression target) |
| Input Token | Tokens consumed by the Agent reading context (cavemem's compression target) |
| Thinking Token | Tokens consumed by the Agent's internal reasoning (caveman does not affect) |
| Skill | The Agent's extensible instruction set, essentially a Markdown file |
| Hook | Deterministic callback script in the Agent's lifecycle |
| Flag File | `~/.claude/.caveman-active`, tracks current mode |
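The Flag File concept can be made concrete with a small sketch: the file's presence means the mode is on. The toggle logic below is an assumption about how such a flag could work, written against a temp directory so it never touches a real home directory:

```python
# Sketch of flag-file mode tracking: the file existing == mode on.
# Uses a temp directory so the sketch never touches ~/.claude.
from pathlib import Path
import tempfile

def set_mode(flag: Path, active: bool) -> None:
    """Create or remove the flag file to toggle the mode."""
    if active:
        flag.touch()
    elif flag.exists():
        flag.unlink()

with tempfile.TemporaryDirectory() as tmp:
    flag = Path(tmp) / ".caveman-active"
    set_mode(flag, True)
    print("mode on:", flag.exists())
    set_mode(flag, False)
    print("mode on:", flag.exists())
```

Because state is just a file, any Hook (or the user, with `touch`/`rm`) can read or flip it with no running service involved.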

πŸ“ Key Takeaways from this Issue

  1. Approximately 75% of AI Agent output consists of redundant rhetoric; Caveman precisely removes this "superfluous talk"
  2. Caveman is not an MCP server, but a Skill that injects Prompt rules via Hooks
  3. The ecosystem's three-piece set: caveman (compresses output) + cavemem (compresses memory) + cavekit (optimizes builds)
  4. Academic papers confirm: brevity constraints can actually improve model accuracy
  5. Token savings directly translate into cost savings and improved response speed

πŸ”— References