Issue 10 | Performance Benchmarks and Best Practices: Integrating Caveman into Your Development Workflow
Learning Objectives
After completing this issue, you will master:
- How to run Caveman's official Benchmark and Eval suites
- The Three-Arm Evaluation methodology: Why Caveman is better than "please answer briefly"
- A complete daily development workflow: The full chain from startup to commit
- Optimal mode selection strategies for different scenarios
Core Content
10.1 Official Benchmark Data
Caveman's token compression effect isn't self-proclaimed; it's backed by real Claude API token count data.
| Test Prompt | Normal Tokens | Caveman Tokens | Compression Rate |
|---|---|---|---|
| React re-render explanation | 69 | 19 | 72% |
| Auth middleware Bug | 89 | 23 | 74% |
| TypeScript generics tutorial | 156 | 42 | 73% |
| Express performance optimization advice | 203 | 51 | 75% |
| Docker deployment troubleshooting | 178 | 38 | 79% |
| Database index optimization | 145 | 33 | 77% |
| CSS Grid layout guide | 112 | 28 | 75% |
| Git branching strategy advice | 98 | 24 | 76% |
Statistical Summary:
- Range: 22%–87%
- Average: ~71-75%
- Median: ~75%
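As a sanity check, the compression rates in the table follow directly from the token counts. A minimal Python sketch (the rows are copied from the table above; rounding to the nearest percent is an assumption that happens to reproduce the table's figures):

```python
# Benchmark rows copied from the table above: (normal_tokens, caveman_tokens)
benchmarks = {
    "React re-render explanation": (69, 19),
    "Auth middleware bug": (89, 23),
    "Docker deployment troubleshooting": (178, 38),
}

def compression_rate(normal: int, caveman: int) -> int:
    """Percentage of output tokens saved, rounded to the nearest percent."""
    return round((normal - caveman) / normal * 100)

for prompt, (normal, caveman) in benchmarks.items():
    print(f"{prompt}: {compression_rate(normal, caveman)}%")  # 72%, 74%, 79%
```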
Important: Caveman only affects output tokens. Thinking/reasoning tokens are completely unaffected. Caveman doesn't shrink the brain; it just shrinks the mouth.
10.2 Running Official Benchmarks
You can reproduce this data yourself:
```bash
# Clone the repository
git clone https://github.com/JuliusBrussee/caveman.git
cd caveman

# Run the LLM evaluation (requires Claude CLI and a valid API Key)
uv run python evals/llm_run.py

# Analyze results offline (no API Key needed)
uv run --with tiktoken python evals/measure.py
```
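The offline step uses tiktoken for exact counts. As a rough, dependency-free illustration of what it measures, you can approximate relative savings with a whitespace split; this is a crude stand-in for a real tokenizer, and both example strings below are made up, not eval output:

```python
# Crude illustration only: evals/measure.py uses tiktoken for real counts.
# Whitespace splitting is a rough stand-in for actual tokenization.
verbose = ("Unnecessary re-renders usually happen because a new object or "
           "function reference is created on every render, which breaks "
           "referential equality checks in React.memo and useMemo.")
caveman = "New ref each render break memo. Memoize object. Re-render stop."

def approx_tokens(text: str) -> int:
    return len(text.split())

saved = 1 - approx_tokens(caveman) / approx_tokens(verbose)
print(f"approx savings: {saved:.0%}")
```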
Three-Arm Evaluation Design
Caveman's eval doesn't simply compare "Normal vs. Caveman"; that would conflate Caveman's effect with generic brief instructions.
```mermaid
graph TD
    A["Three-Arm Evaluation Design"]
    A --> B["Arm 1: Verbose<br/>(No Constraints)<br/>Claude Normal Response"]
    A --> C["Arm 2: Terse<br/>(Only 'be brief')<br/>Generic Brief Instruction"]
    A --> D["Arm 3: Caveman<br/>(Full Skill Rules)<br/>Structured Compression"]
    B --> E["Baseline Comparison"]
    C --> F["Proves Caveman ≠ Simply 'be brief'"]
    D --> G["Actual Compression Effect"]
    F -.->|"Comparison"| G
```
Why a Three-Arm Design?
If you only compare Verbose vs Caveman, you cannot distinguish whether the compression effect comes from:
- Caveman's structured rules (the `[thing] [action] [reason]` pattern)
- Or simply because you told the Agent "please answer briefly"
In the three-arm design, Arm 2 (Terse) is the control group: it only says "be brief." If Caveman saves more tokens than Terse while maintaining higher accuracy, that proves Caveman's rule design itself has value, and is not just "asking for brevity."
Actual results: Caveman saves an additional 15-25% tokens compared to Terse mode, with higher technical accuracy.
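Note that "an additional 15-25%" is savings relative to the Terse arm, not to Verbose. With made-up but representative token counts for one prompt across the three arms (illustrative numbers, not official eval output), the arithmetic looks like this:

```python
# Illustrative token counts for one prompt across the three arms
# (made-up numbers, not official eval output).
arms = {"verbose": 200, "terse": 70, "caveman": 55}

def savings_vs(baseline: str, candidate: str) -> float:
    """Fraction of tokens saved by `candidate` relative to `baseline`."""
    return 1 - arms[candidate] / arms[baseline]

print(f"caveman vs verbose: {savings_vs('verbose', 'caveman'):.0%}")
print(f"caveman vs terse:   {savings_vs('terse', 'caveman'):.0%}")
```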
10.3 Academic Background: Brevity ≠ Coarseness
A March 2026 paper, "Brevity Constraints Reverse Performance Hierarchies in Language Models", found:
```mermaid
graph LR
    A["Traditional Assumption<br/>More Tokens = Better Answer"] -->|"Disproved by the Paper"| B["Experimental Result<br/>Brevity Constraint Improves Accuracy by 26%"]
    C["Large Models (Verbose)"] -->|"Add Brevity Constraint"| D["Accuracy Improvement"]
    E["Small Models (Concise)"] -->|"No Constraint"| F["Accuracy Even Higher"]
    D --> G["Conclusion: Verbosity is Noise,<br/>Not Signal"]
    F --> G
```
Key Findings:
- Brevity constraints improve accuracy by 26 percentage points (on specific benchmarks)
- Reverses model rankings: Smaller models that originally performed worse surpassed larger models under brevity constraints
- Verbosity is noise: The computational power models spend on rhetoric could be used for reasoning
This academically validates Caveman's core hypothesis: Remove the fluff, and reasoning becomes more accurate.
10.4 Complete Caveman Daily Workflow
```mermaid
graph TD
    A["Start Agent Session"] --> B["Hook Automatically Activates Caveman<br/>[CAVEMAN] Badge Lights Up"]
    B --> C{"Development Phase"}
    C -->|"Coding"| D["/caveman full<br/>Concise Technical Answers<br/>Troubleshooting, Writing Code"]
    C -->|"Debugging"| E["/caveman ultra<br/>Rapid Troubleshooting<br/>Minimal Text to Core Issue"]
    C -->|"Learning"| F["/caveman lite<br/>Retains Full Sentences<br/>Easier Concept Understanding"]
    C -->|"Chinese Projects"| G["/caveman wenyan<br/>Classical Chinese Mode<br/>Most Token-Efficient for Chinese"]
    D --> H["Code Modification Complete"]
    E --> H
    F --> H
    G --> H
    H --> I["/caveman-review<br/>One-Line Code Review<br/>L42: bug: ..."]
    I --> J{"Review Passed?"}
    J -->|"Issues Found"| K["Fix Issues"]
    K --> I
    J -->|"Passed"| L["/caveman-commit<br/>Refined Commit Message<br/>fix(auth): token <= not <"]
    L --> M["git push"]
    M --> N["/caveman:compress<br/>Compress CLAUDE.md<br/>Saves Tokens for Next Session"]
    N --> O["Done!"]
    style B fill:#FFD700
    style I fill:#87CEEB
    style L fill:#90EE90
    style N fill:#DDA0DD
```
10.5 Scenario × Mode Selection Matrix
| Work Scenario | Recommended Mode | Reason |
|---|---|---|
| Daily Coding | `full` | Balances readability and compression rate |
| Rapid Debugging | `ultra` | Minimal text to pinpoint root cause |
| Learning New Tech | `lite` | Requires more explanatory context |
| Code Review | `/caveman-review` | Dedicated review format |
| Git Commit | `/caveman-commit` | Dedicated commit format |
| Writing Documentation | Normal mode | Documentation requires full expression |
| Chinese Projects | `wenyan` | More token-efficient for Chinese |
| Pair Programming | `lite` | Colleagues also need to understand |
| CI/CD Review | `ultra` + review | Machine consumption, shorter is better |
| Context Compression | `/caveman:compress` | Compresses CLAUDE.md |
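The matrix can also be encoded as a small lookup so scripts or your own hooks pick a mode consistently. This helper is illustrative, not part of Caveman; the `full` fallback for unknown scenarios is our own assumption:

```python
from typing import Optional

# Illustrative helper, not part of Caveman: maps a work scenario
# to the recommended mode from the matrix above.
MODE_MATRIX = {
    "daily coding": "full",
    "rapid debugging": "ultra",
    "learning new tech": "lite",
    "pair programming": "lite",
    "chinese projects": "wenyan",
    "writing documentation": None,  # Normal mode: no Caveman constraint
}

def recommend_mode(scenario: str) -> Optional[str]:
    # Fall back to `full` for unknown scenarios (a reasonable default).
    return MODE_MATRIX.get(scenario.lower(), "full")

print(recommend_mode("Rapid Debugging"))  # ultra
```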
10.6 Full Workflow Comparison Across Platforms
| Workflow Step | Claude Code | Antigravity | Gemini CLI | Codex | OpenCode |
|---|---|---|---|---|---|
| 1. Session Start | Hook auto-activates | GEMINI.md rules | Extension auto | hooks.json | AGENTS.md |
| 2. Mode Switching | `/caveman ultra` | Natural language | `/caveman ultra` | `$caveman ultra` | Natural language |
| 3. Coding Interaction | ✅ Full Tool Calling | ✅ Full Tool Calling | ✅ Full Tool Calling | ✅ Full Tool Calling | ✅ Full Tool Calling |
| 4. Code Review | `/caveman-review` | Natural language | `/caveman-review` | `$caveman-review` | Natural language |
| 5. Committing Code | `/caveman-commit` | Natural language | `/caveman-commit` | `$caveman-commit` | Natural language |
| 6. Context Compression | `/caveman:compress` | Natural language | `/caveman:compress` | `$caveman-compress` | Natural language |
| 7. Status Monitoring | ✅ `[CAVEMAN:MODE]` | ❌ | ❌ | ❌ | ❌ |
| 8. Exiting Caveman | "stop caveman" | "stop caveman" | "stop caveman" | "stop caveman" | "stop caveman" |
10.7 Advanced Best Practices
Practice 1: CLAUDE.md Layered Strategy
```
~/.claude/CLAUDE.md          → Global Caveman always-on (applies to all projects)
<project>/CLAUDE.md          → Project-specific rules (already compressed)
<project>/CLAUDE.original.md → Human-readable original (edit this)
```
Practice 2: Team-wide Unified Configuration
```bash
# Commit the Caveman configuration in the project root
echo 'Terse like caveman. Technical substance exact...' >> CLAUDE.md
echo 'Terse like caveman. Technical substance exact...' >> GEMINI.md

# Ensure all team members use the same Caveman behavior
git add CLAUDE.md GEMINI.md
git commit -m "chore: add caveman always-on for team"
```
Practice 3: CI/CD Integration
```yaml
# .github/workflows/pr-review.yml
- name: Caveman Code Review
  run: |
    # Use Claude Code Action + caveman-review rules
    # Each PR automatically gets a one-line review
```
Practice 4: Combining with cavemem
```bash
# Install cavemem (memory compression)
# Combine with caveman (output compression) for dual optimization
npm install -g cavemem

# caveman compresses output → saves output tokens
# cavemem compresses memory → saves input tokens
# Combined → total token consumption reduced by 60%+
```
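How much the combination saves in total depends on your input/output mix, so the combined figure varies per workload. You can estimate your own with a couple of lines; the two headline reductions come from above, while the token volumes are whatever your sessions actually use:

```python
def combined_saving(input_tokens: int, output_tokens: int,
                    input_cut: float = 0.46, output_cut: float = 0.75) -> float:
    """Overall token reduction for a given input/output mix.

    input_cut / output_cut are the headline cavemem / caveman reductions;
    the combined figure depends entirely on your own traffic mix.
    """
    before = input_tokens + output_tokens
    after = input_tokens * (1 - input_cut) + output_tokens * (1 - output_cut)
    return 1 - after / before

# An output-heavy session benefits most:
print(f"{combined_saving(1000, 4000):.0%}")  # 69%
```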
Practice 5: Customizing Caveman Rules
If you need domain-specific Caveman rules, you can create custom Skills:
```markdown
<!-- .claude/skills/my-caveman/SKILL.md -->
## My Custom Caveman Rules

Base: Terse like caveman. Technical substance exact.

Additional rules for this project:
- Always mention file paths in full
- Include line numbers when discussing bugs
- Use Chinese for variable name explanations
- Keep API endpoint paths in backticks
```
Return on Investment Summary
```mermaid
graph LR
    subgraph Investment["Investment"]
        A1["Installation: 1 min"]
        A2["Configuration: 5 min"]
        A3["Learning: This 10-part tutorial"]
    end
    subgraph Return["Return"]
        B1["Output Tokens: -75%"]
        B2["Input Tokens: -46%"]
        B3["Response Speed: +3x"]
        B4["Monthly Cost: -$46"]
        B5["Readability: ✅"]
    end
    Investment --> Return
```
| Metric | Without Caveman | With Caveman | Improvement |
|---|---|---|---|
| Avg. Tokens per Response | ~300 | ~80 | -73% |
| Input Tokens per Session | ~2800 | ~1500 | -46% |
| Daily Token Consumption | ~68,000 | ~19,200 | -72% |
| Monthly Cost (Est.) | ~$63 | ~$17 | -$46/month |
| Response Reading Time | ~15 sec | ~5 sec | -66% |
| Technical Accuracy | 100% | 100% | Unchanged |
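The Improvement column is plain arithmetic over the two middle columns. A quick sketch reproducing it (rows copied from the table above; rounding to the nearest percent is an assumption):

```python
# Rows copied from the ROI table: (without_caveman, with_caveman)
roi = {
    "avg tokens per response": (300, 80),
    "input tokens per session": (2800, 1500),
    "daily token consumption": (68_000, 19_200),
}

for metric, (before, after) in roi.items():
    cut = round((before - after) / before * 100)
    print(f"{metric}: -{cut}%")  # -73%, -46%, -72%
```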
Full Series Review
| Issue | Topic | Key Takeaways |
|---|---|---|
| 01 | What is Caveman | Token Compression Philosophy + Ecosystem Overview |
| 02 | Installation on Three Platforms | Claude Code / Antigravity / Gemini CLI Installation Comparison |
| 03 | In-depth Hooks Analysis | Auto-activation Engine + Flag File Mechanism |
| 04 | Four-Speed Modes | Lite / Full / Ultra / Classical Chinese + Switching Methods |
| 05 | /caveman Core Skill | Daily Development Practice + Response Modes |
| 06 | /caveman-commit | Refined Git Commits + Git Hook Integration |
| 07 | /caveman-review | One-Line Code Review + GitHub Actions |
| 08 | /caveman:compress | Compress CLAUDE.md + Input Token Optimization |
| 09 | Always-On Configuration | Five Platform Rule Files + Team Sharing |
| 10 | Benchmarks + Best Practices | Complete Workflow + Return on Investment |
Graduation Tasks
Complete the following tasks to become a qualified Caveman user:
- Install Caveman on your primary Agent
- Complete a full feature development using `full` mode
- Review your own code using `/caveman-review`
- Generate a commit message using `/caveman-commit`
- Compress your CLAUDE.md using `/caveman:compress`
- Configure Always-On to ensure automatic activation in the next session
- (Bonus) Commit the configuration to Git so your team can also use Caveman