Episode 6: Tokens Deep Dive
Key Takeaway: Tokens are Claude Code's currency. Understanding the three token types is the first step to saving money.
6.1 What is a Token
Tokens are the basic units Claude uses to process text:
| Language | 1 Token ≈ | Example |
|---|---|---|
| English | 4 characters / 0.75 words | "hello" ≈ 1 token |
| Chinese | 1-2 characters | "你好" ≈ 2 tokens |
| Code | Irregular | `const x = 1;` ≈ 5-7 tokens |
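A rough estimator based on these ratios might look like this. The ratios are rules of thumb from the table, not a real tokenizer, and code tokenizes less predictably than either estimate:

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate from the rule-of-thumb ratios above; not a real tokenizer."""
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")  # CJK ideographs
    other = len(text) - cjk
    # English: ~4 characters per token; Chinese: 1-2 characters per token (use 1.5).
    return round(cjk / 1.5 + other / 4)

print(estimate_tokens("hello"))  # 1
```

For exact counts you would call the model provider's token-counting endpoint rather than estimate.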
Each Claude Code conversation involves three token types:
| Token Type | Direction | Sonnet Price | Haiku Price |
|---|---|---|---|
| Input Token | Sent to Claude | $3/MTok | $0.80/MTok |
| Output Token | Claude's reply | $15/MTok | $4/MTok |
| Cache Token (read) | Cache hit portion | $0.30/MTok | $0.08/MTok |
Output tokens cost 5× as much as input tokens, so saving output tokens matters most.
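The cost of a single API call follows directly from these rates. A minimal sketch, with the $3 / $15 / $0.30 per-MTok prices hard-coded as assumptions (verify against the current pricing page before budgeting with them):

```python
# Assumed per-million-token rates; check the provider's current pricing page.
PRICES = {"input": 3.00, "output": 15.00, "cache_read": 0.30}  # $/MTok

def call_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Dollar cost of one API call. The cached portion of the input is billed
    at the cheaper cache-read rate instead of the full input rate."""
    uncached = input_tokens - cached_tokens
    cost = (uncached * PRICES["input"]
            + cached_tokens * PRICES["cache_read"]
            + output_tokens * PRICES["output"]) / 1_000_000
    return round(cost, 6)

# Turn 2 from section 6.4: 49,180 input (9,180 cached) + 200 output.
print(call_cost(49_180, 200, cached_tokens=9_180))  # 0.125754
```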
6.2 Input Tokens are "Cumulative"
Every message you send carries all previous turns. By turn 5, all content from turns 1-4 (conversations + tool results) is included:
| Component | Per-turn Size | Cumulative Effect |
|---|---|---|
| System prompt | ~3K | Fixed |
| CLAUDE.md | ~2K-8K | Fixed |
| Conversation history | Grows per turn | Turn 1: 500t, Turn 5: possibly 30Kt |
| Tool results | Grows per call | One 300-line file Read β +5Kt |
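The cumulative effect above can be sketched in a few lines: each turn's input bill is the fixed part plus everything that came before. The per-turn sizes below are illustrative, not measured:

```python
# Illustrative: input tokens billed per turn as history accumulates.
FIXED = 3_000 + 5_000  # assumed system prompt + CLAUDE.md sizes, per the table

def input_per_turn(turn_sizes):
    """turn_sizes[i] = new tokens added in turn i (message + tool results).
    Returns each turn's input bill: fixed part + all history so far."""
    bills, history = [], 0
    for new in turn_sizes:
        history += new
        bills.append(FIXED + history)
    return bills

print(input_per_turn([500, 6_000, 4_000, 9_000, 11_000]))
# [8500, 14500, 18500, 27500, 38500] -- turn 5 pays for everything again
```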
6.3 tool_result is the Real Context Killer
| Message Part | Example | Approx. Tokens |
|---|---|---|
| assistant text reply | "Let me read these files..." | ~50 |
| assistant tool call | Read("package.json") | ~30 |
| tool_result (small file) | package.json (38 lines) | ~500 |
| tool_result (large file) | src/index.ts (329 lines) | ~4,000 |
| tool_result (bash) | npm test (200 lines of output) | ~3,000 |
Conclusion: AI text replies and tool call instructions are tiny. tool_result (tool execution results) is the real context killer.
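Summing the example sizes above makes the point concrete: in a turn that reads one large file, the tool_result dwarfs everything else (all numbers copied from the list, all approximate):

```python
# Approximate token breakdown for one turn, using the example sizes above.
turn = {
    "assistant text": 50,
    "tool call": 30,
    "tool_result (329-line file)": 4_000,
}
total = sum(turn.values())
share = turn["tool_result (329-line file)"] / total
print(f"tool_result share of the turn: {share:.0%}")  # 98%
```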
6.4 Real-World: "Read Project" Token Changes
─── Turn 1: User sends "read project" ───
Total input: 8,530t (fixed section + "read project")
Cache hit: 0t (no cache on first turn)
Total output: 650t (6 Read instructions)
─── Turn 2: Tool results return + Claude summarizes ───
Total input: 49,180t (+40K file contents)
Cache hit: 9,180t (fixed section cached, saving ~90%)
Total output: 200t ("Project read complete...")
HUD shows:
Context ▓▓▓▓▓▓░░░░ 60% (120K/200K)
Input: 49K · Output: 1K · Cache: 9K hit
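The "~90% saving" on the cached portion checks out arithmetically. A quick calculation, assuming the $3 input and $0.30 cache-read rates per MTok used earlier in this episode:

```python
# Saving from Turn 2's 9,180-token cache hit, at assumed $/MTok rates.
INPUT_RATE, CACHE_READ_RATE = 3.00, 0.30
cached = 9_180

full_price  = cached * INPUT_RATE / 1_000_000       # billed at full input rate
cache_price = cached * CACHE_READ_RATE / 1_000_000  # billed at cache-read rate
saving = 1 - cache_price / full_price
print(f"cached portion costs {saving:.0%} less")  # 90% less
```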
6.5 Debugging Loop Token Snowball
Round 1: Input: 12K → Context: 20%
Round 2: Input: 25K → Context: 35%
Round 3: Input: 50K → Context: 55% ← npm install error output is long
Round 4: Input: 80K → Context: 72%
Round 5: Input: 120K → Context: 90% ← context bar turns red!
Round 6: Auto-compact triggers → cache hit rate plummets
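The snowball is the same cumulative mechanism as before: every round re-sends all earlier rounds, so one long error dump (round 3 here) keeps getting re-billed in every later round. The round sizes below are illustrative, and the context percentages count only accumulated input, so they run lower than the HUD figures above, which also include the fixed prompt and outputs:

```python
# Illustrative debugging-loop snowball over a 200K-token context window.
CONTEXT_WINDOW = 200_000

def snowball(new_tokens_per_round):
    """Each round re-sends all prior rounds plus its own new tokens."""
    history, rounds = 0, []
    for new in new_tokens_per_round:
        history += new
        rounds.append((history, history / CONTEXT_WINDOW))
    return rounds

for i, (inp, pct) in enumerate(snowball([12_000, 13_000, 25_000, 30_000, 40_000]), 1):
    print(f"Round {i}: input {inp // 1000}K -> context {pct:.0%}")
```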
6.6 Token-Saving Tips
Save input tokens:
| Tip | Effect |
|---|---|
| Use Grep instead of Read-ing the full file | Save 80-95% |
| Read with a specific line range | Save 50-90% |
| Slim down CLAUDE.md | Save 1K-5K per turn |
| Regular /clear | Prevents history accumulation |
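The first two tips come down to simple arithmetic: a Read scales with the lines returned, so reading only the slice you need saves proportionally. The ~12 tokens/line figure below is an assumption, roughly consistent with the 329-line ≈ 4,000-token example earlier:

```python
# Rough savings from reading a line range instead of the whole file.
TOKENS_PER_LINE = 12  # assumed average, per the 329-line ~4,000t example

def read_cost(lines: int) -> int:
    """Approximate tool_result tokens for reading `lines` lines of a file."""
    return lines * TOKENS_PER_LINE

full, sliced = read_cost(329), read_cost(40)
print(f"full read: {full}t, 40-line slice: {sliced}t "
      f"({1 - sliced / full:.0%} saved)")
```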
Save output tokens:
| Tip | Effect |
|---|---|
| Use caveman mode | Save 40-60% output |
| "Only change code, don't explain" | Skip explanation entirely |
| Specific instructions: "fix the bug on line 42" | 5-10Γ less output than vague questions |
6.7 Viewing Tokens in HUD
Expanded layout (default):
Context ▓▓▓░░░░░░░ 33% (67k/200k) (in: 318, cache: 66k)
tokens 1.3M (in: 218k, out: 5k, cache: 1.0M)
| HUD Display | Meaning |
|---|---|
| in: 318 | Input tokens for this API call (not cumulative) |
| cache: 66k | Cache hit tokens for this API call |
| Session tokens line | Cumulative input/output/cache for the entire session |
Note: The Context line's in/cache figures are per-request values, not session totals.
Next Episode: Episode 7 covers usage limits: how to read the Pro/Max/Team 5-hour and 7-day windows in the HUD.