Episode 6: Tokens Deep Dive

⏱ Est. reading time: 6 min · Updated on 5/7/2026

Key Takeaway: Tokens are Claude Code's currency. Understanding the three token types is the first step to saving money.


6.1 What is a Token

Tokens are the basic units Claude uses to process text:

Language    1 Token ≈                    Example
English     4 characters / 0.75 words    "hello" ≈ 1 token
Chinese     1-2 characters               "你好" ≈ 2 tokens
Code        irregular                    const x = 1; ≈ 5-7 tokens
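The rules of thumb above can be turned into a quick estimator. This is a rough sketch only: the chars-per-token ratios are assumptions taken from the table, and exact counts always come from the model's own tokenizer.

```python
# Rough token estimators based on the rules of thumb above. These are
# illustrative heuristics only -- exact counts come from the model's tokenizer.

def estimate_tokens(text: str, kind: str = "english") -> int:
    """Estimate token count for `text`.

    kind: "english" (~4 chars/token), "cjk" (~1 char/token),
          or "code" (~2 chars/token; code tokenizes irregularly).
    """
    chars_per_token = {"english": 4.0, "cjk": 1.0, "code": 2.0}[kind]
    return max(1, round(len(text) / chars_per_token))

print(estimate_tokens("hello"))                  # -> 1
print(estimate_tokens("你好", "cjk"))            # -> 2
print(estimate_tokens("const x = 1;", "code"))   # -> 6
```

Useful for back-of-envelope budgeting before pasting a large file into a prompt.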

Each Claude Code conversation involves three token types:

Token Type    Direction                   Sonnet Price    Haiku Price
Input         sent to Claude              $3/MTok         $0.80/MTok
Output        Claude's reply              $15/MTok        $4/MTok
Cache read    portion served from cache   $0.30/MTok      $0.08/MTok

Output tokens cost 5× more than input tokens ($15 vs $3/MTok), so trimming output has an outsized effect on cost.
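A per-call cost can be computed directly from these per-MTok rates. A minimal sketch, with the $3 input / $15 output / $0.30 cache-read rates from the table hard-coded as defaults (verify against current pricing):

```python
# USD cost of one API call from per-MTok rates. Defaults are the
# $3/$15/$0.30 column of the pricing table above (verify against
# current published pricing).

def call_cost(input_tok: int, output_tok: int, cache_tok: int = 0,
              price_in: float = 3.00, price_out: float = 15.00,
              price_cache: float = 0.30) -> float:
    """Cache-read tokens bill at the (much cheaper) cache rate."""
    return (input_tok * price_in
            + output_tok * price_out
            + cache_tok * price_cache) / 1_000_000

# 2K of output costs as much as 10K of input -- the 5x ratio in action:
print(f"${call_cost(10_000, 2_000):.3f}")  # -> $0.060
```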


6.2 Input Tokens are "Cumulative"

Every message you send carries all previous turns. By turn 5, all content from turns 1-4 (conversations + tool results) is included:

Component             Per-turn Size    Cumulative Effect
System prompt         ~3K              fixed
CLAUDE.md             ~2K-8K           fixed
Conversation history  grows per turn   Turn 1: 500t → Turn 5: possibly 30Kt
Tool results          grow per call    one 300-line file Read ≈ +5Kt
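The accumulation can be sketched in a few lines. All sizes below are illustrative assumptions (a fixed ~8K prefix, plus made-up per-turn content), not measured values:

```python
# Why input is "cumulative": every API call resends the fixed prefix plus
# the whole conversation so far. All sizes are illustrative assumptions.

FIXED = 8_000  # system prompt (~3K) + CLAUDE.md (~5K midpoint), per the table

# New content added each turn (user message + reply + tool results); made up:
new_per_turn = [500, 6_000, 8_000, 9_000, 7_000]

def input_tokens_at_turn(turn: int) -> int:
    """Input for a turn = fixed prefix + all content up to and including it."""
    return FIXED + sum(new_per_turn[:turn])

for t in range(1, 6):
    print(f"Turn {t}: ~{input_tokens_at_turn(t):,} input tokens")
```

Turn 1 lands near 8.5K while turn 5 is pushing 38K, even though the user typed almost nothing new.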

6.3 tool_result is the Real Context Killer

assistant text reply:     "Let me read these files..."         ~50 tokens
assistant tool call:      Read("package.json")                  ~30 tokens
tool_result (small file): package.json (38 lines)              ~500 tokens
tool_result (large file): src/index.ts (329 lines)           ~4,000 tokens
tool_result (bash):       npm test (200 lines output)        ~3,000 tokens

Conclusion: the assistant's text replies and tool-call instructions are tiny; tool_result blocks (tool execution output) are the real context killer.
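Tallying the sample turn above makes the imbalance concrete (token counts copied from the breakdown; the share is computed from them):

```python
# Tallying the sample turn above: tool_result blocks dominate the budget.
events = [
    ("assistant text",               50),
    ("tool call Read",               30),
    ("tool_result package.json",    500),
    ("tool_result src/index.ts",  4_000),
    ("tool_result npm test",      3_000),
]
total = sum(t for _, t in events)
results = sum(t for name, t in events if name.startswith("tool_result"))
print(f"tool_result share: {results}/{total} = {results/total:.0%}")
# -> tool_result share: 7500/7580 = 99%
```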


6.4 Real-World: "Read Project" Token Changes

═══ Turn 1: User sends "read project" ═══
  Total input:  8,530t   (fixed section + "read project")
  Cache hit:        0t   (no cache on first turn)
  Total output:   650t   (6 Read instructions)

═══ Turn 2: Tool results return + Claude summarizes ═══
  Total input: 49,180t   (+40K file contents)
  Cache hit:    9,180t   (fixed section cached, saving ~90%)
  Total output:   200t   ("Project read complete...")

HUD shows:
Context β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‘β–‘β–‘β–‘ 60% (120K/200K)
  Input: 49K β”‚ Output: 1K β”‚ Cache: 9K hit
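The "~90% saving" on the cached portion follows from the pricing table: cache reads bill at roughly 1/10 of the fresh-input rate. A sketch using the $3.00 vs $0.30/MTok rates from above:

```python
# Cache reads bill at ~1/10 the fresh-input rate ($0.30 vs $3.00/MTok per
# the pricing table above), hence the "~90% saving" on cached tokens.

def cache_saving_usd(cached_tok: int, price_in: float = 3.00,
                     price_cache: float = 0.30) -> float:
    """Dollars saved by serving `cached_tok` from cache vs as fresh input."""
    return cached_tok * (price_in - price_cache) / 1_000_000

saving = cache_saving_usd(9_180)  # Turn 2's cache hit from the example
print(f"${saving:.4f} saved, {(3.00 - 0.30) / 3.00:.0%} off that portion")
```

Small per call, but the fixed prefix is resent every turn, so the discount compounds over a long session.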

6.5 Debugging Loop Token Snowball

Round 1: Input: 12K  β†’ Context: 20%
Round 2: Input: 25K  β†’ Context: 35%
Round 3: Input: 50K  β†’ Context: 55%   ← npm install error output is long
Round 4: Input: 80K  β†’ Context: 72%
Round 5: Input: 120K β†’ Context: 90%   ← Context bar turns red!
Round 6: Auto-compact triggers → cache hit rate plummets

6.6 Token-Saving Tips

Save input tokens:

Tip                              Effect
Grep instead of Read full file   saves 80-95%
Read a specific line range       saves 50-90%
Slim down CLAUDE.md              saves 1K-5K per turn
Regular /clear                   prevents history accumulation
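The Grep-vs-Read saving can be estimated with a toy model. Everything here is an assumption for illustration: ~15 tokens per code line, 4 matches, 2 context lines per match:

```python
# Toy estimate of Grep vs full-file Read. TOKENS_PER_LINE and the match
# counts are illustrative assumptions, not measurements.

TOKENS_PER_LINE = 15  # assumed average for source code

def read_cost(total_lines: int) -> int:
    """Tokens to Read an entire file into context."""
    return total_lines * TOKENS_PER_LINE

def grep_cost(matches: int, context: int = 2) -> int:
    """Tokens for Grep output: each match plus `context` lines either side."""
    return matches * (1 + 2 * context) * TOKENS_PER_LINE

full = read_cost(329)   # reading all of a 329-line file
narrow = grep_cost(4)   # 4 matches, 2 context lines each
print(f"Read: {full}t  Grep: {narrow}t  saved {1 - narrow / full:.0%}")
# -> Read: 4935t  Grep: 300t  saved 94%
```

A 94% saving on one file sits right in the table's 80-95% range, and the gap widens with larger files.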

Save output tokens:

Tip                                        Effect
Use caveman mode                           saves 40-60% of output
"Only change code, don't explain"          skips explanations entirely
Be specific: "fix the bug on line 42"      5-10× less output than vague questions

6.7 Viewing Tokens in HUD

Expanded layout (default):

Context β–ˆβ–ˆβ–ˆβ–‘β–‘β–‘β–‘β–‘β–‘β–‘ 33% (67k/200k) (in: 318, cache: 66k)
tokens 1.3M (in: 218k, out: 5k, cache: 1.0M)

HUD Display        Meaning
in: 318            input tokens for this API call (not cumulative)
cache: 66k         cache-hit tokens
tokens 1.3M line   cumulative input/output/cache for the entire session

Note: the Context line's in:/cache: figures are per-request values, not session totals.


Next Episode: Episode 7 covers usage limits β€” how to read the Pro/Max/Team 5-hour and 7-day windows in HUD.