Episode 6: Tokens Deep Dive

⏱ Est. reading time: 6 min · Updated on 5/7/2026

Key Takeaway: Tokens are Claude Code's currency. Understanding the three token types is the first step to saving money.


6.1 What is a Token

Tokens are the basic units Claude uses to process text:

Language    1 Token ≈                    Example
English     4 characters / 0.75 words    "hello" ≈ 1 token
Chinese     1-2 characters               "你好" ≈ 2 tokens
Code        irregular                    const x = 1; ≈ 5-7 tokens
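The rules of thumb above can be turned into a quick estimator. This is a rough sketch only: the chars-per-token ratios are assumptions taken from the table, and exact counts always come from the model's own tokenizer.

```python
# Rough token estimators based on the rules of thumb above. These are
# illustrative heuristics only -- exact counts come from the model's tokenizer.

def estimate_tokens(text: str, kind: str = "english") -> int:
    """Estimate token count for `text`.

    kind: "english" (~4 chars/token), "cjk" (~1 char/token),
          or "code" (~2 chars/token; code tokenizes irregularly).
    """
    chars_per_token = {"english": 4.0, "cjk": 1.0, "code": 2.0}[kind]
    return max(1, round(len(text) / chars_per_token))

print(estimate_tokens("hello"))                  # -> 1
print(estimate_tokens("你好", "cjk"))            # -> 2
print(estimate_tokens("const x = 1;", "code"))   # -> 6
```

Useful for back-of-envelope budgeting before pasting a large file into a prompt.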

Each Claude Code conversation involves three token types:

Token Type    Direction                   Sonnet Price    Haiku Price
Input         sent to Claude              $3/MTok         $0.80/MTok
Output        Claude's reply              $15/MTok        $4/MTok
Cache read    portion served from cache   $0.30/MTok      $0.08/MTok

Output tokens cost 5× more than input tokens ($15 vs $3/MTok), so trimming output has an outsized effect on cost.
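A per-call cost can be computed directly from these per-MTok rates. A minimal sketch, with the $3 input / $15 output / $0.30 cache-read rates from the table hard-coded as defaults (verify against current pricing):

```python
# USD cost of one API call from per-MTok rates. Defaults are the
# $3/$15/$0.30 column of the pricing table above (verify against
# current published pricing).

def call_cost(input_tok: int, output_tok: int, cache_tok: int = 0,
              price_in: float = 3.00, price_out: float = 15.00,
              price_cache: float = 0.30) -> float:
    """Cache-read tokens bill at the (much cheaper) cache rate."""
    return (input_tok * price_in
            + output_tok * price_out
            + cache_tok * price_cache) / 1_000_000

# 2K of output costs as much as 10K of input -- the 5x ratio in action:
print(f"${call_cost(10_000, 2_000):.3f}")  # -> $0.060
```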


6.2 Input Tokens are "Cumulative"

Every message you send carries all previous turns. By turn 5, all content from turns 1-4 (conversations + tool results) is included:

Component             Per-turn Size    Cumulative Effect
System prompt         ~3K              fixed
CLAUDE.md             ~2K-8K           fixed
Conversation history  grows per turn   Turn 1: 500t → Turn 5: possibly 30Kt
Tool results          grow per call    one 300-line file Read ≈ +5Kt
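The accumulation can be sketched in a few lines. All sizes below are illustrative assumptions (a fixed ~8K prefix, plus made-up per-turn content), not measured values:

```python
# Why input is "cumulative": every API call resends the fixed prefix plus
# the whole conversation so far. All sizes are illustrative assumptions.

FIXED = 8_000  # system prompt (~3K) + CLAUDE.md (~5K midpoint), per the table

# New content added each turn (user message + reply + tool results); made up:
new_per_turn = [500, 6_000, 8_000, 9_000, 7_000]

def input_tokens_at_turn(turn: int) -> int:
    """Input for a turn = fixed prefix + all content up to and including it."""
    return FIXED + sum(new_per_turn[:turn])

for t in range(1, 6):
    print(f"Turn {t}: ~{input_tokens_at_turn(t):,} input tokens")
```

Turn 1 lands near 8.5K while turn 5 is pushing 38K, even though the user typed almost nothing new.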

6.3 tool_result is the Real Context Killer

assistant text reply:     "Let me read these files..."         ~50 tokens
assistant tool call:      Read("package.json")                  ~30 tokens
tool_result (small file): package.json (38 lines)              ~500 tokens
tool_result (large file): src/index.ts (329 lines)           ~4,000 tokens
tool_result (bash):       npm test (200 lines output)        ~3,000 tokens

Conclusion: the assistant's text replies and tool-call instructions are tiny; tool_result blocks (tool execution output) are the real context killer.
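Tallying the sample turn above makes the imbalance concrete (token counts copied from the breakdown; the share is computed from them):

```python
# Tallying the sample turn above: tool_result blocks dominate the budget.
events = [
    ("assistant text",               50),
    ("tool call Read",               30),
    ("tool_result package.json",    500),
    ("tool_result src/index.ts",  4_000),
    ("tool_result npm test",      3_000),
]
total = sum(t for _, t in events)
results = sum(t for name, t in events if name.startswith("tool_result"))
print(f"tool_result share: {results}/{total} = {results/total:.0%}")
# -> tool_result share: 7500/7580 = 99%
```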


6.4 Real-World: "Read Project" Token Changes

═══ Turn 1: User sends "read project" ═══
  Total input:  8,530t   (fixed section + "read project")
  Cache hit:        0t   (no cache on first turn)
  Total output:   650t   (6 Read instructions)

═══ Turn 2: Tool results return + Claude summarizes ═══
  Total input: 49,180t   (+40K file contents)
  Cache hit:    9,180t   (fixed section cached, saving ~90%)
  Total output:   200t   ("Project read complete...")

HUD shows:
Context β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‘β–‘β–‘β–‘ 60% (120K/200K)
  Input: 49K β”‚ Output: 1K β”‚ Cache: 9K hit
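The "~90% saving" on the cached portion follows from the pricing table: cache reads bill at roughly 1/10 of the fresh-input rate. A sketch using the $3.00 vs $0.30/MTok rates from above:

```python
# Cache reads bill at ~1/10 the fresh-input rate ($0.30 vs $3.00/MTok per
# the pricing table above), hence the "~90% saving" on cached tokens.

def cache_saving_usd(cached_tok: int, price_in: float = 3.00,
                     price_cache: float = 0.30) -> float:
    """Dollars saved by serving `cached_tok` from cache vs as fresh input."""
    return cached_tok * (price_in - price_cache) / 1_000_000

saving = cache_saving_usd(9_180)  # Turn 2's cache hit from the example
print(f"${saving:.4f} saved, {(3.00 - 0.30) / 3.00:.0%} off that portion")
```

Small per call, but the fixed prefix is resent every turn, so the discount compounds over a long session.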

6.5 Debugging Loop Token Snowball

Round 1: Input: 12K  β†’ Context: 20%
Round 2: Input: 25K  β†’ Context: 35%
Round 3: Input: 50K  β†’ Context: 55%   ← npm install error output is long
Round 4: Input: 80K  β†’ Context: 72%
Round 5: Input: 120K β†’ Context: 90%   ← Context bar turns red!
Round 6: Auto-compact triggers → cache hit rate plummets

6.6 Token-Saving Tips

Save input tokens:

Tip                              Effect
Grep instead of Read full file   saves 80-95%
Read a specific line range       saves 50-90%
Slim down CLAUDE.md              saves 1K-5K per turn
Regular /clear                   prevents history accumulation
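The Grep-vs-Read saving can be estimated with a toy model. Everything here is an assumption for illustration: ~15 tokens per code line, 4 matches, 2 context lines per match:

```python
# Toy estimate of Grep vs full-file Read. TOKENS_PER_LINE and the match
# counts are illustrative assumptions, not measurements.

TOKENS_PER_LINE = 15  # assumed average for source code

def read_cost(total_lines: int) -> int:
    """Tokens to Read an entire file into context."""
    return total_lines * TOKENS_PER_LINE

def grep_cost(matches: int, context: int = 2) -> int:
    """Tokens for Grep output: each match plus `context` lines either side."""
    return matches * (1 + 2 * context) * TOKENS_PER_LINE

full = read_cost(329)   # reading all of a 329-line file
narrow = grep_cost(4)   # 4 matches, 2 context lines each
print(f"Read: {full}t  Grep: {narrow}t  saved {1 - narrow / full:.0%}")
# -> Read: 4935t  Grep: 300t  saved 94%
```

A 94% saving on one file sits right in the table's 80-95% range, and the gap widens with larger files.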

Save output tokens:

Tip                                        Effect
Use caveman mode                           saves 40-60% of output
"Only change code, don't explain"          skips explanations entirely
Be specific: "fix the bug on line 42"      5-10× less output than vague questions

6.7 Viewing Tokens in HUD

Expanded layout (default):

Context β–ˆβ–ˆβ–ˆβ–‘β–‘β–‘β–‘β–‘β–‘β–‘ 33% (67k/200k) (in: 318, cache: 66k)
tokens 1.3M (in: 218k, out: 5k, cache: 1.0M)

HUD Display        Meaning
in: 318            input tokens for this API call (not cumulative)
cache: 66k         cache-hit tokens
tokens 1.3M line   cumulative input/output/cache for the entire session

Note: the Context line's in:/cache: figures are per-request values, not session totals.


Next Episode: Episode 7 covers usage limits β€” how to read the Pro/Max/Team 5-hour and 7-day windows in HUD.