Episode 17: Q&A — Cache Mechanisms & Token Optimization — 📊 Claude HUD Complete Tutorial: From Beginner to Master in 18 Chapters

This Episode: Caching is the key to saving money. These 7 questions (Q8 to Q14) walk through how the caching mechanism works.

Q8: Is Prompt Cache the same as browser cache?

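No. Browser cache stores web resources (images, JS, CSS). Prompt Cache stores the already-processed prefix of an API request so that a later request starting with the same prefix can reuse it.

Principle: If two consecutive API requests begin with identical content, the repeated part is read from cache instead of being reprocessed. This saves ~90% of the input cost for the cached tokens.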

Key limitation: Cache requires an identical continuous prefix from the start. Any change in the middle invalidates everything from that point onward.
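The prefix rule above can be illustrated with a small sketch. It is not how the API is implemented internally; it only counts how many leading segments two requests share, and the segment contents are made up for the example.

```python
def shared_prefix_len(previous: list[str], current: list[str]) -> int:
    """Count how many leading segments are identical in both requests."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break  # first difference: nothing after this point can be reused
        count += 1
    return count


# Hypothetical request contents, split into coarse segments for readability.
first = ["system prompt", "tool definitions", "user: hello", "assistant: Hello!"]
appended = first + ["user: next question"]
edited = [
    "system prompt",
    "tool definitions (edited)",  # a change in the middle
    "user: hello",
    "assistant: Hello!",
    "user: next question",
]

print(shared_prefix_len(first, appended))  # 4
print(shared_prefix_len(first, edited))    # 1
```

Appending to the end keeps the whole earlier request reusable, while a change near the start leaves only the part before it.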

Q9: When does the 5-minute cache countdown start?

From the most recent time the LLM returned a response — i.e., lastAssistantResponseAt.

The countdown starts when the AI finishes its reply, not when you send your message. Send another message within 5 minutes and the cache hits, saving ~90% of the input cost on the cached tokens.
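As a rough sketch of the countdown, the remaining time can be computed from the timestamp of the last reply. The function name and the example timestamps are hypothetical; only the 5-minute window and the idea of lastAssistantResponseAt come from the text above.

```python
from datetime import datetime, timedelta

CACHE_TTL = timedelta(minutes=5)  # the 5-minute window described above


def cache_seconds_left(last_assistant_response_at: datetime, now: datetime) -> float:
    """Seconds until the cache expires; a negative value means it already has."""
    expires_at = last_assistant_response_at + CACHE_TTL
    return (expires_at - now).total_seconds()


replied = datetime(2025, 1, 1, 12, 0, 0)  # when the reply finished (made-up time)
print(cache_seconds_left(replied, datetime(2025, 1, 1, 12, 4, 0)))  # 60.0
print(cache_seconds_left(replied, datetime(2025, 1, 1, 12, 6, 0)))  # -60.0
```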

Q10: Is "assistant response" the same as LLM response?

Yes. "assistant" is the role name in the API message format:

[
  { "role": "user", "content": "hello" },          ← Your message
  { "role": "assistant", "content": "Hello!" }     ← LLM's reply
]

Q11: How much do cache hits save?

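Example: 100K input tokens, of which 80K hit the cache, priced at $3/MTok for normal input and $0.30/MTok for cache reads. These rates are illustrative (Opus list prices are higher), but the percentage saved depends only on the hit share and the 10:1 ratio between the two rates: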

Scenario	Calculation	Cost
No cache	100K × $3/MTok	$0.30
80K cache hit	80K × $0.30/MTok + 20K × $3/MTok	$0.084

That is a 72% saving ($0.084 instead of $0.30). Over a long session (20 turns), the gap between good and poor cache usage can add up to a 3-5× difference in cost.
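The arithmetic in the table can be reproduced with a short sketch. The function name is made up and the default rates are simply the ones used in the table; like the table, it ignores any surcharge for writing to the cache.

```python
def input_cost(total_tokens: int, cached_tokens: int,
               input_rate: float = 3.00, cache_read_rate: float = 0.30) -> float:
    """Input cost in USD; both rates are per million tokens (assumed values)."""
    uncached_tokens = total_tokens - cached_tokens
    return (cached_tokens * cache_read_rate + uncached_tokens * input_rate) / 1_000_000


no_cache = input_cost(100_000, 0)
with_cache = input_cost(100_000, 80_000)
saving = 1 - with_cache / no_cache

print(f"${no_cache:.3f} -> ${with_cache:.3f}, saves {saving:.0%}")
# $0.300 -> $0.084, saves 72%
```

Swapping in your own model's rates shows the saving for that model.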

Q12: Does cache survive `/clear`?

No. /clear empties the entire conversation history, so the request prefix changes completely and none of the old cache can be hit.

The first turn after /clear is the most expensive, because the fixed section is fully reprocessed. Caching recovers from turn 2 onward.

Q13: How do I keep the cache from expiring?

Simplest method: send a message before the cache expires. Its content does not matter. Once the LLM responds, the 5-minute TTL resets.

Cache TTL: 4m 12s    ← No rush
Cache TTL: 0m 30s    ← About to expire! Send a message to extend
Cache TTL: -expired  ← Expired, next request reprocesses everything
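The three states above can be imitated with a small sketch. The 60-second warning threshold and the exact wording are assumptions made for illustration; the HUD's own rules for when it warns are not given here.

```python
def cache_status(seconds_left: float, warn_below: float = 60.0) -> str:
    """Describe the cache state for a given number of seconds remaining."""
    if seconds_left <= 0:
        return "expired: next request reprocesses everything"
    minutes, seconds = divmod(int(seconds_left), 60)
    remaining = f"{minutes}m {seconds:02d}s"
    if seconds_left < warn_below:  # assumed threshold, not taken from the HUD
        return f"{remaining}: about to expire, send a message to extend"
    return f"{remaining}: no rush"


print(cache_status(252))  # 4m 12s: no rush
print(cache_status(30))   # 0m 30s: about to expire, send a message to extend
print(cache_status(-5))   # expired: next request reprocesses everything
```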

Q14: Why does cache hit rate suddenly drop?

Three reasons (most likely first):

1. Auto compression triggered (context usage > 95%): the prefix changes, so the cache is invalidated from the compression point onward.
2. /compact executed: same effect.
3. TTL expired: 5 minutes passed without a new request.

The first two are "structural invalidation" and cannot be avoided. The third can be proactively managed (see Q13).

Next Episode: Episode 18 — the final Q&A covering HUD debugging, Memory, and advanced topics.
