Episode 17: Q&A - Cache Mechanisms & Token Optimization

⏱ Est. reading time: 4 min Updated on 5/7/2026

This Episode: Cache is the core of saving money. These 7 questions help you fully understand the caching mechanism.


Q8: Is Prompt Cache the same as browser cache?

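No. Browser cache stores web resources (images, JS, CSS) on your machine. Prompt Cache lives on the API side: it stores the already-processed prefix of a request so that a later request starting with the same prefix can reuse it.

Principle: If two consecutive API requests have identical opening sections, the repeated part is read from cache instead of being reprocessed. Cached tokens are billed at roughly 10% of the normal input price, so the repeated part costs ~90% less.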

Key limitation: Cache requires an identical continuous prefix from the start. Any change in the middle invalidates everything from that point onward.
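
The prefix rule can be illustrated with a small sketch. It assumes, purely for illustration, that a request is a list of text segments and that reuse is decided segment by segment; the name shared_prefix_len is hypothetical and not part of any API.

```python
def shared_prefix_len(previous: list[str], current: list[str]) -> int:
    """Count how many leading segments two requests have in common."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break  # the first difference ends the reusable prefix
        count += 1
    return count


# Hypothetical request segments: a system prompt followed by conversation turns.
turn_1 = ["system prompt", "user: hello", "assistant: Hello!"]
turn_2 = turn_1 + ["user: what is a cache?"]                  # only appends
edited = ["system prompt", "user: hi", "assistant: Hello!"]   # changes the middle

print(shared_prefix_len(turn_1, turn_2))  # 3: the whole previous request is reusable
print(shared_prefix_len(turn_1, edited))  # 1: everything after the edit is lost
```

Appending keeps the full prefix reusable; editing an early segment throws away everything after it, even segments that are textually unchanged.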


Q9: When does the 5-minute cache countdown start?

From the most recent time the LLM returned a response, i.e., lastAssistantResponseAt.

Not when you sent the message, but when the AI finished its reply. Send another message within 5 minutes → cache hit → save 90% input cost.
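
As a rough sketch of that timing rule: the check below measures elapsed time from the last assistant reply, not from the last user message. The function name cache_is_warm is hypothetical, and treating exactly 5 minutes as already expired is an assumption.

```python
from datetime import datetime, timedelta

CACHE_TTL = timedelta(minutes=5)  # the 5-minute window described above


def cache_is_warm(last_assistant_response_at: datetime, now: datetime) -> bool:
    """True if `now` is still inside the TTL counted from the last reply."""
    # Assumption: the boundary is strict, so exactly 5 minutes counts as expired.
    return now - last_assistant_response_at < CACHE_TTL


reply_finished = datetime(2026, 1, 1, 12, 0, 0)
print(cache_is_warm(reply_finished, reply_finished + timedelta(minutes=4)))  # True
print(cache_is_warm(reply_finished, reply_finished + timedelta(minutes=6)))  # False
```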


Q10: Is "assistant response" the same as LLM response?

Yes. "assistant" is the role name in the API message format:

[
  { "role": "user", "content": "hello" },          ← Your message
  { "role": "assistant", "content": "Hello!" }     ← LLM's reply
]

Q11: How much do cache hits save?

Using the Opus model with 100K input tokens, 80K of which hit the cache:

Scenario         Calculation                          Cost
No cache         100K × $3/MTok                       $0.30
80K cache hit    80K × $0.30/MTok + 20K × $3/MTok     $0.084

Saves 72%. Over a long session (20 turns), good cache vs bad cache means a 3-5× cost difference.
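
The arithmetic can be reproduced with a few lines of Python. This is only a sketch: the helper name input_cost is made up, and the default prices are simply the $3/MTok and $0.30/MTok figures used in this example, not a lookup of current pricing.

```python
def input_cost(total_tokens: int, cached_tokens: int,
               base_per_mtok: float = 3.00, cached_per_mtok: float = 0.30) -> float:
    """Input cost in dollars when `cached_tokens` of the input hit the cache."""
    uncached_tokens = total_tokens - cached_tokens
    dollars_times_million = cached_tokens * cached_per_mtok + uncached_tokens * base_per_mtok
    return dollars_times_million / 1_000_000


no_cache = input_cost(100_000, 0)
with_cache = input_cost(100_000, 80_000)
saving = 1 - with_cache / no_cache

print(f"${no_cache:.3f}  ${with_cache:.3f}  {saving:.0%}")  # $0.300  $0.084  72%
```

Note that the saving is 72% rather than 90% because the 20K uncached tokens are still billed at the full rate.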


Q12: Does cache survive /clear?

No. /clear empties all conversation history, so the cache prefix changes completely and no old cache entry can be hit.

The first turn after /clear is the most expensive, because the fixed section is fully reprocessed. The cache recovers from turn 2 onward.


Q13: How to keep cache from expiring?

Simplest method: send a message before the cache expires. Its content does not matter. Once the LLM responds, the 5-minute TTL resets.

Cache TTL: 4m 12s    ← No rush
Cache TTL: 0m 30s    ← About to expire! Send a message to extend
Cache TTL: -expired  ← Expired, next request reprocesses everything
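
A countdown like the one above could be rendered along these lines. This is a hypothetical sketch, not the actual HUD code: the function name ttl_label, the 60-second warning threshold, and the exact wording are all assumptions.

```python
def ttl_label(seconds_left: int, warn_below: int = 60) -> str:
    """Render the remaining cache lifetime and flag when a keep-alive is due."""
    if seconds_left <= 0:
        return "expired"
    minutes, seconds = divmod(seconds_left, 60)
    label = f"{minutes}m {seconds}s"
    if seconds_left < warn_below:  # assumed threshold for nudging the user
        label += " (send a message to extend)"
    return label


print(ttl_label(252))  # 4m 12s
print(ttl_label(30))   # 0m 30s (send a message to extend)
print(ttl_label(-5))   # expired
```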

Q14: Why does cache hit rate suddenly drop?

Three reasons, in order of likelihood:

  1. Auto compression triggered (context > 95%): prefix changes, cache invalidates from the compression point
  2. /compact executed: same effect
  3. TTL expired: 5 minutes without a new request

The first two are "structural invalidation" and are unavoidable. The third can be proactively managed (see Q13).


Next Episode: Episode 18, the final Q&A covering HUD debugging, Memory, and advanced topics.