⚡ News

5 Tips to Cut Claude Code Token Usage by 30%

5 Tips to Cut Claude Code Token Usage by 30%

I've been using Claude Code daily, and while the quality is top-notch, the API costs can be significant. I've discovered five habits that consistently cut token consumption by 25–35% without compromising code quality.

1. Use CLAUDE.md at Project Root

Claude Code reads CLAUDE.md automatically as durable context. Without it, Claude re-discovers your project structure every session, wasting file-reading tokens. Use a minimal template (under 200 lines) covering tech stack, file layout, and conventions to ensure Claude understands your environment immediately.

2. Leverage Prompt Caching

Anthropic’s prompt caching makes repeat context significantly cheaper (~10% of normal price). For a 200K-token project, maintaining a high cache hit rate can drop session costs from $0.60 to $0.18. It is the most effective way to handle large codebases economically.

3. Optimize Interaction Habits

To keep the cache valid: avoid changing CLAUDE.md mid-session, append to your prompts rather than rewriting them, and refer back to previously uploaded code instead of re-pasting it. These small behavioral shifts ensure you maximize the benefit of cached tokens.

4. Prefer Read Tool Over Pasting

Instead of pasting 5000 lines of code, use the command `Read src/foo.go`. This is cheaper because Claude only accesses the file when necessary and often reads specific slices rather than the entire file. Manual pasting forces you to pay for every line regardless of its relevance to the task.

5. Switch to Smaller Models for Routine Tasks

Opus is overkill for boilerplate, logging, or renaming variables. Claude Code allows switching models per session; Sonnet or Haiku can handle roughly 70% of daily edits at a 5x–15x lower cost. Save Opus for complex reasoning, architecture decisions, and deep bug investigations.

What to Avoid

Manual prompt compression often leads to context loss for marginal savings. Additionally, be wary of ultra-cheap third-party "Opus" relays, which are often inferior models in disguise, leading to a drastic drop in coding accuracy.

↗ Read original source