DeepSeek V4 Cost Reduction: Reasonix Achieves 99.82% Cache Hit Rate, Slashing Long Session Costs by 80%

Following the permanent price reduction for DeepSeek V4 models, the open-source community has taken cost optimization a step further. A project named Reasonix has achieved an impressive 99.82% cache hit rate for DeepSeek V4's long sessions, drastically cutting operational expenses. This translates to a 400M+ token bill, originally costing $61, now reduced to just $12, an 80% decrease.

Reasonix is a terminal coding harness specifically designed for DeepSeek, aiming to provide significant cost savings for users. It maintains over 90% cache hit rates in long sessions, reducing input token costs to one-fifth of the original.

Reasonix's core implementation leverages a byte-stable prefix-cache and an "append-only" execution loop. This design specifically caters to DeepSeek's caching mechanism: by fixing older contexts and only appending new messages, it ensures the initial part of each request remains consistent, thereby maximizing cache hit rates and effectively lowering costs for extended conversations.

Its architecture comprises three key components:

Cache-First Loop: The automatic prefix-cache activates only when the exact byte prefix of the current request matches a previous one. To counter the common issue where most agent loops reorder, rewrite, or inject new timestamps with each interaction, Reasonix divides the context into three distinct areas:
- Prefix Area: Contains fixed content, computed only once per session.
- History Message Area: Append-only, preventing rewriting.
- Scratchpad Area: Any information here must undergo "Tool-Call Repair" before being committed to the log.
Tool-Call Repair: DeepSeek frequently encounters issues such as tool call JSON being generated internally but disappearing from the final message, malformed JSON parameters, repetitive tool calls with identical parameters (duplicate call storms), and truncated JSON. Reasonix's repair mechanism attempts to fix these problems through four processing rounds before actual execution.
Cost Control:
- Defaults to the more economical v4 flash model, automatically switching to v4 pro only for difficult tasks.
- Automatically compresses context at the end of each round.
- Users can manually input /pro to switch the conversation model to v4 pro for the next round; Reasonix automatically reverts to the cheaper model afterward.
- Failure signals trigger an automatic upgrade: if the number of failures reaches a predefined threshold, the remaining portion of the current round will switch to the v4 pro model.

Reasonix is straightforward to install and use. Users simply navigate to the project directory and execute npx reasonix code to start a TUI session. A desktop version is also available.

It's important to note that Reasonix is explicitly designed for DeepSeek. Its abstractions are built entirely on DeepSeek's features, making it non-generic, and there are no plans to release general-purpose functionalities.

The project has sparked considerable discussion within the tech community. While its cost optimization is highly lauded, some developers question the necessity of a DeepSeek-native programming agent. One user shared an experience of achieving over 95% cache hit rate by simply adapting DeepSeek V4 Pro's API format for use within Codex, suggesting that non-native solutions can also yield significant cost efficiencies. Regardless of the chosen approach, cost saving remains a primary concern for developers.

Project Link: https://github.com/esengine/DeepSeek-Reasonix

DeepSeek V4 Cost Reduction: Reasonix Achieves 99.82% Cache Hit Rate, Slashing Long Session Costs by 80%

Next Stories to Read

SoftBank Shares Soar to Record Highs, Fueled by OpenAI IPO Speculation and Broader AI Enthusiasm

Google Integrates Emoji Reactions into Gmail for Enhanced Communication Efficiency and Nuance

Anthropic Employee Uses Claude AI to Build Viral "Spotify Wrapped" Wedding Site from 12 Years of iMessages; Angry Emoji Sparks Online Buzz