News

Data Reveals 93% of Claude Code Sessions Are Redundant Noise, Paving Way for Drastic Size Reduction

A recent in-depth analysis found that an astonishing 93% of a Claude Code session file is "noise" rather than essential information. The developer behind the analysis demonstrated this by building a session distiller that shrinks a 70MB session file down to just 7MB, significantly improving efficiency and storage use.

Before writing a single line of code for the distiller, the developer sought answers to three crucial questions: What exactly is contained within a 70MB session? What can be safely discarded? And how can one prove that no vital information is lost in the process?

Dissecting a 70MB Session

By categorizing every byte of a real 70MB JSONL session, the breakdown proved surprising:

  • JSON envelope (sessionId, cwd, version, gitBranch): ~54%
  • Tool results (Read, Bash, Edit, Write, Agent): ~25%
  • Base64 images (screenshots, UI captures): ~12%
  • Thinking blocks (internal reasoning): ~4%
  • Actual conversation text: ~3%
  • Progress lines, file-history-snapshots: ~2%
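
A byte-level breakdown like the one above can be produced by classifying each JSONL line and tallying its size. The sketch below assumes nothing about Claude Code's actual record schema; the caller supplies the `classify` function, and the function name itself is invented for illustration:

```python
import json
from collections import Counter

def byte_share(jsonl_text: str, classify) -> dict:
    """Return each category's fraction of the file's total bytes.

    `classify` maps a parsed JSONL record to a category name,
    e.g. "envelope", "tool_result", "conversation".
    """
    totals = Counter()
    for line in jsonl_text.splitlines():
        if line.strip():
            totals[classify(json.loads(line))] += len(line.encode("utf-8"))
    grand_total = sum(totals.values()) or 1
    return {category: size / grand_total for category, size in totals.items()}
```

Running this over a real session with a classifier keyed to the record's type fields yields percentage shares directly comparable to the list above.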

The most unexpected finding was the 54% attributed to the "JSON envelope." In a 70MB file containing thousands of lines, every single JSONL line repeats identical envelope fields such as sessionId, userType, cwd, version, and gitBranch. This means 38MB of the file consists of the same JSON keys and values appearing repeatedly.
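
Because the envelope fields are identical on every line, they can be dropped from each record without losing anything that could not be restored from a single header. A minimal sketch, assuming the envelope key names mentioned above (the function name is invented):

```python
import json

# Envelope keys repeated verbatim on every JSONL line (names taken from the
# article; any others would be handled the same way).
ENVELOPE_KEYS = {"sessionId", "userType", "cwd", "version", "gitBranch"}

def strip_envelope(jsonl_text: str) -> str:
    """Drop the repeated envelope fields from every line, keeping the payload.
    The shared values could be written once to a header line for restoration."""
    out = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        payload = {k: v for k, v in record.items() if k not in ENVELOPE_KEYS}
        out.append(json.dumps(payload))
    return "\n".join(out)
```

On a file where the envelope accounts for roughly half the bytes, this one transformation alone roughly halves the size before any tool output is touched.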

The actual conversation, the direct exchange between the user and Claude, accounts for a mere 3% of the file. The vast remainder is redundant metadata or tool output that served its purpose hours earlier.

Why Tool Results Are Safe to Strip

Not all tool results are equally valuable, and many can simply be re-obtained on demand:

| Tool | Safe to strip? | Reason |
| --- | --- | --- |
| Read | Yes | The file remains on disk; Claude can re-read it in 50ms. Storing 28MB of unchanged file content is pure waste. |
| Bash | Mostly | Build outputs, test runs, and git log results are stale once captured. Keeping only the first 5 and last 5 lines, the command and its success/failure, is sufficient. |
| Edit | Partially | The file path and changes are important, but not the full file content. Preview snippets (200 chars each) of old_string and new_string suffice to remember the intent. |
| Write | Partially | Similar to Edit; preserve the file path and a head/tail preview. |
| Agent | Keep more | Research reports and analysis from subagents contain synthesized knowledge. Up to 2000 characters should be preserved. |
| Screenshots | Yes | Base64 images from hours ago depict outdated UI states. Claude cannot even display them once a session exceeds certain size limits. |
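
The per-tool policy above can be sketched as a single dispatch function. The thresholds (5+5 lines, 200 chars, 2000 chars) come from the table; the function name and placeholder strings are assumptions, not the developer's actual implementation:

```python
def distill_tool_result(tool: str, text: str) -> str:
    """Apply a per-tool retention policy (a sketch of the table's rules)."""
    if tool == "Read":
        # The file is still on disk, so the content can be re-read on demand.
        return "[stripped: file content can be re-read from disk]"
    if tool == "Bash":
        lines = text.splitlines()
        if len(lines) <= 10:
            return text
        # Keep the first and last 5 lines: the command and its outcome.
        omitted = len(lines) - 10
        return "\n".join(lines[:5] + [f"... [{omitted} lines omitted] ..."] + lines[-5:])
    if tool in ("Edit", "Write"):
        # Keep a short preview that records the path and intent, not the body.
        return text[:200]
    if tool == "Agent":
        # Subagent reports carry synthesized knowledge: keep up to 2000 chars.
        return text[:2000]
    return text
```

The asymmetry is the point: Read output is fully recoverable and gets dropped entirely, while Agent output is synthesized knowledge that cannot be cheaply regenerated and so gets the largest budget.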

This approach is supported by research. A JetBrains NeurIPS 2025 study explored two methods for handling tool outputs in coding agents: observation masking (replacing results with placeholders) versus LLM summarization. Both yielded identical task performance. This suggests that the model does not require the raw output once it has processed it and generated a response; the response itself embodies the knowledge.
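
Observation masking, the simpler of the two methods, amounts to swapping raw tool output for a placeholder while leaving the model's own turns untouched. A minimal sketch, where the record shape and the "tool_result" type tag are assumptions about the message format:

```python
def mask_observations(messages: list) -> list:
    """Observation masking: replace raw tool output with a placeholder while
    keeping the assistant turns that already encode the extracted knowledge.
    (The record shape and "tool_result" type tag are assumed, not specified.)"""
    masked = []
    for msg in messages:
        if msg.get("type") == "tool_result":
            masked.append({**msg, "content": "[observation masked]"})
        else:
            masked.append(msg)
    return masked
```

That the masked history performs as well as an LLM-summarized one suggests the expensive summarization step buys nothing once the model's response already captures what the output contained.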

As one researcher aptly put it: "When Claude reads 847 lines and responds 'this uses JWT with refresh tokens in httpOnly cookies,' that sentence is the knowledge. The 847 lines were consumed to produce it."
