LLM-Based Entropy Coding for Real-Time Text Transmission Over Fixed-Rate Channels: A Compression-Delay Tradeoff Analysis

Learning, prediction, and data compression are tightly linked: a model that accurately predicts the next symbol in a sequence can be paired with a source coder to compress that sequence close to its information-theoretic limit. Real-time text transmission, however, raises a complication. When tokenized characters arrive at a fixed reading pace, are encoded into variable-length codewords, and are streamed over a fixed-rate channel, a queue inevitably forms. The per-token delay in this queue depends on the mean and variance of the codeword bit lengths, as well as on the coder's inherent algorithmic latency.
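The queueing effect described above can be sketched with a short simulation. This is an illustrative model, not the paper's code: tokens arrive at a fixed period, each contributes a variable-length codeword, and the channel drains bits at a fixed rate; the per-token delay is the time from a token's arrival until its last bit leaves the channel. The numeric parameters below are hypothetical.

```python
import random

def simulate_queue_delay(bit_lengths, arrival_period, channel_rate):
    """Per-token delay when variable-length codewords (bit_lengths[i] bits
    for token i) arrive every `arrival_period` seconds and are drained by
    a channel of `channel_rate` bits/second, FIFO."""
    delays = []
    backlog_clear_time = 0.0  # instant at which the channel becomes idle
    for i, bits in enumerate(bit_lengths):
        arrival = i * arrival_period
        start = max(arrival, backlog_clear_time)  # wait behind earlier codewords
        backlog_clear_time = start + bits / channel_rate
        delays.append(backlog_clear_time - arrival)
    return delays

# Hypothetical setup: a token every 10 ms, an 800 bit/s channel, and
# codeword lengths with mean 6 bits (mean service 7.5 ms < 10 ms, so the
# queue is stable, but length variance still produces delay spikes).
random.seed(0)
lengths = [random.choice([2, 4, 6, 8, 10]) for _ in range(1000)]
d = simulate_queue_delay(lengths, arrival_period=0.010, channel_rate=800)
print(f"mean delay: {sum(d) / len(d) * 1000:.2f} ms, max: {max(d) * 1000:.2f} ms")
```

Even with a stable queue (mean service time below the arrival period), the maximum delay can be many times the mean, which is why the variance of the bit lengths matters and not just their average.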

A recent study investigates the compression-delay tradeoff that emerges when a large language model (LLM) serves as the causal sequential predictor in a predict-then-code architecture for real-time text transmission. The research benchmarked several coding schemes against the theoretical Shannon limit: Huffman coding, arithmetic coding, rANS at various block sizes, and gzip. A key element of the analysis is the distinction between algorithmic delay, which is intrinsic to the coder's design, and computational delay, which shrinks as hardware improves.
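The predict-then-code idea can be made concrete by computing the Shannon limit itself: if the predictor assigns probability p to the symbol that actually occurs, an ideal entropy coder spends -log2 p bits on it. The sketch below uses a toy stand-in predictor purely for illustration; in the paper's setting, the conditional distributions would come from the LLM's next-token output.

```python
import math

def ideal_bits(sequence, predictor):
    """Total ideal code length (the Shannon limit) for `sequence` when each
    symbol is coded with -log2 p under the predictor's conditional
    distribution. Arithmetic coding and rANS approach this bound."""
    total = 0.0
    context = []
    for sym in sequence:
        probs = predictor(context)       # dict: symbol -> probability
        total += -math.log2(probs[sym])  # information content of this symbol
        context.append(sym)
    return total

# Stand-in for an LLM's conditional distribution: a hand-written
# bigram-style rule over a two-letter alphabet (hypothetical).
def toy_predictor(context):
    if context and context[-1] == "a":
        return {"a": 0.1, "b": 0.9}
    return {"a": 0.7, "b": 0.3}

msg = "aabba"
print(f"{ideal_bits(msg, toy_predictor):.2f} bits for {len(msg)} chars")
```

A better predictor concentrates probability on the symbols that actually occur, which drives the per-symbol cost down; this is the mechanism by which larger LLMs reduce bits per character.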

Key findings indicate that Huffman coding is a practical and efficient choice for over-provisioned channels, owing to its zero algorithmic delay and only modest compression overhead. Arithmetic coding, by contrast, achieved near-optimal compression at the cost of a decodability delay. These conclusions were validated across a wide range of model scales, from GPT-2 (124 million parameters) to Llama 3.2 (3 billion parameters), a roughly twenty-five-fold increase in parameter count. This scaling yielded an approximately 38% reduction in bits per character, a gain that effectively over-provisions the channel and thereby changes which coder is optimal for a given scenario.
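The Huffman tradeoff named above is easy to see directly: a Huffman code emits each token's codeword the moment the token is coded (zero algorithmic delay), but because codeword lengths are whole bits, its expected length exceeds the entropy. A minimal sketch, using a hypothetical skewed next-token distribution of the kind an LLM might predict:

```python
import heapq
import math

def huffman_lengths(probs):
    """Codeword length per symbol for a Huffman code over `probs`
    (dict: symbol -> probability), built by repeatedly merging the
    two least probable subtrees."""
    # heap entries: (probability, tiebreak counter, symbols in subtree)
    heap = [(p, i, [s]) for i, (s, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:  # every merge adds one bit to each member's codeword
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

# Hypothetical skewed distribution over four candidate tokens.
p = {"the": 0.5, "a": 0.25, "cat": 0.15, "dog": 0.1}
L = huffman_lengths(p)
expected = sum(p[s] * L[s] for s in p)
entropy = -sum(q * math.log2(q) for q in p.values())
print(f"Huffman: {expected:.3f} bits/token vs entropy {entropy:.3f}")
```

The gap between the expected Huffman length and the entropy is the "modest compression overhead" the study refers to; arithmetic coding closes that gap by spreading fractional bits across symbols, which is precisely what introduces its decodability delay.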
