Neural Arithmetic Compression
llama.cpp Backend

Nacrith

Advanced lossless text compression using a neural language model and arithmetic coding. Runs on GPU or CPU via llama.cpp — no configuration needed.

- 0.94 bits/byte (text)
- 90%+ space savings
- 3-4× vs gzip / xz
- 135M-param model (SmolLM2)

How Nacrith Is Different

Prediction-based arithmetic coding driven by a neural language model

LLM-Powered Prediction

Uses SmolLM2-135M with a ~49K token vocabulary to assign probability distributions over the next token. The better the prediction, the fewer bits are needed.

llama.cpp Backend

Inference runs via llama.cpp, which is ~7× faster than PyTorch. Automatically uses GPU if available, falls back to CPU — no extra configuration required.

Arithmetic Coding

Mathematically optimal encoding: each token is encoded in proportion to its predicted probability. A token predicted at 99% costs ~0.014 bits; only truly surprising tokens are expensive.

Parallel Workers

Input is split into chunks and distributed across multiple workers running concurrently. Each worker operates an independent LLM + arithmetic coding pipeline for maximum throughput.
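The split-and-distribute scheme can be sketched with the standard library. This is a minimal illustration, not the actual implementation: `zlib` stands in for each worker's LLM + arithmetic coding pipeline, and the worker count is an arbitrary choice.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def split_chunks(data: bytes, n: int) -> list[bytes]:
    """Split the input into n roughly equal contiguous chunks."""
    size = -(-len(data) // n)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def compress_chunk(chunk: bytes) -> bytes:
    # Stand-in for one worker's independent LLM + arithmetic coding pipeline.
    return zlib.compress(chunk)

def parallel_compress(data: bytes, workers: int = 4) -> list[bytes]:
    """Compress all chunks concurrently; the streams are later combined into one file."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_chunk, split_chunks(data, workers)))
```

Because each chunk is compressed independently, decompression can also recover the chunks in parallel before concatenating them.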

Binary File Support (NC06)

Binary files are segmented into text-like and binary chunks. Text chunks use the neural pipeline; binary blobs use lzma or gzip. The result is always at least as good as compressing the whole file with lzma.

Fully Lossless

Both sides run the exact same model with the exact same weights, producing identical probability distributions. Decompressed output matches the original byte-for-byte, always.

Benchmarks

In our experiments, Nacrith produces the strongest compression among all evaluated systems.

Nacrith vs Neural & Advanced Compressors

Bits per byte — lower is better

| System | Model / Notes | alice29.txt (bpb) | enwik8 (bpb) |
| --- | --- | --- | --- |
| gzip -9 | | 2.851 | 2.916 |
| CMIX v21 | LSTM + 2,000+ models | 1.635 | 1.170 |
| NNCP v3 | Transformer-XL (online) | 3.960 | ~1.190 |
| PAQ8px -8L | Context mixing | 1.728 | ~1.270 |
| ts_zip | RWKV-169M | ~1.142 | ~1.110 |
| FineZip | LLaMA-3-8B (fine-tuned) | | 1.024 |
| Nacrith | SmolLM2-135M + llama.cpp | 0.918 | 0.9389 |

In our experiments on enwik8 (95.4 MB of Wikipedia text), Nacrith achieves 0.9389 bpb, outperforming ts_zip (~1.11 bpb) by 15%, FineZip (1.024 bpb) by 8% with a 60× smaller model and no fine-tuning, and CMIX v21 (1.17 bpb) by 20%.

alice29.txt

148.5 KB, Canterbury Corpus

alice29.txt compression comparison — Nacrith achieves 0.918 bits per byte

enwik8

95.4 MB, Wikipedia

enwik8 compression comparison — Nacrith achieves 0.9389 bits per byte

Compression Ratio

Compressed / Original — lower is better

Compression ratio comparison — Nacrith achieves 8-10% compression ratio

Space Savings

Space saved % — higher is better

Space savings comparison — Nacrith achieves 90-92% space savings

Key Observations

~8-12% compression ratio on English text — roughly 3× better than gzip and 2.5× better than bzip2 in our experiments.

On alice29.txt, Nacrith achieves 0.918 bpb — 44% better than CMIX v21 and 20% better than ts_zip, the closest evaluated competitor.

Space savings of 88–92% consistently across small, medium, and large files.

On enwik8 (95.4 MB), Nacrith achieves 0.9389 bpb — the strongest result among evaluated systems.

Uses ~1.2 GB VRAM for the first worker, plus ~660 MB per additional worker. Parallel workers substantially improve throughput.

All results are fully lossless — decompressed output matches the original byte-for-byte.

Beyond the Shannon Entropy Limit

Measured on a 100 KB English text sample. Shannon limits represent theoretical lower bounds for compressors of that order.

| Method | Compressed Size | Bits / Byte |
| --- | --- | --- |
| Shannon 0th-order limit | 60.3 KB | 4.8025 |
| Shannon 1st-order limit | 44.2 KB | 3.5213 |
| gzip -9 | 39.0 KB | 3.1082 |
| xz -9 | 35.5 KB | 2.8257 |
| Shannon 2nd-order limit | 34.4 KB | 2.7373 |
| Nacrith | 9.6 KB | 0.7635 |

Nacrith achieves 0.76 bits/byte: 84% below the 0th-order Shannon limit, 78% below the 1st-order limit, and 72% below the 2nd-order limit. These limits only bound compressors that model fixed-order byte statistics; Nacrith can go far below them because its language model conditions on much longer contexts.
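The order-k limits above come from byte statistics of the sample itself. A minimal sketch of how the 0th- and 1st-order estimates are computed:

```python
import math
from collections import Counter

def entropy_order0(data: bytes) -> float:
    """0th-order Shannon entropy: bits/byte from byte frequencies alone."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

def entropy_order1(data: bytes) -> float:
    """1st-order: average entropy of each byte given the previous byte."""
    n = len(data) - 1
    pairs = Counter(zip(data, data[1:]))   # (previous, current) bigram counts
    prev = Counter(data[:-1])              # context counts
    return -sum(c / n * math.log2(c / prev[a]) for (a, b), c in pairs.items())
```

On the sample in the table, gzip lands between the 1st- and 2nd-order limits, while Nacrith's long-context model sits far below all three.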

How It Works

Prediction-based arithmetic coding using a neural language model

The LLM + Arithmetic Coding Pipeline

Nacrith exploits the deep connection between prediction and compression (Shannon, 1948): a good predictor of text can be turned into a good compressor.

The LLM (SmolLM2-135M via llama.cpp)

A transformer neural network with a ~49K token vocabulary. Given a sequence of tokens, it outputs a probability distribution over the entire vocabulary for what comes next. It captures grammar, common phrases, semantic relationships, and world knowledge — far beyond simple byte-pattern matching. Inference runs via llama.cpp, which is ~7× faster than PyTorch, and automatically targets GPU or CPU.

Arithmetic Coding

A mathematically optimal encoding scheme that maps a sequence of symbols to a single number in [0, 1). For each symbol, it narrows the interval proportionally to that symbol's probability. High-probability symbols barely shrink the interval (costing almost zero bits), while unlikely symbols shrink it a lot.
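A toy illustration of the interval narrowing, using exact fractions and a fixed three-symbol distribution. The real coder differs in two ways: it queries the LLM's ~49K-token distribution at every step, and it emits a finite-precision bit stream rather than one exact number.

```python
from fractions import Fraction

# Fixed toy distribution; the real coder gets fresh probabilities from the LLM per token.
PROBS = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}

def cumulative(sym):
    """Return the [low, high) slice of [0, 1) assigned to a symbol."""
    lo = Fraction(0)
    for s, p in PROBS.items():
        if s == sym:
            return lo, lo + p
        lo += p
    raise KeyError(sym)

def encode(msg: str) -> Fraction:
    low, high = Fraction(0), Fraction(1)
    for sym in msg:
        span = high - low
        c_lo, c_hi = cumulative(sym)
        # Narrow the interval proportionally to the symbol's probability.
        low, high = low + span * c_lo, low + span * c_hi
    return (low + high) / 2  # any number inside the final interval identifies the message

def decode(code: Fraction, n: int) -> str:
    out, low, high = [], Fraction(0), Fraction(1)
    for _ in range(n):
        span = high - low
        target = (code - low) / span
        for sym in PROBS:
            c_lo, c_hi = cumulative(sym)
            if c_lo <= target < c_hi:
                out.append(sym)
                low, high = low + span * c_lo, low + span * c_hi
                break
    return "".join(out)
```

The decoder replays the same interval updates, which is exactly why both sides must use identical probability distributions.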

Together

The LLM provides the probabilities; the arithmetic coder turns them into bits. A token predicted at 99% confidence costs ~0.014 bits. A token at 50% costs 1 bit. Only truly surprising tokens are expensive.
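The per-token cost is just the negative log probability; a quick check of the numbers above:

```python
import math

def token_cost_bits(p: float) -> float:
    """Ideal arithmetic-coding cost, in bits, of a symbol predicted with probability p."""
    return -math.log2(p)
```

A 99%-confident token costs about 0.0145 bits, a 50% token exactly 1 bit, and a genuinely surprising 0.1% token nearly 10 bits.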

Compression
Input text → Split into N chunks
Each worker independently:
1. Tokenize chunk
2. For each token:
• LLM predicts P(next | context)
• Arithmetic encoder narrows interval
→ Compressed stream
→ Combine streams into NC06 file
Decompression
NC06 file → Split streams
Each worker:
For each position:
• Same LLM predicts P
• Arithmetic decoder recovers token
• Token feeds back as context
→ Tokens
→ Concatenate + Detokenize → Original

Key: Both sides run the exact same model with the exact same weights, producing identical probability distributions. This symmetry guarantees perfect lossless reconstruction.

Why Nacrith Differs from Traditional Compressors

Traditional (gzip / xz / zip)

Pattern matching on raw bytes within a sliding window. Only exploits local, literal repetitions.

Nacrith

Captures semantic and syntactic structure. It knows that after "The President of the United", the word "States" is extremely likely — even if that phrase never appeared recently. This deep understanding of language produces far better predictions, which directly translates to fewer bits.

Binary File Compression (NC06)

Nacrith also supports compressing binary files such as PDFs, executables, and other non-UTF-8 data using a hybrid chunked approach. Binary mode is activated automatically when the input file is not valid UTF-8.

1. Byte Classification & Segmentation

Every byte is classified as text-like (printable ASCII, tab/LF/CR) or binary. Contiguous runs are grouped, with short text runs (< 64 bytes) demoted to binary and small binary gaps bridged to keep text chunks contiguous.
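A sketch of the classification and grouping, assuming the 64-byte threshold from above (the gap-bridging step is omitted for brevity):

```python
# Printable ASCII plus tab/LF/CR count as text-like.
TEXT_BYTES = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}

def segment(data: bytes, min_text_run: int = 64) -> list[tuple[bool, bytes]]:
    """Split data into (is_text, chunk) runs; short text runs are demoted to binary."""
    runs: list[list] = []
    for byte in data:
        is_text = byte in TEXT_BYTES
        if runs and runs[-1][0] == is_text:
            runs[-1][1].append(byte)
        else:
            runs.append([is_text, bytearray([byte])])
    # Demote text runs shorter than the threshold, then merge adjacent same-class runs.
    for run in runs:
        if run[0] and len(run[1]) < min_text_run:
            run[0] = False
    merged: list[list] = []
    for is_text, buf in runs:
        if merged and merged[-1][0] == is_text:
            merged[-1][1] += buf
        else:
            merged.append([is_text, bytearray(buf)])
    return [(t, bytes(b)) for t, b in merged]
```

Concatenating the chunks in order always reproduces the input, so segmentation itself loses nothing.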

2. Binary Blob Compression

All binary chunks are merged into a single blob and compressed with lzma (≥ 4 KB) or gzip (smaller blobs). If neither reduces size, raw bytes are stored as-is.
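The method selection can be sketched with the standard library; the 4 KB threshold comes from the description above, and the string tags are illustrative:

```python
import gzip
import lzma

def compress_blob(blob: bytes) -> tuple[str, bytes]:
    """Compress a merged binary blob: lzma for >= 4 KB, gzip below, raw as fallback."""
    if len(blob) >= 4096:
        method, out = "lzma", lzma.compress(blob)
    else:
        method, out = "gzip", gzip.compress(blob)
    if len(out) >= len(blob):  # neither method helped: store the raw bytes as-is
        return "raw", blob
    return method, out
```

The raw fallback is what guarantees the binary path never expands a blob.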

3. Neural Text Compression (Parallel)

Each text chunk is split across workers and compressed using the full LLM + arithmetic coding pipeline. Workers operate concurrently on their sub-chunks for maximum throughput.

Note: Binary files are rarely pure binary — they often contain significant amounts of embedded text (strings, metadata, markup, code). Nacrith exploits this by segmenting the input into text and binary chunks, then compressing each with an appropriate method for its type.

Hardware Requirements

Runs on any CUDA-capable GPU with at least 2 GB of VRAM (~1.2 GB for the first worker, ~660 MB per additional worker). Falls back to CPU automatically via llama.cpp when no GPU is available. Benchmarks were run on a low-end NVIDIA GTX 1050 Ti — a modern GPU will be significantly faster.

Ready to try Nacrith?

Experience advanced compression that leverages language understanding