@rohanpaul_ai: LLMs may not need human-style language. i.e. future AI systems might save context space by using dense model-readable m…
Summary
This paper introduces BabelTele, a compressed writing style that uses abbreviations, symbols, and mixed-language fragments to reduce text length by 72.1% while preserving 99.5% semantic fidelity for LLMs, arguing that human readability and machine recoverability are separable.
LLMs may not need human-style language.
That is, future AI systems might save context space by using dense, model-readable messages instead of long, ordinary prose.
The authors propose BabelTele, a compressed writing style that can mix abbreviations, symbols, fragments from different languages, and unusual structure.
For a capable language model, such text can still carry enough structure to answer questions, preserve memory, and pass information between agents.
The point is that human readability, natural-language fluency, and machine recoverability are separable properties.
Human prose carries redundancy because humans need rhythm, grammar, context, and reassurance.
Models trained on huge symbolic mixtures may not need all of that scaffolding every time.
In the paper’s strongest result, BabelTele keeps about 99.5% semantic fidelity while shrinking text to 27.9% of its original length.
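The two length figures quoted for the paper describe the same measurement from two sides: keeping 27.9% of the original length is a 72.1% reduction. A minimal sketch of that arithmetic follows. It assumes length is counted in characters (the paper may well count tokens instead), and the function names and the example message pair are hypothetical, invented here for illustration rather than taken from BabelTele.

```python
def length_ratio(original: str, compressed: str) -> float:
    """Fraction of the original length that the compressed text retains.

    Assumption: length is measured in characters, not tokens.
    """
    if not original:
        raise ValueError("original text must be non-empty")
    return len(compressed) / len(original)


def reduction(ratio: float) -> float:
    """Fraction of length removed, given the fraction retained."""
    return 1.0 - ratio


# Hypothetical message pair, made up for illustration (not from the paper).
original = "The meeting has been moved to Thursday at three in the afternoon."
compressed = "mtg -> Thu 15:00"

ratio = length_ratio(original, compressed)
print(f"retained {ratio:.1%}, removed {reduction(ratio):.1%}")

# The quoted figures are complementary: 27.9% retained is a 72.1% reduction.
assert abs(reduction(0.279) - 0.721) < 1e-9
```

Length ratio says nothing about whether the meaning survived. The 99.5% semantic fidelity figure is a separate measurement that this sketch does not attempt to reproduce.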
Link: arxiv.org/abs/2606.19857
Title: “LLMs Do Not Always Need Readable Language”