This paper presents empirical measurements of information density in web pages from the perspective of LLM agents, using a curated benchmark of 100 URLs across five categories. It finds that structural extraction reduces token count by an average of 71.5% while preserving answer quality, and reveals an undocumented compression layer in Claude Code.
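The 71.5% figure is an average. The summary does not say whether it is the mean of per-page reductions or a single reduction pooled over all 100 URLs, and the two can differ noticeably. The minimal sketch below shows both computations. The token counts and function names are hypothetical illustrations, not data or code from the paper.

```python
from statistics import mean


def per_page_reduction(raw_tokens: int, extracted_tokens: int) -> float:
    """Percent of tokens removed by extraction for a single page."""
    if raw_tokens <= 0:
        raise ValueError("raw_tokens must be positive")
    return 100.0 * (raw_tokens - extracted_tokens) / raw_tokens


def average_reduction(pairs: list[tuple[int, int]]) -> float:
    """Mean of per-page reductions: every page is weighted equally."""
    return mean(per_page_reduction(raw, ext) for raw, ext in pairs)


def aggregate_reduction(pairs: list[tuple[int, int]]) -> float:
    """Reduction over the pooled corpus: large pages weigh more."""
    total_raw = sum(raw for raw, _ in pairs)
    total_ext = sum(ext for _, ext in pairs)
    return 100.0 * (total_raw - total_ext) / total_raw


# Hypothetical (raw, extracted) token counts for two pages.
pages = [(10_000, 2_000), (4_000, 2_000)]
print(average_reduction(pages))              # 65.0
print(round(aggregate_reduction(pages), 2))  # 71.43
```

Averaging per page treats every URL equally, while pooling lets long pages dominate. Which convention a benchmark uses matters when comparing reported reductions across studies.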
This arXiv preprint identifies low information density as the root cause of the collapse in named entity recognition (NER) performance on noisy user-generated content. It introduces the Window-Aware Optimization Module (WOM), which boosts F1 by up to 4.5% on WNUT2017.
This paper revisits the Uniform Information Density (UID) hypothesis in the context of LLM reasoning, introducing an entropy-based framework to quantify information flow uniformity. Across seven reasoning benchmarks, the authors find that high-quality reasoning exhibits local uniformity in step transitions but global non-uniformity in trajectory structure, suggesting LLM reasoning differs fundamentally from human communication patterns.
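The summary does not specify the authors' metrics, so the following is only an illustrative stand-in for "local uniformity" versus "global non-uniformity". It assumes three things: a step's information is the mean next-token Shannon entropy within that step, local uniformity is the mean absolute change between consecutive steps, and global non-uniformity is the variance across the whole trajectory. All function names and numbers are hypothetical.

```python
import math
from statistics import mean, pvariance


def token_entropy(probs: list[float]) -> float:
    """Shannon entropy (bits) of one next-token distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def step_entropy(token_dists: list[list[float]]) -> float:
    """Mean token entropy within one reasoning step (hypothetical aggregation)."""
    return mean(token_entropy(d) for d in token_dists)


def local_variation(step_values: list[float]) -> float:
    """Mean absolute change between consecutive steps; lower means locally smoother."""
    return mean(abs(b - a) for a, b in zip(step_values, step_values[1:]))


def global_variation(step_values: list[float]) -> float:
    """Population variance over the whole trajectory; higher means globally less uniform."""
    return pvariance(step_values)


# A step whose two tokens are a fair 2-way choice (1 bit) and a fair 4-way choice (2 bits).
print(step_entropy([[0.5, 0.5], [0.25] * 4]))  # 1.5

# Two made-up trajectories with the same step entropies in different orders.
ramp = [1.0, 1.5, 2.0, 2.5, 3.0]
shuffled = [1.0, 3.0, 1.5, 2.5, 2.0]
for name, traj in [("ramp", ramp), ("shuffled", shuffled)]:
    print(name, local_variation(traj), global_variation(traj))
# ramp 0.5 0.5
# shuffled 1.25 0.5
```

Both trajectories have identical global spread, but only the ramp changes smoothly from step to step. Separating the two measures this way is what lets a trajectory be locally uniform yet globally non-uniform.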