Tag
This paper proposes Global-Local Uncertainty (GLU), an unsupervised single-pass score that fuses token-level local entropy with hidden-state geometric global entropy for uncertainty quantification in LLMs, showing that the two are near-orthogonal and together capture confident-but-wrong failures.
Bebop combines entropy-aware multi-token prediction with rejection sampling and a novel TV loss to accelerate RL training of LLMs, achieving up to a 1.8x speedup. Its training objectives are designed to counter the degradation of acceptance rates that occurs during RL.
This paper examines multi-agent systems (MAS) from an entropy perspective, analyzing intra- and inter-agent dynamics. It finds that single agents often outperform MAS and introduces the Entropy Judger algorithm to improve MAS performance.
A technical blog post exploring randomness, Linux entropy, and building a tool called morerandom that uses WASM plugins to feed the system entropy pool.
This paper introduces the Eisbach log-barrier, a parameter-free weight derived from the entropy of the DiT output's spatial energy distribution. Applied to LoRA fine-tuning of Stable Audio 3, it improves musical diversity and thematic development without causing mode collapse.
This paper introduces Canopy Entropy (CE⋆) to measure the effective size of the generation space in language models, and finds that fine-tuning reorganizes uncertainty into more informative and semantically meaningful outputs, nearly tripling the correlation between entropy rate and semantic diversity.
This paper proposes EKSFT, a selective fine-tuning method for large language models that masks tokens with high entropy or high KL divergence from a reference model, preserving the pre-trained distribution while injecting task knowledge. Experiments on mathematical reasoning benchmarks show that it outperforms standard SFT and improves subsequent RL fine-tuning.
The paper proposes High-Entropy Sum (HES), a training-free metric for selecting high-quality reasoning data for LLM training, validated across SFT, RFT, and RL paradigms.
This paper investigates why large language models hallucinate even when the correct answer is available in their generation-time distribution. By introducing a semantic notion of answer availability, the authors show that 16-47% of instruction-tuned model hallucinations occur when the correct concept is already represented, and that this rate increases with scale. They find that instruction tuning sharpens answer commitment, making helpfulness and confident hallucination two sides of the same coin.
This paper proposes a model-agnostic probabilistic token attribution measure for LLMs using Bayes' rule to invert next-token log probabilities, capturing the model's internal representation of token sequences and improving interpretability through entropy analysis.
This paper introduces Digit Entropy Loss (DEL), a novel loss function for numerical learning in large language models that reformulates entropy optimization to improve digit-level prediction accuracy and handle floating-point numbers, consistently outperforming existing methods on mathematical reasoning benchmarks.
This paper proposes a framework that uses entropy-based diagnostics to harmonize spatial and temporal feature representations, achieving substantial accuracy gains on large-scale spatiotemporal prediction tasks across urban traffic, meteorology, and epidemic datasets.
An article exploring the philosophical and practical meaning of randomness, using lava lamps as a metaphor for entropy generation in computing.
A reflection on how AI agents fail in production due to accumulated state issues (stale context, expired tokens, conflicting memory) rather than reasoning flaws, emphasizing the need for better state management.
An exploration of pseudo-random number generation in computers, focusing on linear congruential generators (LCGs) and their quality visualization. The article also touches on entropy sources like Cloudflare's lava lamps and serves as a precursor to property-based testing.
A new paper proposes sequential KV cache compression using probabilistic language tries and predictive delta coding, achieving theoretical compression ratios of ~914,000× beyond TurboQuant by exploiting the sequential structure of language model tokens rather than treating vectors independently.
The article explores the concept of 'Value Freedom' in AI agents using Reinforcement Learning, framing it as a measure of unpredictability and entropy derived from Q-values.
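As a rough illustration of the last item, an entropy can be derived from Q-values by turning them into a policy and measuring how spread out that policy is. The sketch below assumes a Boltzmann (softmax) policy with a temperature parameter. The function names and the choice of softmax are assumptions made here for illustration, not the article's own definition of 'Value Freedom'.

```python
import math


def softmax(q_values, temperature=1.0):
    """Turn Q-values into action probabilities (Boltzmann policy; an assumption)."""
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    exps = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def policy_entropy(q_values, temperature=1.0):
    """Shannon entropy, in bits, of the softmax policy over the given Q-values."""
    probs = softmax(q_values, temperature)
    # Terms with zero probability contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    # Equal Q-values: the agent has no preference, so entropy is maximal.
    print(policy_entropy([1.0, 1.0, 1.0, 1.0]))
    # One dominant Q-value: the agent is nearly deterministic, so entropy is near zero.
    print(policy_entropy([100.0, 0.0, 0.0, 0.0]))
```

Under this reading, four equally valued actions give the maximum of 2 bits, while a single dominant Q-value drives the entropy toward zero, which corresponds to a fully predictable agent.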