Tag
This paper introduces a unified benchmark for span-level hallucination detection in RAG systems that extends beyond natural language to code, tool output, and structured documents, and presents a fine-tuned Qwen3.5-2B detector that outperforms existing methods on these new domains while remaining competitive on standard NLP benchmarks.
Moonshot AI's Kimi and Zhipu AI's GLM have achieved notable results on frontier code benchmarks.
OpenReward and TRL now support training on over 350 reinforcement learning environments with minimal code.
Identifies Supervision Fidelity Decay (SFD) in on-policy distillation, where teacher supervision degrades as student sequences lengthen, and proposes Lookahead Group Reward (LGR) to mitigate it, improving performance on math and code benchmarks.
A tweet discusses whether to use Python for writing AI agents, citing Shunyu Yao's ReAct source code (just a few Jupyter notebooks), claiming that these notebooks kicked off the agent era, and criticizing the attitude of looking down on specific programming languages.
Greg Kamradt proposes a 7-level spectrum of verification difficulty for AI, ranging from instantly verifiable domains like math and code to civilization-scale systems with slow, noisy feedback.
Framed is a tool that turns screenshots, videos, and code into polished visuals, available on Product Hunt.
This paper challenges the belief that code improves reasoning in language models, finding through controlled pretraining experiments that code alone primarily enhances programming ability, while reasoning gains come from structured reasoning traces like code-text and math-text mixtures.
A tweet reports that Renaissance Technologies' entire trading structure has been leaked on GitHub under an Apache 2.0 license, amassing 76,800 stars.
Yann LeCun states that LLMs are strongest in domains where language is the substrate of reasoning, like math and code, but they are not creative mathematicians, software architects, or computer scientists.
A Twitter post sharing a compact GLSL shader program (a fractal/raymarching implementation) by user @YoheiNishitsuji.
This article argues that a comprehensive specification is not equivalent to code, because a spec defines a set of possible implementations while code is one concrete instance. It discusses the role of abstraction and why programmers are still needed to write specs even with automated code generation.