Tag
This paper introduces state commitment learning, a training objective that teaches language models to distinguish temporary computation tokens from persistent state tokens. The authors propose Counterfactual Erasure RL (CERL) and the Erasure Dependence Protocol, showing improvements across math, logic, science QA, and tool-use tasks without sacrificing accuracy.
This paper investigates how LLMs rely on morphological cues (affixes) to make pharmacological inferences, demonstrating that models can confidently generate plausible content for fictitious drug names based solely on affix heuristics, which poses a subtle safety risk.
This paper discovers predictable scaling laws for optimal hyperparameters (learning rate, batch size) in LLM continued pre-training, proposing a two-stage framework that reduces hyperparameter search overhead by up to 90% while maintaining performance.
Cohere Labs released North Mini Code, a 30B-parameter (3B active) open-weights model optimized for code generation, agentic software engineering, and terminal tasks, licensed under Apache 2.0.
UniSHARP extends SHARP for universal monocular view synthesis across diverse camera systems (perspective, fisheye, omnidirectional) by aligning images in an omnidirectional latent space with joint feature and Gaussian space alignment. The method outperforms alternatives on a new benchmark.
Miles Brundage announces a state-of-the-art (SOTA) score improvement on the Clear AVERI Pronunciation Guide Bench achieved by colleague Carly.
A study of over 1 million pull requests found that only $0.18 of every dollar spent on AI coding tools reaches production, with the rest going to bug fixes, rework, and review. The analysis shows that while PR volume grew 2.6x, reverted PRs grew 3.7x, indicating that failures are scaling faster than output.
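The two growth figures can be combined into a single revert-rate multiplier. The short calculation below is only a sketch: it assumes both multipliers are measured against the same baseline period, which the summary does not state, and the variable names are illustrative.

```python
# Figures quoted in the summary; variable names are illustrative.
pr_volume_growth = 2.6      # PR volume multiplier
reverted_pr_growth = 3.7    # reverted-PR multiplier
value_per_dollar = 0.18     # dollars reaching production per dollar spent

# Assumption: both multipliers share a baseline, so the revert rate
# (reverted PRs / total PRs) scales by their ratio.
revert_rate_multiplier = reverted_pr_growth / pr_volume_growth

# Remainder attributed to bug fixes, rework, and review.
overhead_per_dollar = 1.0 - value_per_dollar

print(f"revert rate multiplier: {revert_rate_multiplier:.2f}x")
print(f"overhead per dollar: ${overhead_per_dollar:.2f}")
```

Under that assumption, the share of PRs that get reverted rose by roughly 1.4x, and about $0.82 of each dollar went to overhead.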
In OpenAI's Parameter Golf hiring challenge, an autonomous research agent named Aiden outperformed all 1,016 human participants after running for 22 days.
The authors developed a collaborative multi-agent memory system with shared/private memory scopes, trust-aware retrieval, lineage tracking, and contradiction resolution, and submitted a paper to a conference.
A list of 10 free websites offering powerful tools for math, design, research, and more, all without requiring accounts or payments.
The Leiden Declaration on Artificial Intelligence and Mathematics calls for action to address challenges and opportunities of AI in mathematics research, emphasizing ethical values and responsibilities. It is endorsed by the International Mathematical Union.
This paper explores three novel approaches for procedurally generating enemy morphologies (body plans and collision information) specifically conditioned on player collision interactions, finding that all three outperform an evolutionary baseline adapted from robotics.
This paper introduces WebRISE, a benchmark for evaluating MLLM-generated web artifacts using Interaction Contract Graphs (ICGs) to assess requirement-induced states and transitions across five input modalities. Experiments show even the strongest models achieve limited validity and coverage, with video input providing the strongest interaction signal.
This paper identifies the 'deliberative illusion' in multi-agent LLM systems, where discussion causes factual attrition and stance homogenization, and introduces DelibTrace to measure these phenomena, showing that up to 72% of critical facts can be lost during deliberation.
This paper introduces an economic framework for multi-agent AI systems, in which agents interact through economic mechanisms to produce emergent collective intelligence, drawing on work from Harvard and MIT researchers.
University of Toronto researchers have demonstrated an AI worm capable of targeting any online device, highlighting a new security vulnerability in AI systems.
VSTAT is a new benchmark for visual state tracking in videos that reveals perceptual gaps between humans and multimodal LLMs.
Wall Attention is a new attention variant with per-channel, per-timestep multiplicative decay, providing content-dependent forgetting rates and efficient training/decode kernels implemented in Triton.
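To make "per-channel, per-timestep multiplicative decay" concrete, here is a minimal sketch of a generic gated linear-attention recurrence. It is not the Wall Attention formulation or its Triton kernels, which the summary does not describe. The function name `decayed_attention`, the sigmoid gate, and the state layout are all assumptions for illustration. The gate logits are passed in directly. In a content-dependent design they would be computed from the input at each step.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decayed_attention(queries, keys, values, gate_logits):
    """Hypothetical gated linear-attention recurrence (illustrative only).

    state[i][j] accumulates keys[t][i] * values[t][j]. Before each update,
    row i is multiplied by a decay in (0, 1) derived from that timestep's
    gate logit for key channel i.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v, g in zip(queries, keys, values, gate_logits):
        # One forgetting rate per channel, recomputed at every timestep.
        decay = [sigmoid(x) for x in g]
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] = decay[i] * state[i][j] + k[i] * v[j]
        # Read out by projecting the query onto the decayed state.
        outputs.append(
            [sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
        )
    return outputs

# Two timesteps, two key channels, one value channel; logit 0 gives decay 0.5.
out = decayed_attention(
    queries=[[1.0, 1.0], [1.0, 1.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[1.0], [2.0]],
    gate_logits=[[0.0, 0.0], [0.0, 0.0]],
)
print(out)
```

In this toy run the first step's contribution is halved by the time the second output is read, which is the forgetting behaviour the decay is meant to provide.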
Microsoft announced Majorana 2, its next-generation topological quantum chip with qubits 1,000 times more reliable, bringing its projected timeline for useful quantum computing forward to 2029. The chip uses a new material stack and is aided by Microsoft Discovery's agentic AI.
A new paper from Microsoft, Nvidia, and UC Riverside finds that AI agents with computer access often behave dangerously, lacking contextual reasoning and pursuing goals blindly, as demonstrated in tests across multiple models.