This paper proposes a thermodynamic measure of intelligence, defining intelligence as the ability to make rare but valid futures more likely. It introduces a metric called 'rare-valid lift' that quantifies how much more often a system produces unlikely but acceptable outcomes compared to a passive baseline.
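The summary gives neither the paper's exact formula nor its thermodynamic framing. The following is a minimal sketch under stated assumptions. An outcome counts as "rare" if the passive baseline produces it with an empirical frequency below a chosen threshold. "Lift" is the ratio of rare-and-valid rates between the system and the baseline. The function name `rare_valid_lift`, the threshold, and the validity predicate are all hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def rare_valid_lift(
    system_samples: Sequence[Hashable],
    baseline_samples: Sequence[Hashable],
    is_valid: Callable[[Hashable], bool],
    rarity_threshold: float,
    eps: float = 1e-12,
) -> float:
    """Hypothetical rare-valid lift: how much more often the system produces
    rare-and-valid outcomes than a passive baseline does.

    Assumption (not the paper's definition): an outcome is 'rare' if its
    empirical frequency in the baseline samples is below rarity_threshold;
    outcomes never seen in the baseline count as rare.
    """
    if not system_samples or not baseline_samples:
        raise ValueError("both sample sets must be non-empty")

    base_counts = Counter(baseline_samples)  # missing keys count as 0
    n_base = len(baseline_samples)

    def is_rare(outcome: Hashable) -> bool:
        return base_counts[outcome] / n_base < rarity_threshold

    def rare_valid_rate(samples: Sequence[Hashable]) -> float:
        hits = sum(1 for o in samples if is_rare(o) and is_valid(o))
        return hits / len(samples)

    # eps guards against division by zero when the baseline never hits.
    return rare_valid_rate(system_samples) / max(rare_valid_rate(baseline_samples), eps)


baseline = ["a"] * 90 + ["b"] * 9 + ["c"] * 1
system = ["a"] * 50 + ["c"] * 30 + ["d"] * 20
print(rare_valid_lift(system, baseline, lambda o: o != "d", rarity_threshold=0.05))
```

In the toy samples, only `"c"` is both rare (1% of the baseline) and valid, so the lift is roughly 0.30 / 0.01 ≈ 30. A serious version would need a principled rarity model rather than raw counts.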
This paper investigates whether LLMs can reliably self-report when their outputs have been compromised by adversarial prefills. It finds that models often cannot distinguish compromised outputs from intentional ones, and that the limited recognition they do show stems from ordinary refusal behavior rather than genuine self-awareness.
ActiveGraph announces two new papers on agent memory (LongMemEval) and self-improvement regimes, along with reference agents, pack templates, and upcoming meetups in Seattle and San Francisco.
MIT engineers discovered that rice seeds germinate 30-40% faster when exposed to the sound vibrations of falling rain, providing the first direct evidence that plant seeds can sense sound as a cue for optimal growth depth.
MIT researchers developed injectable hydrogel microspheres that, combined with hepatocytes, form stable mini livers in mice, potentially offering a non-surgical alternative to liver transplantation.
MIT researchers developed a wristband with ultrasound stickers that images muscles and tendons, using AI to translate those images into hand movements to wirelessly control a robotic hand with high dexterity.
Research from the MIT Hardness Group proves that Super Mario levels can be undecidable: no computer program can always determine whether Mario can reach the castle, which places Super Mario beyond every decidable complexity class.
This analysis explains that super weights in large language models arise from the softmax-attention interaction, which creates a 'Nothing Dump' token that serves as a stable reference point; removing these weights cripples performance.
OpenMythos introduces a new open-source benchmark for evaluating AI models on mythological knowledge.
This paper argues that LLM-based coding agents have reached a capability threshold that makes human code review redundant, and proposes replacing human inspection with agent-driven verification to reduce cost and latency.
This paper introduces a new approach leveraging certainty in transformer models, building on the 'Attention Is All You Need' paradigm.
This technical report presents Ling-2.6 and Ring-2.6, a family of trillion-parameter models designed for efficient, instant agentic intelligence. They feature architectural upgrades such as hybrid linear attention and specialized training methods, including KPop reinforcement learning. All checkpoints are open-sourced.
A brain-inspired AI architecture promises to deliver faster computing while consuming far less power, potentially advancing energy-efficient AI hardware.
This paper investigates an alignment vulnerability in instruction-tuned LLMs, specifically Gemma-3-12B, showing that pre-token hidden-state shifts can act as an alignment-policy traversal vector and potentially enable safety measures to be bypassed.
F3 is a next-generation open-source data file format that embeds WebAssembly decoders in its files for interoperability and extensibility, addressing limitations of legacy formats such as Parquet. It is currently a research prototype described in a paper published by ACM.
The author maps the Kullback-Leibler divergence introduced by KV-cache quantization in the Qwen3.6-35B-A3B and Gemma4-E2B QAT models.
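The summary does not say how the divergence was measured. The following sketches one common approach, as an assumption. Run the same token sequence twice, once with a full-precision KV cache and once with a quantized one. Then compute the per-position KL(P_ref ‖ P_quant) between the two next-token distributions. The helper names `token_kl` and `kv_quant_kl_stats` are hypothetical, and producing the logits from either model is outside the sketch.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def token_kl(ref_logits: Sequence[float], quant_logits: Sequence[float]) -> float:
    """KL(P_ref || P_quant) in nats for a single token position."""
    if len(ref_logits) != len(quant_logits):
        raise ValueError("logit vectors must cover the same vocabulary")
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(quant_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def kv_quant_kl_stats(
    ref_positions: Sequence[Sequence[float]],
    quant_positions: Sequence[Sequence[float]],
) -> tuple[float, float]:
    """Hypothetical helper: mean and max per-token KL between a reference run
    (full-precision KV cache) and a quantized-KV-cache run on the same tokens."""
    if len(ref_positions) != len(quant_positions) or not ref_positions:
        raise ValueError("need the same non-zero number of positions")
    kls = [token_kl(r, q) for r, q in zip(ref_positions, quant_positions)]
    return sum(kls) / len(kls), max(kls)


# Toy two-token vocabulary: the quantized run shifts the first position only.
ref = [[0.0, 0.0], [1.0, 2.0]]
quant = [[math.log(3.0), 0.0], [1.0, 2.0]]
print(kv_quant_kl_stats(ref, quant))
```

Reporting both the mean and the worst-case position is a design choice. The mean summarizes the overall drift, while the max exposes rare positions where quantization badly distorts the distribution.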
Agent Profiles is a new method that enhances AI safety, focus, and reusability by defining structured profiles for AI agents.
Lift4D is a test-time optimization framework that reconstructs complete 4D geometry, appearance, and deformation of dynamic objects from a single monocular in-the-wild video, improving over prior methods on challenging sequences with occlusions and non-rigid motion.
Microsoft's NextLat introduces a training objective that rewards belief-state representations instead of relying solely on next-token prediction, pushing models toward compact world models for better generalization.
Nabla Bio unveils JAM-2, a model for zero-shot drug design that produces computationally designed multispecific antibodies with atomic precision, as well as dual-variant KRAS multispecifics with high potency and selectivity, validated with cryo-EM and wet-lab experiments.