All articles, most recently crawled first.
DeepSeek releases version 4 of its GLM model, version 5.2 PRO.
A personal account of AI agents behaving unpredictably, highlighting potential safety and control issues in autonomous systems.
This paper demonstrates the first positionally controlled addition and removal of individual atoms on a silicon surface using inverted-mode scanning tunneling microscopy, a foundational step toward molecular assemblers.
A blog post argues that criticisms of the EU's online age verification approach are often uninformed, explaining why age restrictions are necessary for children and proposing a privacy-preserving method using signed attestations rather than full identity disclosure.
The article discusses the incompatibility of AT URI syntax with standard URI/URL specifications due to placing DIDs in the authority field, and explores potential solutions and their trade-offs.
Argues that AI companies will eventually turn profitable despite current losses, emphasizing their long-term economic viability.
Deep Agents introduces dynamic subagents that are orchestrated programmatically through scripts rather than tool calls, enabling reliable scaling and complex workflows. The feature integrates with a QuickJS code interpreter for lightweight execution.
A tweet from @nateherk promoting a skill that can significantly improve Claude's research capabilities, with a link to a 3-minute read.
This paper introduces a novel retrieval loop for RAG that uses reflection tokens and on-demand retrieval, allowing the model to decide when to fetch documents or rely on internal knowledge, with critique and tree-decoding to improve accuracy.
This tweet discusses the convergence of ML research on attention-based, matmul-optimized algorithms due to hardware constraints, drawing on the 'hardware lottery' concept and noting OpenAI's 9-month chip tape-out as a potential sign of hardware-research co-design.
A tweet listing 16 inference optimization techniques for achieving sub-second LLM responses, including KV-caching, speculative decoding, FlashAttention, and various parallelism methods.
Introduces 'skill neologisms', a method for enabling LLMs to learn new skills without weight updates, addressing catastrophic forgetting. Presented at ICML.
Meta was secretly using Google's Gemini for customer service, ad tools, and content moderation because it outperformed their own Llama models, until Google cut off access due to excessive capacity usage.
Zach Lloyd details how to build a spec-driven development agent within a cloud software factory, using triage and spec agents to handle ambiguous or complex issues by generating product and tech specs before implementation.
Qwopus-3.6-35B-A3B-MTP-Coder is a new open-source MoE fine-tune optimized for coding agent workflows with thinking disabled, offering fast token-efficient inference and competitive performance against similar models.
A guide to 15 ways of using Claude's memory architecture, including custom instructions, projects, knowledge files, and system prompts, to avoid re-explaining context in every session.
A detailed guide on designing effective ML experiments, emphasizing starting with a clear research question, developing research taste, and scaling results. Based on the author's experience running ~100 experiments weekly at Poolside.
A practical guide explaining three levels of building self-improving AI agents, from manual loops to automated design, with recommended tools and frameworks.
Qwen-RobotNav is a scalable navigation model with a parameterized interface enabling dynamic task modes and observation parameters, achieving state-of-the-art performance through multi-task training and zero-shot generalization to real-world robotics.
Presents Qwen-RobotManip, a Vision-Language-Action foundation model for robotic manipulation that achieves generalization through unified alignment across representation, motion, and behavior dimensions, enabling large-scale training on diverse data sources. It outperforms prior state-of-the-art models across multiple out-of-distribution benchmarks and demonstrates emergent capabilities like zero-shot instruction following and cross-embodiment transfer.