Articles from Blog
An analysis of the transparency of Google's DiffusionGemma model release and its implications for AI safety and accountability.
Inception Labs released Mercury 2, a diffusion language model that generates roughly 1,000 tokens per second and scores 90% on the AIME 2026 benchmark, versus 69.1% for Google's DiffusionGemma. DiffusionGemma is free and open-weight, while Mercury 2 is a paid, closed-weight API model.
Sakana AI introduces AB-MCTS, an inference-time scaling algorithm that enables multiple frontier AI models (Gemini 2.5 Pro, o4-mini, DeepSeek-R1-0528) to cooperate, significantly outperforming individual models on the ARC-AGI-2 benchmark.
sqlite-utils 4.0rc1 is a release candidate that adds built-in database migrations (ported from sqlite-migrate) and nested transactions via db.atomic(), along with minor backwards-incompatible changes.
sqlite-utils 4.0rc1 is a release candidate for the Python CLI tool that simplifies SQLite database manipulation.
Samsung Electronics is deploying OpenAI's ChatGPT Enterprise and Codex to employees globally, representing one of OpenAI's largest enterprise deployments to enhance productivity across R&D, marketing, and manufacturing.
Cloudflare announced temporary accounts that let developers deploy Workers to ephemeral projects without sign-up, lasting 60 minutes and optionally claimable for permanence, aimed at AI agents but useful broadly.
A quote from Sean Lynch arguing that the main value of the Model Context Protocol (MCP) is keeping the authentication flow outside the agent's context window, which would make MCP potentially just an auth gateway for APIs.
MIT researchers have developed a machine-learning-based approach to accurately model the behavior of metal alloys, regardless of chemical complexity, enabling faster and cheaper materials innovation.
Anthropic's release of Claude Fable 5 with restrictive guardrails and the U.S. government's subsequent export controls on the model have sparked concerns about AI sovereignty and the stability of proprietary AI platforms.
DeepSeek reportedly requires investors to promise not to poach its talent as part of its $7.4 billion fundraising round, highlighting the intense competition for AI engineers in China.
Baseten, a $13 billion AI startup, provides software and computing capacity to companies using lower-cost AI models as alternatives to OpenAI and Anthropic.
Mistral AI is adding dedicated Code and Apps sections to its Vibe (Le Chat) web platform, turning it from a conversational interface into a development and app-building environment. A new large, sparse mixture-of-experts model is also confirmed for summer release as open weights.
Yann LeCun calls Elon Musk's xAI a 'failure', questions the company's ability to compete with OpenAI and Anthropic, and warns that high AI spending could lead to a 'big bubble explosion'.
Google is adopting Nvidia's strategy to build a competitive AI chip business, renting TPU computing power to Anthropic and boosting inference performance to challenge Nvidia's dominance.
ZPPO introduces a replay buffer for hard questions in reinforcement learning for LLMs/VLMs, allowing repeated exposure to gradually improve rollout accuracy without policy drift. The method graduates more hard questions than GRPO, especially those with near-zero initial accuracy.
OpenAI researchers show that reinforcement learning on realistic scenarios targeting beneficial traits (honesty, transparency, corrigibility) produces broad improvements across dozens of alignment benchmarks, with gains generalizing beyond training domains and persisting under adversarial pressure.
Perplexity Brain is a memory system that builds a persistent context graph across tasks, projects, decisions, files, and sources, enabling agents to start with relevant context instead of from scratch, improving answer correctness and reducing task costs.
Midjourney, known for its AI image generator, announces a full-body ultrasonic scanner that can scan the body in under 60 seconds, developed in partnership with Butterfly Network. The company plans to open spas for the service and aims for FDA approval and worldwide deployment by 2031.
OpenAI is preparing to release the GPT-5.6 family, including standard, Mini, and Pro variants, with a rumored 1.5-million-token context window and improved agentic coding capabilities. The launch is targeted for Tuesday, amid competition with Anthropic.