Blog

Auditing DiffusionGemma Transparency (9 minute read)

TLDR AI ↗ · 3d ago

An analysis of how transparent Google's DiffusionGemma model release is, and what that means for AI safety and accountability.

Inception Labs' Mercury 2 AI Beats Google's DiffusionGemma at Its Own Game (4 minute read)

TLDR AI ↗ · 3d ago

Inception Labs released Mercury 2, a diffusion language model that generates roughly 1,000 tokens per second. It outperforms Google's DiffusionGemma on the AIME 2026 benchmark, scoring 90% versus 69.1%, although DiffusionGemma is free and open-weight while Mercury 2 is a paid, closed-weight API model.

Sakana Fugu (3 minute read)

TLDR AI ↗ · 3d ago

Sakana AI introduces AB-MCTS, an inference-time scaling algorithm that enables multiple frontier AI models (Gemini 2.5 Pro, o4-mini, DeepSeek-R1-0528) to cooperate, significantly outperforming individual models on the ARC-AGI-2 benchmark.

sqlite-utils 4.0rc1 adds migrations and nested transactions

Simon Willison's Blog ↗ · 3d ago

sqlite-utils 4.0rc1 is a release candidate that adds built-in database migrations (ported from sqlite-migrate) and nested transactions via db.atomic(), along with minor backwards-incompatible changes.

sqlite-utils 4.0rc1

Simon Willison's Blog ↗ · 3d ago

sqlite-utils 4.0rc1 is a release candidate for the Python library and CLI tool that simplifies manipulating SQLite databases.

Samsung Electronics brings ChatGPT and Codex to employees

OpenAI Blog ↗ · 3d ago

Samsung Electronics is deploying OpenAI's ChatGPT Enterprise and Codex to employees worldwide. It is one of OpenAI's largest enterprise deployments and aims to boost productivity across R&D, marketing, and manufacturing.

Temporary Cloudflare Accounts for AI agents

Simon Willison's Blog ↗ · 3d ago

Cloudflare announced temporary accounts that let developers deploy Workers to ephemeral projects without signing up. The accounts last 60 minutes and can optionally be claimed to make the project permanent; they are aimed at AI agents but are useful more broadly.

Quoting Sean Lynch

Simon Willison's Blog ↗ · 5d ago

A quote from Sean Lynch arguing that the main value of the Model Context Protocol (MCP) is keeping the authentication flow outside the agent's context window, which may make MCP little more than an auth gateway for APIs.

A better way to model the behavior of metal alloys

MIT News — Artificial Intelligence ↗ · 5d ago

MIT researchers have developed a machine-learning-based approach to accurately model the behavior of metal alloys, regardless of chemical complexity, enabling faster and cheaper materials innovation.

Testing Mythos and Fable, Moving Beyond SWE-bench, Nvidia's Open Contender

The Batch ↗ · 5d ago

Anthropic's release of Claude Fable 5 with restrictive guardrails and the U.S. government's subsequent export controls on the model have sparked concerns about AI sovereignty and the stability of proprietary AI platforms.

'No poaching' our people, China's AI behemoth DeepSeek reportedly tells investors (3 minute read)

TLDR AI ↗ · 6d ago

DeepSeek reportedly requires investors to promise not to poach its talent as part of its $7.4 billion fundraising round, highlighting the intense competition for AI engineers in China.

The $13 Billion AI Startup Betting on Cheaper Alternatives to OpenAI, Anthropic (4 minute read)

TLDR AI ↗ · 6d ago

Baseten, a $13 billion AI startup, provides software and computing capacity to companies using lower-cost AI models as alternatives to OpenAI and Anthropic.

Mistral AI to get Code and Apps features on Vibe (2 minute read)

TLDR AI ↗ · 6d ago

Mistral AI is adding dedicated Code and Apps sections to its Vibe (Le Chat) web platform, turning it from a conversational interface into an environment for development and app building. Mistral has also confirmed a new large, sparse mixture-of-experts model, to be released as open weights this summer.

Godfather of AI blasts Musk's xAI as 'failure,' says labs are risking a 'big bubble explosion' (4 minute read)

TLDR AI ↗ · 6d ago

Yann LeCun calls Elon Musk's xAI a 'failure', questions its ability to compete with OpenAI and Anthropic, and warns that high AI spending could lead to a 'big bubble explosion'.

Google Is Using Nvidia's Playbook to Build a Rival AI Chip Business (11 minute read)

TLDR AI ↗ · 6d ago

Google is following Nvidia's strategy to build a competitive AI chip business: it rents TPU computing capacity to Anthropic and is improving inference performance to challenge Nvidia's dominance.

Revisiting Hard Questions with Replay Buffers (8 minute read)

TLDR AI ↗ · 6d ago

ZPPO introduces a replay buffer for hard questions in reinforcement learning for LLMs/VLMs, allowing repeated exposure to gradually improve rollout accuracy without policy drift. The method graduates more hard questions than GRPO, especially those with near-zero initial accuracy.

Reinforcement learning towards broadly and persistently beneficial models (22 minute read)

TLDR AI ↗ · 6d ago

OpenAI researchers show that reinforcement learning on realistic scenarios targeting beneficial traits (honesty, transparency, corrigibility) produces broad improvements across dozens of alignment benchmarks, with gains generalizing beyond training domains and persisting under adversarial pressure.

Self-Improving Memory for Agents (6 minute read)

TLDR AI ↗ · 6d ago

Perplexity Brain is a memory system that builds a persistent context graph across tasks, projects, decisions, files, and sources. Agents can then start with relevant context instead of from scratch, which improves answer correctness and reduces task costs.

Midjourney, the AI image generator, is developing a full-body ultrasonic scanner (3 minute read)

TLDR AI ↗ · 6d ago

Midjourney, known for its AI image generator, announces a full-body ultrasonic scanner that can scan the body in under 60 seconds, developed in partnership with Butterfly Network. The company plans to open spas for the service and aims for FDA approval and worldwide deployment by 2031.

OpenAI prepares GPT-5.6 models for the upcoming release (2 minute read)

TLDR AI ↗ · 6d ago

OpenAI is preparing to release the GPT-5.6 family, including standard, Mini, and Pro variants, with a rumored 1.5-million-token context window and improved agentic coding capabilities. The launch is reportedly targeted for a Tuesday, amid intensifying competition with Anthropic.
