Yann LeCun's team releases LeWorldModel, a tiny 15M-parameter physics model trained on a single GPU in hours that outperforms billion-dollar foundation models in planning speed and physical plausibility, challenging the dominant scaling paradigm.
HiDream-ai has open-sourced HiDream-O1-Image (8B), a unified image generative foundation model built on a Pixel-level Unified Transformer (UiT) that natively handles text-to-image, image editing, and subject-driven personalization at up to 2048×2048 resolution without external VAEs or disjoint text encoders. It debuted at #8 in the Artificial Analysis Text to Image Arena and is positioned as a leading open-weights text-to-image model.
OpenAI has launched three new real-time audio models to enable continuous, multitasking voice interactions that prioritize long-context reasoning, live translation, and seamless tool use.
Liquid AI announces liquid-audio, an open-source repository for its end-to-end speech-to-speech LFM models (LFM2-Audio-1.5B and LFM2.5-Audio-1.5B), with interleaved and sequential generation modes and fine-tuning support.
MemReranker is a reasoning-aware reranking model family (0.6B/4B) designed for agent memory retrieval, addressing limitations in semantic similarity by incorporating LLM knowledge distillation for better temporal and causal reasoning.
OpenAI has launched GPT-Realtime-2, integrating GPT-5-level reasoning into the real-time voice API, enabling voice assistants to think and solve problems in real time during conversations.
GPT-5.5-Cyber is now in limited preview for defenders, providing a model focused on securing critical infrastructure.
OpenAI released the GPT-Realtime-2 voice model, featuring GPT-5-level reasoning capabilities and a 128,000 token context window. It supports real-time translation from over 70 input languages to 13 output languages, achieving 96.6% accuracy on the Big Bench Audio Intelligence benchmark. Greg Brockman called it a milestone in voice translation.
Shanghai Jiao Tong University has open-sourced the F5-TTS speech generation model, trained on 100,000 hours of data, supporting bilingual synthesis in Chinese and English and zero-shot voice cloning, and allowing commercial use.
A reviewer tests a quantized and fine-tuned version of the Qwen3.6-35B model optimized for Apple Silicon via MLX, praising its speed, intelligence, and lack of safety disclaimers.
Claude agents have gained a 'Dreaming' feature that enables self-optimization by reviewing past conversations and extracting patterns. Combined with multi-agent parallel orchestration and quality assessment, this marks a step toward self-evolving AI agents.
A mixed-bit quantized build of the MiniMax M2.7 model has been released, shrunk to 74 GB for efficient local inference on Apple Silicon devices.
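Quantization like this trades precision for memory: weights are stored as low-bit integers plus a per-tensor (or per-group) scale, and "mixed-bit" schemes spend more bits only on sensitive layers. This is not MiniMax's or MLX's actual quantizer — just a minimal, self-contained sketch of symmetric k-bit quantization showing the bit-width/error trade-off:

```python
import numpy as np

def quantize(weights: np.ndarray, bits: int):
    """Symmetric k-bit quantization: map floats to signed integers
    plus one float scale factor."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 7 for 4-bit, 127 for 8-bit
    scale = np.abs(weights).max() / qmax
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.array([0.12, -0.53, 0.91, -0.07], dtype=np.float32)
q4, s4 = quantize(w, bits=4)   # coarser grid: larger reconstruction error
q8, s8 = quantize(w, bits=8)   # finer grid: smaller error, 2x the storage
err4 = np.abs(dequantize(q4, s4) - w).max()
err8 = np.abs(dequantize(q8, s8) - w).max()
assert err8 <= err4            # more bits ⇒ no worse reconstruction
```

Real mixed-bit schemes apply this per weight group and pick the bit-width per layer, which is how a model can land at a fixed memory budget like 74 GB.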
Zyphra releases ZAYA1-74B-Preview, a 74-billion parameter base model trained on AMD hardware, highlighting strong pre-RL reasoning capabilities and agentic performance signals.
OpenAI has released gpt-realtime-2, a new speech-to-speech model optimized for real-time voice agent interactions with low-latency tool calling.
A developer trained a 350M-parameter model capable of navigating spreadsheets better than Anthropic's Opus 4.6.
The authors present TOPAS, a recursive AI architecture achieving 11.67% on ARC-AGI-2 using a single RTX 4090, aiming to demonstrate that architectural efficiency can outweigh raw compute power.
Google's Gemma 4 achieves up to 3x faster inference speeds through speculative decoding and multi-token prediction, enabling efficient on-device deployment.
Satya Nadella announced the integration of GPT-5.5 Instant into M365 Copilot, Copilot Studio, and Foundry, highlighting faster and more accurate responses.
Sam Altman announces the release of GPT-Realtime-2 in the API, calling it a significant advance in voice interaction with AI, particularly in handling complex context.
Google released Multi Token Prediction drafters for Gemma 4 to accelerate inference via speculative decoding, but support for MLX is currently unconfirmed or unavailable.