MemReranker is a reasoning-aware reranking model family (0.6B/4B) designed for agent memory retrieval. It addresses the limitations of pure semantic-similarity retrieval by distilling LLM knowledge into the reranker, improving temporal and causal reasoning over stored memories.
OpenAI has launched GPT-Realtime-2, integrating GPT-5-level reasoning into the real-time voice API, enabling voice assistants to think and solve problems in real time during conversations.
GPT-5.5-Cyber is now in limited preview for defenders, offering a model tailored to defensive security work such as securing critical infrastructure.
OpenAI released the GPT-Realtime-2 voice model, featuring GPT-5-level reasoning capabilities and a 128,000 token context window. It supports real-time translation from over 70 input languages to 13 output languages, achieving 96.6% accuracy on the Big Bench Audio Intelligence benchmark. Greg Brockman called it a milestone in voice translation.
Shanghai Jiao Tong University has open-sourced the F5-TTS speech generation model, trained on 100,000 hours of data, supporting bilingual synthesis in Chinese and English and zero-shot voice cloning, and allowing commercial use.
A reviewer praises a quantized, fine-tuned variant of the Qwen3.6-35B model optimized for Apple Silicon via MLX, citing its speed, intelligence, and absence of safety disclaimers.
Claude agents have added a new 'Dreaming' feature that enables self-optimization by reviewing historical conversations and extracting recurring patterns. Combined with multi-agent parallel orchestration and quality assessment, this marks AI agents' transition into a self-evolution stage.
Release of a mixed-bit quantized version of the MiniMax M2.7 model, optimized to 74 GB for efficient local inference on Apple Silicon devices.
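The core idea behind mixed-bit quantization is to spend more bits on sensitivity-critical layers and fewer on bulk layers, trading a small accuracy loss for a large memory reduction. The sketch below is a minimal, illustrative version using symmetric uniform quantization on toy weights; the layer names and bit assignments are hypothetical, not the MiniMax M2.7 scheme.

```python
def quantize(weights, bits):
    """Symmetric uniform quantization of floats to signed `bits`-bit ints."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

# Mixed-bit allocation: sensitive layers get more bits, bulk layers fewer.
# Layer names and bit widths here are illustrative only.
layers = {"attn.q_proj": ([0.12, -0.5, 0.33, 0.9], 6),
          "mlp.down_proj": ([0.7, -0.1, 0.05, -0.8], 3)}

for name, (w, bits) in layers.items():
    q, s = quantize(w, bits)
    w_hat = dequantize(q, s)
    err = max(abs(a - b) for a, b in zip(w, w_hat))
    print(f"{name}: {bits}-bit, max abs reconstruction error {err:.3f}")
```

Real schemes operate on tensors in groups and pick bit widths from measured layer sensitivity, but the memory arithmetic is the same: averaging ~3 bits per weight across a large model is what brings it down to the tens-of-gigabytes range.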
Zyphra releases ZAYA1-74B-Preview, a 74-billion-parameter base model trained on AMD hardware, highlighting strong reasoning and agentic performance signals even before any reinforcement learning.
OpenAI has released gpt-realtime-2, a new speech-to-speech model optimized for real-time voice agent interactions with low-latency tool calling.
A developer trained a 350M-parameter model that navigates spreadsheets more accurately than Anthropic's Opus 4.6.
The authors present TOPAS, a recursive AI architecture achieving 11.67% on ARC-AGI-2 using a single RTX 4090, aiming to demonstrate that architectural efficiency can outweigh raw compute power.
Google's Gemma 4 achieves up to 3x faster inference speeds through speculative decoding and multi-token prediction, enabling efficient on-device deployment.
Sam Altman announces the release of GPT-Realtime-2 to the API, highlighting a significant advancement in voice interaction with AI for handling complex context.
Google released Multi Token Prediction drafters for Gemma 4 to accelerate inference via speculative decoding, but support for MLX is currently unconfirmed or unavailable.
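Speculative decoding works by letting a cheap drafter propose several tokens ahead, then having the expensive target model verify them in one pass, accepting the longest agreeing prefix. The toy below sketches the greedy variant of that accept/reject loop; `target_model` and `draft_model` are stand-in functions, not Gemma 4's actual MTP drafters.

```python
import random

# Toy vocab and "models": each maps a context tuple to a next-token
# distribution. Both models here are synthetic stand-ins.
VOCAB = list(range(8))

def target_model(ctx):
    # Pretend-expensive model: deterministic distribution derived from ctx.
    random.seed(hash(ctx) % (2 ** 32))
    w = [random.random() for _ in VOCAB]
    s = sum(w)
    return [x / s for x in w]

def draft_model(ctx):
    # Cheap drafter: a noisy copy of the target distribution.
    p = target_model(ctx)
    random.seed((hash(ctx) + 1) % (2 ** 32))
    q = [x + 0.3 * random.random() for x in p]
    s = sum(q)
    return [x / s for x in q]

def greedy_speculative_step(ctx, k=4):
    """Draft k tokens greedily, keep the longest prefix the target agrees with."""
    draft, c = [], ctx
    for _ in range(k):
        t = max(VOCAB, key=lambda v: draft_model(c)[v])
        draft.append(t)
        c = c + (t,)
    accepted, c = [], ctx
    for t in draft:
        best = max(VOCAB, key=lambda v: target_model(c)[v])
        if best != t:
            accepted.append(best)  # target's correction ends the step
            break
        accepted.append(t)
        c = c + (t,)
    return accepted  # 1..k tokens per target verification
```

In a real implementation the verification of all k drafted tokens happens in a single batched forward pass of the target model, which is where the speedup comes from: the more often the drafter agrees with the target, the more tokens are emitted per expensive pass.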
Released en_legal_ner_ind_trf v0.1, an InLegalBERT model fine-tuned on 33,000 Indian Supreme Court judgments, achieving a 97.76% F1 score on case citations and significantly outperforming previous baselines.
Researchers introduce Raven, a novel sequence model that merges state space model efficiency with a selective slot-updating mechanism inspired by sliding window attention to improve long-context retrieval. The approach offers a more principled alternative to existing linear-time models.
The article introduces Raven, a new State Space Model (SSM) with selective memory allocation that achieves state-of-the-art performance on recall tasks and demonstrates superior length generalization compared to existing approaches such as sliding-window attention (SWA).
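The selective slot-updating idea can be illustrated with a toy fixed-size memory: instead of a FIFO sliding window that evicts the oldest token, each input overwrites (or blends into) the single slot it matches best, and novel inputs claim the emptiest slot, so salient items can persist well beyond the window length. The class below is a hypothetical sketch of that mechanism, not Raven's actual architecture.

```python
import math

class SlotMemory:
    """Toy selective-slot memory with content-based write and softmax read."""

    def __init__(self, n_slots, dim):
        self.slots = [[0.0] * dim for _ in range(n_slots)]

    def write(self, x, gate=0.9):
        # Pick the slot with the highest dot-product similarity to x.
        scores = [sum(a * b for a, b in zip(s, x)) for s in self.slots]
        i = max(range(len(scores)), key=scores.__getitem__)
        if scores[i] <= 0.0:
            # Novel input: allocate the emptiest slot instead of clobbering.
            norms = [sum(v * v for v in s) for s in self.slots]
            i = min(range(len(norms)), key=norms.__getitem__)
        # Gated blend: mostly replace the chosen slot, leave the rest intact.
        self.slots[i] = [gate * xv + (1 - gate) * sv
                         for xv, sv in zip(x, self.slots[i])]

    def read(self, query):
        # Softmax-weighted readout over all slots.
        scores = [sum(a * b for a, b in zip(s, query)) for s in self.slots]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        return [sum(wi * s[d] for wi, s in zip(w, self.slots)) / z
                for d in range(len(query))]
```

The contrast with SWA is in the write rule: a sliding window discards by recency, while content-based slot selection discards by redundancy, which is what enables retrieval of items arbitrarily far back in the sequence at constant memory.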
The creator of SubQ announces an overwhelming response to the SSA breakthrough, with plans to release a model card with additional data and third-party validation next week.
DeepMind highlights the expanded impact of AlphaEvolve, a Gemini-powered coding agent, demonstrating its ability to optimize algorithms for genomics, grid optimization, earth sciences, quantum physics, and mathematics.