MemReranker is a reasoning-aware reranking model family (0.6B/4B) designed for agent memory retrieval. It addresses the limitations of pure semantic-similarity retrieval by distilling LLM knowledge into the reranker, improving temporal and causal reasoning over stored memories.
OpenAI has launched GPT-Realtime-2, integrating GPT-5-level reasoning into the real-time voice API, enabling voice assistants to think and solve problems in real time during conversations.
GPT-5.5-Cyber is now in limited preview for defenders, offering a model tailored to defensive security work such as securing critical infrastructure.
OpenAI released the GPT-Realtime-2 voice model, featuring GPT-5-level reasoning capabilities and a 128,000 token context window. It supports real-time translation from over 70 input languages to 13 output languages, achieving 96.6% accuracy on the Big Bench Audio Intelligence benchmark. Greg Brockman called it a milestone in voice translation.
Shanghai Jiao Tong University has open-sourced the F5-TTS speech generation model, trained on 100,000 hours of data, supporting bilingual synthesis in Chinese and English and zero-shot voice cloning, and allowing commercial use.
A reviewer praises a quantized, fine-tuned variant of the Qwen3.6-35B model optimized for Apple Silicon via MLX, citing its speed, intelligence, and absence of safety disclaimers.
Claude agents have added a new 'Dreaming' feature that enables self-optimization by reviewing historical conversations and extracting recurring patterns. Combined with multi-agent parallel orchestration and quality assessment, this marks AI agents' transition into a self-evolution stage.
Release of a mixed-bit quantized version of the MiniMax M2.7 model, optimized to 74 GB for efficient local inference on Apple Silicon devices.
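The core idea behind mixed-bit quantization is to spend more bits on sensitivity-critical layers and fewer on bulk layers, trading a small accuracy loss for a large memory reduction. The sketch below is a minimal, illustrative version using symmetric uniform quantization on toy weights; the layer names and bit assignments are hypothetical, not the MiniMax M2.7 scheme.

```python
def quantize(weights, bits):
    """Symmetric uniform quantization of floats to signed `bits`-bit ints."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

# Mixed-bit allocation: sensitive layers get more bits, bulk layers fewer.
# Layer names and bit widths here are illustrative only.
layers = {"attn.q_proj": ([0.12, -0.5, 0.33, 0.9], 6),
          "mlp.down_proj": ([0.7, -0.1, 0.05, -0.8], 3)}

for name, (w, bits) in layers.items():
    q, s = quantize(w, bits)
    w_hat = dequantize(q, s)
    err = max(abs(a - b) for a, b in zip(w, w_hat))
    print(f"{name}: {bits}-bit, max abs reconstruction error {err:.3f}")
```

Real schemes operate on tensors in groups and pick bit widths from measured layer sensitivity, but the memory arithmetic is the same: averaging ~3 bits per weight across a large model is what brings it down to the tens-of-gigabytes range.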
Zyphra releases ZAYA1-74B-Preview, a 74-billion-parameter base model trained on AMD hardware, highlighting strong reasoning and agentic performance signals even before any reinforcement learning.
OpenAI has released gpt-realtime-2, a new speech-to-speech model optimized for real-time voice agent interactions with low-latency tool calling.
A developer trained a 350M-parameter model that navigates spreadsheets more accurately than Anthropic's Opus 4.6.
The authors present TOPAS, a recursive AI architecture achieving 11.67% on ARC-AGI-2 using a single RTX 4090, aiming to demonstrate that architectural efficiency can outweigh raw compute power.
Google's Gemma 4 achieves up to 3x faster inference speeds through speculative decoding and multi-token prediction, enabling efficient on-device deployment.
Sam Altman announces the release of GPT-Realtime-2 to the API, highlighting a significant advancement in voice interaction with AI for handling complex context.
Google released Multi Token Prediction drafters for Gemma 4 to accelerate inference via speculative decoding, but support for MLX is currently unconfirmed or unavailable.
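Speculative decoding works by letting a cheap drafter propose several tokens ahead, then having the expensive target model verify them in one pass, accepting the longest agreeing prefix. The toy below sketches the greedy variant of that accept/reject loop; `target_model` and `draft_model` are stand-in functions, not Gemma 4's actual MTP drafters.

```python
import random

# Toy vocab and "models": each maps a context tuple to a next-token
# distribution. Both models here are synthetic stand-ins.
VOCAB = list(range(8))

def target_model(ctx):
    # Pretend-expensive model: deterministic distribution derived from ctx.
    random.seed(hash(ctx) % (2 ** 32))
    w = [random.random() for _ in VOCAB]
    s = sum(w)
    return [x / s for x in w]

def draft_model(ctx):
    # Cheap drafter: a noisy copy of the target distribution.
    p = target_model(ctx)
    random.seed((hash(ctx) + 1) % (2 ** 32))
    q = [x + 0.3 * random.random() for x in p]
    s = sum(q)
    return [x / s for x in q]

def greedy_speculative_step(ctx, k=4):
    """Draft k tokens greedily, keep the longest prefix the target agrees with."""
    draft, c = [], ctx
    for _ in range(k):
        t = max(VOCAB, key=lambda v: draft_model(c)[v])
        draft.append(t)
        c = c + (t,)
    accepted, c = [], ctx
    for t in draft:
        best = max(VOCAB, key=lambda v: target_model(c)[v])
        if best != t:
            accepted.append(best)  # target's correction ends the step
            break
        accepted.append(t)
        c = c + (t,)
    return accepted  # 1..k tokens per target verification
```

In a real implementation the verification of all k drafted tokens happens in a single batched forward pass of the target model, which is where the speedup comes from: the more often the drafter agrees with the target, the more tokens are emitted per expensive pass.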
Released en_legal_ner_ind_trf v0.1, an InLegalBERT model fine-tuned on 33,000 Indian Supreme Court judgments, achieving a 97.76% F1 score on case citations and significantly outperforming previous baselines.
Researchers introduce Raven, a novel sequence model that merges state space model efficiency with a selective slot-updating mechanism inspired by sliding window attention to improve long-context retrieval. The approach offers a more principled alternative to existing linear-time models.
The article introduces Raven, a new State Space Model (SSM) with selective memory allocation that achieves state-of-the-art performance on recall tasks and demonstrates superior length generalization compared to existing approaches such as sliding-window attention (SWA).
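The selective slot-updating idea can be illustrated with a toy fixed-size memory: instead of a FIFO sliding window that evicts the oldest token, each input overwrites (or blends into) the single slot it matches best, and novel inputs claim the emptiest slot, so salient items can persist well beyond the window length. The class below is a hypothetical sketch of that mechanism, not Raven's actual architecture.

```python
import math

class SlotMemory:
    """Toy selective-slot memory with content-based write and softmax read."""

    def __init__(self, n_slots, dim):
        self.slots = [[0.0] * dim for _ in range(n_slots)]

    def write(self, x, gate=0.9):
        # Pick the slot with the highest dot-product similarity to x.
        scores = [sum(a * b for a, b in zip(s, x)) for s in self.slots]
        i = max(range(len(scores)), key=scores.__getitem__)
        if scores[i] <= 0.0:
            # Novel input: allocate the emptiest slot instead of clobbering.
            norms = [sum(v * v for v in s) for s in self.slots]
            i = min(range(len(norms)), key=norms.__getitem__)
        # Gated blend: mostly replace the chosen slot, leave the rest intact.
        self.slots[i] = [gate * xv + (1 - gate) * sv
                         for xv, sv in zip(x, self.slots[i])]

    def read(self, query):
        # Softmax-weighted readout over all slots.
        scores = [sum(a * b for a, b in zip(s, query)) for s in self.slots]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        return [sum(wi * s[d] for wi, s in zip(w, self.slots)) / z
                for d in range(len(query))]
```

The contrast with SWA is in the write rule: a sliding window discards by recency, while content-based slot selection discards by redundancy, which is what enables retrieval of items arbitrarily far back in the sequence at constant memory.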
The creator of SubQ announces an overwhelming response to the SSA breakthrough, with plans to release a model card with additional data and third-party validation next week.
DeepMind highlights the expanded impact of AlphaEvolve, a Gemini-powered coding agent, demonstrating its ability to optimize algorithms for genomics, grid optimization, earth sciences, quantum physics, and mathematics.