large-context

#large-context

@huckiyang: https://x.com/huckiyang/status/2077625513384841679

X AI KOLs Timeline ↗ · 2026-07-16 Cached

Inkling, a 975B Mixture-of-Experts model (41B active) with a 1M context window and an Apache-2.0 license, introduces a novel audio front end that uses a 7.9M-parameter lookup table instead of a traditional encoder and achieves strong performance on speech tasks. The model was pretrained on 45 trillion tokens of text, images, audio, and video.

@EntelligenceAI: Kimi K3 is set to launch within hours And this could be one of the most important open-model launches of the year. Why …

X AI KOLs Following ↗ · 2026-07-14 Cached

Kimi K3, the latest open-weight model from Moonshot, launches with a new architecture, agent swarm capabilities, and a focus on long-horizon agent workflows, positioning it as a major contender against other top models.

@NVIDIAAI: You're welcome

X AI KOLs Timeline ↗ · 2026-07-08 Cached

NVIDIA AI releases a 75B MoE model (9.3B active), compressed from Nemotron-3-Super-120B using the Iterative Puzzle framework, with support for a 1M-token context.

@CDGalpha: gemini 3.5 flash is free now google quietly made their newest flash model free-tier eligible. no credit card, 1M contex…

X AI KOLs Timeline ↗ · 2026-07-03 Cached

Google quietly made its newest Flash model, Gemini 3.5 Flash, eligible for the free tier: no credit card required, a 1M context window, 1,500 requests per day, and native multimodal support.

Mimo 2.5 is _fast_ at large context (dual RTX Pro 6000)

Reddit r/LocalLLaMA ↗ · 2026-06-23

Mimo 2.5 runs fast at large context lengths on a dual RTX Pro 6000 setup.

@Modular: .@zai_org open-sourced GLM 5.2 today, and Modular is a Day Zero launch partner. GLM 5.2 is their new flagship for codin…

X AI KOLs Following ↗ · 2026-06-16 Cached

Zhipu AI (zai_org) has open-sourced GLM 5.2, a flagship model for coding and long-horizon agentic tasks with a usable 1M-token context. Modular is a Day Zero launch partner, offering optimized serving on Modular Cloud.

Anyone gotten Gemma 4 12B (unified audio) to actually attend to speech with a large system prompt?

Reddit r/LocalLLaMA ↗ · 2026-06-10

A user reports that the Gemma 4 12B unified audio model stops attending to speech when the system prompt is large (~21k tokens) and asks for workarounds or explanations, noting that the issue persists across the vLLM, llama.cpp, and LiteRT-LM backends.

@mervenoyann: NVIDIA Nemotron Ultra is here > 55B/550B a hybrid MoE with 1M context window > supports MTP speculative decoding > da…

X AI KOLs Following ↗ · 2026-06-04 Cached

NVIDIA released Nemotron Ultra, a hybrid MoE model listed as 55B/550B (presumably 55B active out of 550B total parameters) with a 1M context window. It supports MTP speculative decoding and is available day-0 in transformers.

@PrajwalTomar_: Everyone's sleeping on MiniMax. Again. They just shipped M3. The first open-weights model to combine frontier coding, 1…

X AI KOLs Following ↗ · 2026-06-04 Cached

MiniMax released M3, an open-weights model combining frontier coding, 1M context, and native multimodality, offering comparable performance to Opus at a fraction of the cost.

@AdinaYakup: Step-3.7-Flash New VL model from @StepFun_ai 198B / 11B active - MoE 256K context 3 reasoning level Up to 400 tokens/sec

X AI KOLs Timeline ↗ · 2026-05-29 Cached

StepFun releases Step-3.7-Flash, a new large vision-language MoE model with 198B parameters (11B active), 256K context, and up to 400 tokens/sec inference speed.

Build Hour: GPT-Realtime-2

YouTube AI Channels ↗ · 2026-05-14 Cached

OpenAI presented GPT-Realtime-2 and two accompanying models during Build Hour, aimed at making voice interaction more intelligent and natural. GPT-Realtime-2 supports 128k context, parallel tool calls, and dynamic voice cloning; the session demonstrated production-grade applications such as voice-driven shopping assistants and analytics dashboards.
