DeepSeek released the full V4 paper detailing FP4 quantization-aware training, MoE training stability tricks (anticipatory routing and SwiGLU clamping), and a generative reward model for RLHF, achieving dramatic efficiency gains—V4-Flash uses only 10% of V3.2's FLOPs and 7% of its KV cache at 1M context length.
DeepSeek, a Chinese AI lab spun out of a quant hedge fund, is reportedly matching GPT-4-level performance at roughly 5% of the training cost, causing significant market disruption including a $600B drop in NVIDIA's market cap. A free 1-hour-50-minute course has been released teaching users how to leverage DeepSeek V4 locally and via API.
The Hermes Agent model has reached the top global ranking across all AI applications on OpenRouter, powered by contributions from nearly 1,000 developers. The creator thanks the community and invites suggestions for future improvements.
Ring 2.6 1T, a 1-trillion-parameter open-weights model, has been listed on OpenRouter for free use, with a full public release expected.
Zyphra releases ZAYA1-74B-Preview, a 74-billion parameter base model trained on AMD hardware, highlighting strong pre-RL reasoning capabilities and agentic performance signals.
Unsloth releases a GGUF quantized version of the Qwen3.6-27B model, featuring improved agentic coding capabilities, tool calling, and support for Unsloth Studio.
Qwen releases the open-weight Qwen3.6-27B model on Hugging Face, featuring improved stability, agentic coding capabilities, and thinking preservation for better developer productivity.
Alibaba releases Qwen3.6-Max-Preview, an updated version of its Qwen3.6 model series with improved performance and capabilities.
A user tested MiniMax M2.7 (a 230B-parameter model) using Unsloth's UD-IQ3_XXS quantization (80GB) across four hardware configurations (RTX 4090, RTX 5090, RTX PRO 6000, and a DGX system), reporting token-generation speeds and time-to-first-token metrics for each.
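For context on the metrics reported above, here is a minimal sketch of how tokens-per-second and time-to-first-token (TTFT) are commonly derived from wall-clock timestamps. The function name and the sample numbers are illustrative assumptions, not the tester's actual harness or results.

```python
def throughput_metrics(t_start, t_first_token, t_end, n_tokens):
    """Return (ttft_seconds, decode_tokens_per_second).

    t_start:       when the request was issued
    t_first_token: when the first output token arrived
    t_end:         when generation finished
    n_tokens:      total output tokens generated
    """
    # TTFT covers prompt processing (prefill) up to the first output token.
    ttft = t_first_token - t_start
    # Decode speed counts the remaining tokens over the decode window only,
    # so prefill time does not inflate or deflate the tokens/sec figure.
    tps = (n_tokens - 1) / (t_end - t_first_token)
    return ttft, tps

# Illustrative run: 512 tokens, first token at 0.8 s, finished at 20.8 s.
ttft, tps = throughput_metrics(0.0, 0.8, 20.8, 512)
# ttft = 0.8 s, tps = 25.55 tokens/s
```

Separating prefill (TTFT) from decode throughput matters when comparing GPUs, since the two phases stress compute and memory bandwidth differently.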
Community discusses the identity of 'Elephant Alpha', a 100B-parameter model ranked #1 on OpenRouter with a 256K context window, fast inference, and strong coding capabilities but weak Chinese-language support, speculating about which company might be behind it.
MiniMaxAI releases MiniMax-M2.7, an open-weight model featuring self-evolution capabilities, advanced agent team support, and strong performance on software engineering benchmarks (56.22% on SWE-Pro, 66.6% medal rate on MLE Bench Lite), with notable applications in production incident recovery and professional work tasks.
Google DeepMind introduces Gemma 4, its most capable family of open models to date, designed for advanced reasoning and agentic workflows with high intelligence-per-parameter efficiency across multiple sizes.
Mistral AI has released Mistral Medium 3.5, a dense 128B multimodal model featuring a 256K context window, configurable reasoning capabilities, and improved performance in instruction following, reasoning, and coding tasks.
Google DeepMind releases Gemma 4, a family of open-weight multimodal models ranging from 2.3B to 31B parameters with support for text, image, video, and audio inputs. The models feature 256K context windows, MoE and dense architectures, enhanced reasoning capabilities, and are optimized for deployment across devices from mobile to servers.
OpenAI is releasing GPT-5.4 and GPT-5.4 Pro across ChatGPT, the API, and Codex, featuring native computer-use capabilities, 1M token context, improved reasoning and coding, and state-of-the-art performance on professional knowledge work benchmarks. It is described as OpenAI's most capable and token-efficient reasoning model to date.
Google has released Gemini 3 Flash, a fast, cost-effective AI model that combines Pro-grade reasoning with Flash-level speed for tasks like coding, complex analysis, and agentic workflows.
OpenAI introduces GPT-5.2, the most capable model series yet, with significant improvements in knowledge work, code generation, image perception, long-context understanding, and tool-calling. The GPT-5.2 Thinking variant achieves state-of-the-art performance on professional benchmarks, outperforming human experts on 70.9% of GDPval tasks across 44 occupations.
OpenAI releases GPT-5.1, a new model in the GPT-5 series that dynamically adapts thinking time based on task complexity, offering 2-3x faster performance than GPT-5 while maintaining frontier intelligence. The release includes extended prompt caching (24-hour retention), new coding tools (apply_patch and shell), and a 'no reasoning' mode for latency-sensitive applications.
OpenAI releases GPT-5.1 Instant and GPT-5.1 Thinking, upgraded versions of the GPT-5 series with improved conversational abilities, better instruction following, adaptive reasoning, and enhanced tone controls. The models are rolling out to ChatGPT users starting with paid subscribers, with API availability coming later this week.
OpenAI announces GPT-5, their most advanced model yet, unifying capabilities from GPT-4o, o-series reasoning, agents, and advanced math, with immediate rollout to Team users and API access for developers. The release marks a major milestone with 700 million weekly ChatGPT users and 5 million paid business users already leveraging OpenAI's technology.