Do you guys think subquadratic actually has a 12 million context model

Reddit r/ArtificialInteligence News

Summary

Sub Quadratic claims to have a model with a 12-million-token context window, but access is limited to partners. The model performs well on the "needle-in-a-haystack" test, but there is no evidence of general reasoning ability at that scale, which raises doubts.


Cached at: 06/18/26, 03:31 AM

### TL;DR

An AI company named Sub Quadratic claims to have released a model with a 12-million-token context window, but the actual product is limited to partner access. The model performs well on the "needle-in-a-haystack" retrieval task, but that says little about its capability on real-world reasoning tasks. The version claimed to be production-ready has not been made public.

## What does 12 million tokens mean?

If this claim is true, it would be the biggest leap in the long-context AI field. An average book contains about 100,000 tokens, so a million-token model such as GPT-5.4 can hold roughly 10 books. 12 million tokens could fit about **120 books**: essentially the data volume of a small library, or the entire codebase of a company. At such a scale, context compression becomes irrelevant.

## Suspicion: Same claim again, and no one can use it

This is not the first time Sub Quadratic has floated a 12-million-token context. They made a similar claim a few months ago, but access was restricted and evidence was lacking, sparking widespread doubt. Now SubQ 1.1 returns with the same 12-million-token context window, but the only publicly available model is the SubQ 1 Million Preview. Larger versions (from 2 million to 12 million tokens) are limited to a few early partners, with a future release planned. This is far from a "production-ready 12-million-token beast."

## Core technology: Sparse Structured Attention (SSA)

The main bottleneck of traditional language models is that the computation of the attention mechanism grows **quadratically** with context length. While DeepSeek's sparse attention can reduce computation, it is still essentially quadratic. Linear attention has been proposed but suffers from poor quality. Sub Quadratic claims that its SSA method achieves **linear computational cost** while maintaining attention quality comparable to regular attention.
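To make the quadratic-versus-linear distinction concrete, here is a minimal sketch that counts query-key score computations only. It is not Sub Quadratic's SSA; the function names and the `keys_per_query` budget are hypothetical values chosen for illustration, and the count ignores layers, heads, and constant factors.

```python
def dense_attention_pairs(n_tokens: int) -> int:
    """Score count for full self-attention: every token attends to every token."""
    return n_tokens * n_tokens


def linear_attention_pairs(n_tokens: int, keys_per_query: int = 4096) -> int:
    """Score count if each query attends to a fixed budget of keys.

    keys_per_query is a hypothetical constant for illustration only;
    it is not a figure from Sub Quadratic's paper.
    """
    return n_tokens * min(keys_per_query, n_tokens)


# Compare a 1M-token context with a 12M-token context.
for n in (1_000_000, 12_000_000):
    dense = dense_attention_pairs(n)
    linear = linear_attention_pairs(n)
    print(f"{n:>12,} tokens: dense={dense:.2e}  linear={linear:.2e}")

# Growing the context 12x multiplies dense cost by 144 but linear cost by only 12.
dense_growth = dense_attention_pairs(12_000_000) // dense_attention_pairs(1_000_000)
linear_growth = linear_attention_pairs(12_000_000) // linear_attention_pairs(1_000_000)
print(dense_growth, linear_growth)
```

The point is the growth rate, not the absolute numbers: under the quadratic count, going from 1 million to 12 million tokens costs 144 times more, while a fixed per-query budget costs 12 times more. The sketch does not reproduce any figure from the paper.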
Their paper states that at a 12-million-token context, SSA is **191 times** more efficient than DeepSeek's sparse attention.

## Third-party evaluation: Appen report

Sub Quadratic is not entirely without evidence. They have published a technical paper and methodology, and a third-party evaluator, **Appen**, conducted long-context retrieval tests. The Appen report says that in the **"needle-in-a-haystack"** test, the model showed **near-perfect retrieval capability** across 12 million tokens. This supports their narrower claim that the model can "retrieve implanted facts from 12 million tokens."

## Key issue: Retrieval ≠ reasoning

The biggest caveat is that "needle-in-a-haystack" is one of the simplest long-context tasks. Extracting a single isolated fact is very different from reasoning over 12 million tokens. Even current closed-source models (like GPT-5.5 and Claude-4.8) show significant quality degradation near one million tokens, let alone 12 million. Sub Quadratic provides no evidence that their model can handle general reasoning, coding, or real-world large-scale tasks at a 12-million-token context.

## Two conflated claims

Sub Quadratic mixes two different claims in their public materials:

- **Claim one**: The model can retrieve implanted facts within a 12-million-token context. This claim seems plausible and is supported by third-party evidence.
- **Claim two**: They have a production-ready 12-million-token context model capable of general reasoning, coding, and handling huge real-world problems. This claim has **almost no supporting evidence**.

## Final verdict

Sub Quadratic is worth watching, but the claims are overblown. It is quite likely that they have a model that can pass a 12-million-token "needle-in-a-haystack" test, but it is extremely unlikely that the same model can reason accurately over 12 million tokens.
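To show how narrow a needle-in-a-haystack check is, here is a minimal sketch of such a harness. It is not Appen's methodology: `build_haystack` and `toy_retriever` are hypothetical names, and `toy_retriever` is a plain string search standing in for a model call, so passing it demonstrates nothing beyond lookup of one isolated fact.

```python
import random
from typing import Optional


def build_haystack(needle: str, n_filler: int, depth: float, seed: int = 0) -> str:
    """Bury one needle sentence among filler sentences at a relative depth in [0, 1]."""
    rng = random.Random(seed)
    filler = [f"Filler sentence number {rng.randrange(10**6)}." for _ in range(n_filler)]
    position = int(depth * n_filler)
    return " ".join(filler[:position] + [needle] + filler[position:])


def toy_retriever(context: str, key: str) -> Optional[str]:
    """Hypothetical stand-in for a model call: return the sentence containing key."""
    for sentence in context.split(". "):
        if key in sentence:
            return sentence.rstrip(".") + "."
    return None


NEEDLE = "The secret passcode is 7421."
results = {}
for depth in (0.0, 0.5, 1.0):
    # Place the needle at the start, middle, and end of the context.
    context = build_haystack(NEEDLE, n_filler=1000, depth=depth)
    results[depth] = toy_retriever(context, "secret passcode") == NEEDLE
print(results)
```

A real evaluation would replace `toy_retriever` with a call to the model under test and scale the filler to millions of tokens. Even then, a perfect score only covers claim one; it says nothing about multi-step reasoning across the whole context.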
True verification will have to wait until independent users can test at scale or other companies can reproduce their results.

---

Source: FusionCow - Do you guys think subquadratic actually has a 12 million context model (https://www.youtube.com/watch?v=qaPdHmkGDgo)
