PM tried M3's 1M context on a real Q3 brief: where it held, where it broke

Reddit r/AI_Agents Models

Summary

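A product manager shares hands-on testing of MiniMax M3's 1M context window on a real Q3 strategic brief (14 sources, around 340K tokens). Source attribution stayed clean across the whole input, but past roughly 200K tokens the synthesis began reconciling contradictions instead of flagging them.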

I'm a PM, not a researcher. My job is pulling 12-18 sources into one strategy doc and not losing the caveats. ChatGPT Pro has burned me twice by quietly dropping a paragraph of qualifiers. So when I saw MiniMax M3's 1M context with MSA (MiniMax Sparse Attention), I threw my actual Q3 brief at it. Notes from the trenches:

1. Setup: 14 sources (PDFs, earnings call transcripts, two analyst notes), around 340K tokens. I asked for a synthesized strategy with the source map preserved.
2. What held: Source attribution stayed clean across the full ~340K-token input. It could tell me "this claim came from the Gartner note vs. the competitor earnings call" without me re-prompting. Different category from my ChatGPT workflow.
3. What broke: The synthesis got overconfident past roughly 200K tokens. Below that, caveats stuck. Above that, the model started reconciling contradictions instead of flagging them. Exactly the failure mode that has bitten me before. I caught it only because I had the source map open side by side.

Is this consistent with what others see on long-context synthesis tasks? The M3 brief claims BrowseComp 83.5 and a 12-hour ICLR replication with 18 commits and 23 figures, but both are clearly different workloads from mine. Curious whether MSA has known behavior at the upper end of the window, or whether my prompt is the bottleneck.
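For anyone who wants to automate the side-by-side check, here is a rough sketch of a "dropped caveat" detector. It is only a heuristic under loud assumptions: the hedge-word list, the `min_overlap` threshold, and every function name are hypothetical and not part of any M3 tooling. It flags hedged source sentences that have no similarly hedged sentence in the synthesis.

```python
import re

# Hypothetical hedge-word list; tune it for your own sources.
HEDGES = ("may", "might", "could", "however", "unless",
          "assuming", "preliminary", "subject to", "not yet")


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_hedged(sentence):
    """True if the sentence contains any hedge word or phrase."""
    lowered = sentence.lower()
    return any(re.search(r"\b" + re.escape(h) + r"\b", lowered) for h in HEDGES)


def content_words(sentence):
    """Lowercased words longer than 4 characters, excluding hedge words."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return {w for w in words if len(w) > 4 and w not in HEDGES}


def dropped_caveats(sources, synthesis, min_overlap=0.5):
    """Return (source_name, sentence) pairs whose caveat looks lost.

    A hedged source sentence counts as kept only if some hedged synthesis
    sentence shares at least `min_overlap` of its content words.
    """
    synth = [(content_words(s), is_hedged(s)) for s in split_sentences(synthesis)]
    flagged = []
    for name, text in sources.items():
        for sentence in split_sentences(text):
            if not is_hedged(sentence):
                continue
            words = content_words(sentence)
            if not words:
                continue
            kept = any(
                hedged and len(words & synth_words) / len(words) >= min_overlap
                for synth_words, hedged in synth
            )
            if not kept:
                flagged.append((name, sentence))
    return flagged


# Toy example with made-up text, not taken from the actual brief.
sources = {
    "analyst_note": "Market growth may slow in Q3 if pricing pressure "
                    "continues. Revenue grew 12 percent."
}
synthesis = ("Market growth will slow in Q3 as pricing pressure continues. "
             "Revenue grew 12 percent.")
for name, sentence in dropped_caveats(sources, synthesis):
    print(f"[{name}] caveat possibly dropped: {sentence}")
```

Word overlap is crude and will miss paraphrases, so treat the output as a list of places to look rather than a verdict. Running it separately on the output for sources that sit before and after the ~200K mark would show whether the drop-off is as sharp as it felt by eye.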
