Tag
User observes that the opus-4.8 model has degraded in performance since its launch.
Users report that OpenAI's Codex GPT-5.5 model in high mode has degraded, becoming lazy, hallucinating, and losing context. They suspect the cause is OpenAI training GPT-5.6, and say they now have to enable xhigh mode to get normal performance back.
User reports that Qwen3.6 models running on llama.cpp server become significantly less capable after ~2 weeks of continuous operation, and restarting sessions does not resolve the issue.
A tool that tracks the Elo history of major AI models on the LMSYS Arena leaderboard, surfacing trends that are otherwise hard to see, such as performance degradation or upgrades over time.
A user running multiple agents reports that after upgrading to GPT-5.5, the model suddenly became worse at executing tool calls and more prone to giving suggestions instead of acting. The user speculates that OpenAI may be throttling the model to manage load.
MiniMax published a technical blog post analyzing the systematic vocabulary degradation behind its M2-series large models' inability to output certain personal names. The post traces the problem to parameter shifts caused by a gap in data coverage between the pre-training and post-training stages, and proposes a remediation based on full-scale synthetic data, which it reports as effective.
A user documents how closed models (GPT-4o through GPT-5.3, and Gemini) produced increasingly degraded and censored translations of Chinese novels, while the local Gemma 4 31B model now outperforms them with natural, uncensored output.