Early tests and leaks show disappointing results for 3.5 Pro
Summary
Early tests and leaked information indicate that the 3.5 Pro model falls short of expectations.
Similar Articles
@sailfishcc1: After hundreds of 5.4 Pro queries, it’s obvious this is 5.5-thinking-xhigh, not 5.5 Pro—proof OpenAI trusts its 5.5 reasoning tier more than 5.4 Pro.
User testing shows the new 5.4 Pro behaves like a stealth 5.5-thinking-xhigh, hinting OpenAI is quietly giving Pro subscribers early access to stronger reasoning.
Gemini 3.5 Flash looks worse than it seems on Artificial Analysis
Comparison showing that Gemini 3.5 Flash scores slightly lower than Gemini 3.1 Pro in Artificial Analysis benchmarks and has a higher total benchmark cost despite lower per-token API pricing.
Gemini 3.5 Flash Benchmarks
Benchmark results for the Gemini 3.5 Flash model are discussed, likely showcasing its performance across various AI tasks.
Gemini 3.5 Flash is not that great at coding
The article discusses evaluation results from Cursor suggesting that Gemini 3.5 Flash underperforms in coding tasks compared to expectations.
Gemini 3.5 Flash improves over Gemini 3.1 Pro on the Short Story Creative Writing Benchmark: -2.3 → -1.8.
Gemini 3.5 Flash outperforms Gemini 3.1 Pro on a short story creative writing benchmark, improving from -2.3 to -1.8 in head-to-head comparisons.