@jakevin7: An interesting thing. The DeepSeek V4 technical report conducted a comprehensive evaluation of all major LLMs, concluding that Gemini 3.1 Pro has the strongest world knowledge among all models. Not GPT, not Claude, but Gemini. But when people use Gemini...
Summary
According to the DeepSeek V4 technical report's evaluation of mainstream LLMs, Gemini 3.1 Pro is considered to have the strongest world knowledge, but users generally find it hard to use because the model does not proactively use search tools.
Cached at: 06/08/26, 05:14 AM
There’s an interesting observation.
The DeepSeek V4 technical report includes a comprehensive evaluation of the major mainstream models and concludes that Gemini 3.1 Pro has the strongest world knowledge of any of them.
Not GPT, not Claude. Gemini.
Yet the common reaction from people who actually use Gemini is: is it even any good?
The issue isn’t the model’s knowledge; it’s that Gemini is extremely lazy about taking action.
Ask it about the latest news and, even though it has a search tool, it just won’t use it proactively. Often you have to explicitly say “go search for it” before it bothers to look. It’s like asking a well-read person what’s been happening lately, only to have them shrug: “I haven’t read today’s newspaper.”
A model with the best world knowledge in existence that is too lazy to use its tools: that’s the real reason Gemini feels so awkward to use.