According to the evaluation of mainstream LLMs in the DeepSeek V4 technical report, Gemini 3.1 Pro has the strongest world knowledge, yet users generally find it hard to use because the model does not proactively call search tools.
A new paper introduces an outcome-based reward that quantifies how much an agent's self-generated world knowledge improves task success, enabling agents to improve at inference time without external guidance.
This paper proposes a method to train LLM agents with intrinsic meta-evolution capabilities, enabling spontaneous self-improvement without external rewards at inference time. Applied to Qwen3-30B and Seed-OSS-36B, the approach yields a 20% performance boost on web navigation benchmarks, with a 14B model outperforming Gemini-2.5-Flash.
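One way to picture an outcome-based reward of this kind is as the difference in task success between rollouts that use the agent's self-generated world knowledge and rollouts that do not. The sketch below is only an illustration under that assumption; the summary does not give the paper's actual formula, and the names `success_rate` and `knowledge_gain_reward` are hypothetical.

```python
from typing import Sequence


def success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of successful rollouts; 0.0 for an empty batch."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def knowledge_gain_reward(
    with_knowledge: Sequence[bool],
    without_knowledge: Sequence[bool],
) -> float:
    """Hypothetical reward: success-rate gain attributable to the knowledge.

    Assumption (not taken from the paper): the reward is the success rate of
    rollouts conditioned on self-generated knowledge minus the success rate
    of baseline rollouts on the same task.
    """
    return success_rate(with_knowledge) - success_rate(without_knowledge)


# Example: 3 of 4 rollouts succeed with the knowledge, 1 of 4 without it.
reward = knowledge_gain_reward(
    [True, True, False, True],
    [True, False, False, False],
)
print(reward)
```

A positive value would indicate that the generated knowledge helped on that task, a value near zero that it made no difference, and a negative value that it hurt.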