@Potatoloogs: Gemini Co-Lead: World Model isn't a showcase, it's a bet on AGI—Where is RL's next explosive domain? a) Why Google is betting on World Model · Language has already distilled human written knowledge into weights; but video and images also contain vast amounts of knowledge. Can we extract physical concepts like "gravity" from pure visual data without relying on language annotations? That's the truly unsolved core problem of machine learning over the past decade. b) RL post-training: A greenfield, but with structural constraints. c) Memory and continual learning: The answer may not lie in weights. d) Can AI truly "innovate"? The capability Vinyals is most uncertain about. e) Advice for entrepreneurs.
Summary
Gemini co-lead Vinyals discusses World Model as key to AGI. He argues that video data contains physical knowledge and that RL post-training has huge potential but faces structural constraints, and he is optimistic about non-parametric memory systems.
Cached at: 06/08/26, 05:18 AM
Cursor Trains Composer 2: Pre-Training Lets Models “Learn Knowledge”, RL Lets Models “Know Who They Are”
a) Why Cursor Trains Its Own Model
Think of a model as a storage hard drive—it has a limited capacity for information.
Cursor cares about only one thing: software engineering, and only within Cursor. Dedicating all of the weights to this single task yields better performance and much lower inference costs (Composer is an order of magnitude cheaper than models like Opus).
Another ceiling: prompt engineering has its limits. To truly influence model behavior, you must bake the behavior into the weights via fine-tuning.
b) Composer 2 Training Approach: Two Axes in Parallel
Base model: Kimi K2.5 (a 1-trillion-parameter MoE with roughly 30B activated parameters).
Two steps: large-scale intermediate training (often called mid-training; on code tokens, at close to pre-training scale) → large-scale RL.
The essential difference between intermediate training and RL:
- Intermediate training teaches the model “what code looks like” (next token prediction);
- RL teaches the model “to write correct code”: the model takes direct actions within the Cursor harness, learns to invoke tools and navigate the environment, and learns to distinguish between “writing code” and “writing correct code.”
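The contrast between the two objectives can be sketched in a few lines. This is a toy illustration, not Cursor's training code: the token distribution, the rollouts, and the mean-reward baseline are all assumptions, and the RL side uses a plain REINFORCE-style advantage as a stand-in for whatever algorithm is actually used.

```python
import math

def next_token_loss(probs, target):
    """Cross-entropy at one position: -log p(target token)."""
    return -math.log(probs[target])

def advantage(reward, baseline):
    """REINFORCE-style weight applied to the log-prob gradient of a rollout."""
    return reward - baseline

# Hypothetical model distribution over the next token in some code prefix.
probs = {"+": 0.6, "-": 0.3, "*": 0.1}

# Intermediate training: the corpus says the next token is "+".
# The loss only measures how well the model imitates the data.
imitation_loss = next_token_loss(probs, "+")

# RL: the model samples its own rollouts and an environment scores them
# (here an assumed test suite that returns 1.0 on pass, 0.0 on fail).
rollouts = [("+", 1.0), ("-", 0.0), ("+", 1.0), ("*", 0.0)]
baseline = sum(reward for _, reward in rollouts) / len(rollouts)

# Accumulate the advantage per sampled action: positive values push the
# action's probability up, negative values push it down.
weights = {}
for token, reward in rollouts:
    weights[token] = weights.get(token, 0.0) + advantage(reward, baseline)

print(round(imitation_loss, 4), weights)
```

In the imitation case the signal comes from what the data contains; in the RL case it comes from whether the sampled behavior worked.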
c) The Essence of RL: Telling the Model “Who You Are”
After pre-training, the model has absorbed the full spectrum of human knowledge. But faced with a math problem, it doesn’t know “what kind of person it is”: an expert, or a student still learning?
RL tunes this knob: you are an expert, you must get things right.
SFT = knowledge transfer; RL = sharpening behavior.
Therefore, RL’s applicability extends far beyond “tasks requiring verifiable rewards”: even for summarization or style, you can use an LLM as a judge, with clear rubrics, to guide RL.
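A rubric-guided judge can be turned into a scalar reward roughly as follows. This is a minimal sketch under stated assumptions: `rubric_reward`, the rubric criteria, and `toy_judge` are hypothetical names, and `toy_judge` is a keyword heuristic standing in for a real LLM call so that the example runs on its own.

```python
from typing import Callable, Dict

Rubric = Dict[str, float]  # criterion text -> weight

def rubric_reward(output: str, rubric: Rubric,
                  judge: Callable[[str, str], float]) -> float:
    """Weighted average of per-criterion judge scores, each clamped to [0, 1]."""
    total_weight = sum(rubric.values())
    score = 0.0
    for criterion, weight in rubric.items():
        raw = judge(output, criterion)
        score += weight * min(1.0, max(0.0, raw))
    return score / total_weight

def toy_judge(output: str, criterion: str) -> float:
    """Stand-in for an LLM judge (assumption): simple checks per criterion."""
    if criterion == "under 20 words":
        return 1.0 if len(output.split()) < 20 else 0.0
    if criterion == "mentions the root cause":
        return 1.0 if "because" in output.lower() else 0.0
    return 0.0

rubric = {"under 20 words": 1.0, "mentions the root cause": 3.0}

good = rubric_reward("The build failed because the lockfile was stale.",
                     rubric, toy_judge)
bad = rubric_reward("The build failed.", rubric, toy_judge)
print(good, bad)
```

Explicit criteria with weights make the reward easier to audit than a single "rate this from 1 to 10" prompt, which is one reason clear rubrics matter here.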
d) The Core Challenge of RL Infrastructure: The Environment Must Be as Close to Real Production as Possible
The most powerful RL environment is your own product, because that’s where the model will actually work.
A counterintuitive finding: models can perceive they are in a fake environment and adopt different behaviors during RL training (they will “cheat” and learn techniques to score high in the fake environment).
To solve this, Cursor built a complete virtual machine stack that can spin up VMs quickly and in bulk (it must be able to handle a request like “give me 100,000 VMs now”).
e) The Key Breakthrough for Long-Chain Agents: Training “Self-Summarization” into the RL Loop
Two difficulties with long-chain RL:
- Credit assignment becomes increasingly difficult (the longer the chain, the harder to judge which step was right or wrong);
- The context window is limited.
Cursor’s solution: directly train “self-summarization” into the RL loop.
The model jointly learns two things: to generate good summaries, and to follow its own summary to continue the task.
Result: The model nominally has a 200K context window, but can actually handle millions of tokens because it learns to summarize and restart the context when it’s about to fill up, while continuing to complete the task.
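The summarize-and-restart behavior can be pictured as a loop around the context buffer. The sketch below is an assumption-laden illustration, not Cursor's implementation: in the described setup the model itself learns when and how to summarize through RL, whereas here `run_with_self_summary`, the fixed `reserve` margin, the whitespace token count, and `toy_summarize` are all hypothetical stand-ins.

```python
from typing import Callable, List, Tuple

def count_tokens(text: str) -> int:
    """Crude token count (assumption): whitespace-separated words."""
    return len(text.split())

def run_with_self_summary(
    steps: List[str],
    summarize: Callable[[List[str]], str],
    context_limit: int,
    reserve: int,
) -> Tuple[List[str], int, int, int]:
    """Feed steps into a bounded context, compacting before it overflows.

    Returns (final context, number of resets, total tokens seen, peak usage).
    """
    context: List[str] = []
    resets = 0
    total_tokens = 0
    peak = 0
    for step in steps:
        new_tokens = count_tokens(step)
        total_tokens += new_tokens
        used = sum(count_tokens(item) for item in context)
        # About to fill up: replace the whole context with a summary of it.
        if used + new_tokens > context_limit - reserve:
            context = [summarize(context)]
            resets += 1
        context.append(step)
        peak = max(peak, sum(count_tokens(item) for item in context))
    return context, resets, total_tokens, peak

def toy_summarize(context: List[str]) -> str:
    """Stand-in for a learned summary: one token noting how much was folded."""
    return f"summary-of-{len(context)}-items"

# 50 steps of 10 tokens each: 500 tokens through a 100-token window.
steps = [" ".join(f"s{i}t{j}" for j in range(10)) for i in range(50)]
context, resets, total_tokens, peak = run_with_self_summary(
    steps, toy_summarize, context_limit=100, reserve=20
)
print(resets, total_tokens, peak)
```

The total number of tokens processed is bounded by the task length rather than the window size, as long as each summary preserves what the next steps need. That second condition is the part that has to be learned.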