TL;DR: Three Google DeepMind researchers delve into reasoning, multimodal generation (Omni), coding and self-improvement, and the future evolution of thought processes, emphasizing that vision and dynamic thinking will surpass text-based chain-of-thought.
## Introduction: A Conversation Among Three Core DeepMind Members
In a deep exchange at AGI House, three distinguished researchers from Google DeepMind shared unique insights on reasoning, multimodal models (Omni), coding, and frontier AI R&D. Drawing from personal experiences, they outlined key inflection points in current AI development.
## Guest Backgrounds: From the 1980s to Frontier Labs
### First Researcher: Crossing Google Brain, OpenAI, and DeepMind
This researcher has focused on deep learning since 2015. They interned at Google Brain, moved to OpenAI, and eventually returned to DeepMind. Their early work was highly technical (e.g., Gumbel-Softmax) and aimed at improving sampling efficiency. About four years ago, they concluded that "digital AGI and physical AGI" were ready to take off, and they have since committed to frontier labs, going all-in on building the best models.
### Second Researcher, Jay: A Long-Termist in Multimodal Generation
Jay has always gravitated toward vision. Since the rise of generative AI in the 2010s, he has worked deeply on image and multimodal generation, contributing to Imagen and Imagen Video, and eventually leading the Gemini Omni project. He sees a turning point: multimodal generative models are no longer just producing images or videos, but "generating content with intelligence."
### Third Researcher (Sergey): A Legend from Apple II to DeepMind
Sergey recalls writing a simple backpropagation perceptron on an Apple II in the early 1980s, at a time when many dismissed neural networks as "hopeless," a perception that stalled the field for a decade. He followed neural network progress for a long time, but only moved from Google X to DeepMind a year and a half ago. He describes the experience as "unbelievable": he was initially very poor at the work and had to learn fast, but he now feels he has the best job in the world. He marvels at how fast the world is changing; seen from inside DeepMind research, progress is so rapid that it is nearly impossible to predict six months ahead.
## Coding & Reasoning: Self-Improvement and Frontier Use Cases
### Coding Is the Most Exciting Domain
The first researcher identifies coding as the most exciting area right now. They argued four years ago that large models could be improved through self-improvement, using math as an example, but coding makes the entire process clearer. Coding is like writing down your thought process and letting it run, so it supports longer reasoning chains and can embed general knowledge.
### From Offline to Online Supervised Learning
He emphasizes that the real breakthrough lies in combining verifiable reward learning with goal-driven self-improvement. The current shift from offline to online supervised learning is necessary to achieve 100% accuracy. A reasonable target is "going from a 94% model to a 95% product," requiring focus on the last few percentage points of improvement.
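To make "verifiable reward" concrete, here is a minimal sketch of a reward computed by a program rather than by human judgment. It assumes the common setup in which a candidate solution is scored by the fraction of test cases it passes. The names `verifiable_reward`, `ABS_CASES`, and `identity` are hypothetical and chosen for illustration; nothing here reflects how the speakers' own training systems work.

```python
from typing import Any, Callable, Sequence, Tuple

# A test case is (positional arguments, expected return value).
TestCase = Tuple[Tuple[Any, ...], Any]


def verifiable_reward(candidate: Callable[..., Any],
                      cases: Sequence[TestCase]) -> float:
    """Return the fraction of test cases the candidate passes (0.0 to 1.0)."""
    if not cases:
        raise ValueError("at least one test case is required")
    passed = 0
    for args, expected in cases:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            # A crashing candidate simply earns no credit for this case.
            pass
    return passed / len(cases)


# Toy task (an assumption for the demo): implement absolute value.
ABS_CASES = [((3,), 3), ((-2,), 2), ((0,), 0)]


def identity(x):
    """A plausible but wrong attempt: fails on negative inputs."""
    return x


candidates = {"identity": identity, "builtin_abs": abs}
scores = {name: verifiable_reward(fn, ABS_CASES)
          for name, fn in candidates.items()}

# Keep the highest-scoring attempt, as a stand-in for one improvement step.
best = max(scores, key=scores.get)
print(scores, best)
```

Because the score comes from executing the candidate, it can be recomputed on every new attempt, which is what makes it usable as a signal in an online loop.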
### Vibe Coding and Hierarchical Software Engineering
Sergey shares his fascination with "Vibe Coding": running 20 Gemini instances simultaneously to solve problems he cares about. He believes this is not just about writing code, but about pushing the model to think algorithmically and make progress. From a macro perspective, models are now working their way up the once-distinct layers of the craft: code writing, software engineering, software architecture, and UI design. Code writing is nearly solved; over the past few months, he has rarely found a Gemini code snippet that he could improve on. Software engineering, meaning managing the complexity of something like 15 million lines of code, is still making clear progress. Architecture, which involves actual functionality and the physical constraints of hardware, remains a clear frontier. UI design already works well, and technologies like world models and Nano Banana provide great inspiration.
## World Models & Multimodality: Video Models as Reasoners
### From Omni to World Models
Jay, drawing from his experience with Gemini Omni, stresses the importance of world models. If you have a good world model, it becomes easier to capture the problem you want to solve—for example, having the model generate a process for someone solving a complex problem or propose a proof for a math challenge. The intelligence of a world model is tightly linked to its world knowledge, understanding, and reasoning ability. In the future, powerful world models could simulate physics, potentially replacing experimental platforms in many natural sciences.
### Video Models Transcend Symbolic Thinking
The first researcher argues that much of the world's information is contained not only in symbols but also in spatial and temporal structure. Half a year ago, they co-authored a paper titled "Video Models Are Zero-Shot Learners and Reasoners," proposing that video models can access richer information in data. Humans do not think solely in text; although the industry has made great strides in text reasoning, it is still at an early stage in vision. Integrating vision into the model's reasoning process holds great promise.
## The Evolution of Thought Processes: Beyond Text Chain-of-Thought
### A Richer Vocabulary for Thinking
Sergey agrees that thought processes will evolve significantly. Many people who write code do not think in English but in visual or dynamic terms. The commonly used "chain-of-thought" is great, but it can be improved—the vocabulary of thinking is far richer than we imagine. Code serves as an excellent benchmark for verifying reasoning correctness: models are mostly trained on GitHub data, much of which is low quality, yet the models work—that itself is surprising.
### The Importance of Synthetic Training Examples
In the future, synthetic training examples will become crucial because they can surpass human coding capabilities. A simple example: take a piece of code, have the model explain it in English, give the English back to the model to write code, then compare functionality. This kind of approach forces the model not only to write code but also to understand it. This will be a major trend going forward.
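The round trip described above (code, then English, then code again, then a functional comparison) can be sketched as follows. This is an illustrative sketch only: `explain` and `regenerate` stand in for model calls and are replaced here by hard-coded toy functions, all names are hypothetical, and "compare functionality" is approximated by checking that both versions return the same outputs on a handful of inputs.

```python
from typing import Any, Callable, Iterable


def load_function(source: str, name: str) -> Callable[..., Any]:
    """Execute source text and return the function called `name`.

    exec() is acceptable only for trusted toy code like this demo;
    real model output would need to run in a sandbox.
    """
    namespace = {}
    exec(source, namespace)
    return namespace[name]


def round_trip_agrees(source: str,
                      name: str,
                      explain: Callable[[str], str],
                      regenerate: Callable[[str], str],
                      test_inputs: Iterable[Any]) -> bool:
    """Code -> English -> code, then compare behavior on test inputs."""
    description = explain(source)            # model explains the code
    new_source = regenerate(description)     # model rewrites it from English
    original = load_function(source, name)
    candidate = load_function(new_source, name)
    return all(original(x) == candidate(x) for x in test_inputs)


# Toy stand-ins for the two model calls (assumptions for the demo).
ORIGINAL = "def f(x):\n    return x * 2\n"


def fake_explain(source: str) -> str:
    return "Return twice the input."


def faithful_regenerate(description: str) -> str:
    return "def f(x):\n    return x + x\n"


def unfaithful_regenerate(description: str) -> str:
    return "def f(x):\n    return x ** 2\n"


good = round_trip_agrees(ORIGINAL, "f", fake_explain,
                         faithful_regenerate, range(5))
bad = round_trip_agrees(ORIGINAL, "f", fake_explain,
                        unfaithful_regenerate, range(5))
print(good, bad)
```

Pairs that survive the check could be kept as synthetic training examples, while failures point to either a poor explanation or a poor rewrite. Agreement on a few inputs is only evidence of equivalence, not proof.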
## Conclusion & Thanks
The conversation ended in a lively atmosphere. The three researchers agreed that the evolution of AI reasoning, multimodal generation, and coding has only just begun. Thanks to AGI House for providing the exchange platform that allowed cutting-edge ideas to collide.
Source: Inside Google DeepMind: Reasoning, Omni, and Shipping Frontier AI (https://youtu.be/ZVYq7uNhRCk)