Inside Google DeepMind: Reasoning, Omni, and Shipping Frontier AI

Reddit r/singularity News

Summary

This article summarizes an in-depth discussion among three Google DeepMind researchers on reasoning, multimodal generation (Omni), coding, and self-improvement. It emphasizes their view that visual and dynamic thinking will surpass text-based chain-of-thought, and it explores future trends in world models and synthetic training examples.

Cached at: 06/05/26, 09:11 AM

TL;DR: Three Google DeepMind researchers discuss reasoning, multimodal generation (Omni), coding and self-improvement, and the future evolution of thought processes, emphasizing that visual and dynamic thinking will surpass text-based chain-of-thought.

## Introduction: A Conversation Among Three Core DeepMind Members

In an in-depth exchange at AGI House, three distinguished researchers from Google DeepMind shared their views on reasoning, multimodal models (Omni), coding, and frontier AI R&D. Drawing on personal experience, they outlined key inflection points in current AI development.

## Guest Backgrounds: From the 1980s to Frontier Labs

### First Researcher: Crossing Google Brain, OpenAI, and DeepMind

This researcher has focused on deep learning since 2015. They interned at Google Brain, moved to OpenAI, and eventually returned to DeepMind. Their early work was highly technical (e.g., Gumbel-Softmax), aiming to improve sampling efficiency. About four years ago, they concluded that "digital AGI and physical AGI" were ready to take off, and have since dedicated themselves to frontier labs, going all-in on building the best models.

### Second Researcher, Jay: A Long-Termist in Multimodal Generation

Jay has always gravitated toward vision. Since the rise of generative AI in the 2010s, he has worked extensively on image and multimodal generation, contributing to Imagen and Imagen Video and eventually leading the Gemini Omni project. He sees a turning point: multimodal generative models are no longer just producing images or videos, but "generating content with intelligence."

### Third Researcher (Sergey): A Legend from Apple II to DeepMind

Sergey recalls writing a simple backpropagation perceptron on an Apple II in the early 1980s, an approach many then dismissed as "hopeless", a perception that stalled the field for a decade. He followed neural network progress for a long time, only moving from Google X to DeepMind a year and a half ago.
He describes the move as "unbelievable": he was initially very poor at the work and had to learn fast, but now feels he has the best job in the world. He marvels at how fast the world is changing; from a research perspective at DeepMind, the pace of progress makes it nearly impossible to predict six months ahead.

## Coding & Reasoning: Self-Improvement and Frontier Use Cases

### Coding Is the Most Exciting Domain

The first researcher identifies coding as the most exciting area right now. Four years ago they argued, using math as the example, that large models could get better through self-improvement, but coding makes the entire process clearer. Coding is like writing down your thought process and letting it run, so it supports longer reasoning chains and can embed general knowledge.

### From Offline to Online Supervised Learning

The first researcher emphasizes that the real breakthrough lies in combining verifiable reward learning with goal-driven self-improvement. The current shift from offline to online supervised learning is necessary to achieve 100% accuracy. A reasonable target is "going from a 94% model to a 95% product," which requires focusing on the last few percentage points of improvement.

### Vibe Coding and Hierarchical Software Engineering

Sergey shares his fascination with "Vibe Coding": running 20 Gemini instances simultaneously to solve problems he cares about. He believes this is not just about writing code, but about pushing the model to think algorithmically and make progress. From a macro perspective, the once-clear layers of the discipline (code writing, software engineering, software architecture, UI design) are being retraced by models. Code writing is nearly complete: in the past few months, he has rarely found a code snippet from Gemini that he could improve on. Software engineering (managing the complexity of 15 million lines of code) is still making clear progress. Architecture (involving actual functionality and the physical constraints of hardware) remains a clear frontier.
UI design works well, and technologies like world models and Nano Banana provide great inspiration.

## World Models & Multimodality: Video Models as Reasoners

### From Omni to World Models

Jay, drawing from his experience with Gemini Omni, stresses the importance of world models. With a good world model, it becomes easier to capture the problem you want to solve: for example, the model can generate the process of someone solving a complex problem, or propose a proof for a math challenge. The intelligence of a world model is tightly linked to its world knowledge, understanding, and reasoning ability. In the future, powerful world models could simulate physics, potentially replacing experimental platforms in many natural sciences.

### Video Models Transcend Symbolic Thinking

The first researcher argues that most information in the world is contained not only in symbols but also in spatial and temporal structure. Half a year ago, they published a paper titled "Video Models as Your Thinking Reasoner," proposing that video models can access richer information in data. Humans do not think solely in text; although the industry has made great strides in text reasoning, it is still at an early stage in vision. Integrating vision into the model's reasoning process holds great promise.

## The Evolution of Thought Processes: Beyond Text Chain-of-Thought

### A Richer Vocabulary for Thinking

Sergey agrees that thought processes will evolve significantly. Many people who write code do not think in English but in visual or dynamic terms. The commonly used "chain-of-thought" is great, but it can be improved: the vocabulary of thinking is far richer than we imagine. Code serves as an excellent benchmark for verifying reasoning correctness. Models are mostly trained on GitHub data, much of which is low quality, yet the models work, which is itself surprising.
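
Code works as a benchmark for reasoning because its correctness can be checked mechanically, which is also what makes a reward "verifiable." As a minimal sketch (not the researchers' actual setup), the hypothetical `verifiable_reward` below scores a candidate function by the fraction of test cases it passes. The function name, the test-case format, and the toy sorting task are all assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical test-case format: (positional arguments, expected output).
TestCase = Tuple[tuple, object]


def verifiable_reward(candidate: Callable, tests: Sequence[TestCase]) -> float:
    """Return the fraction of test cases the candidate passes (0.0 to 1.0)."""
    if not tests:
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            # A crashing candidate fails that test; the checker keeps going.
            pass
    return passed / len(tests)


# Toy task (an assumption): return the input list in ascending order.
TESTS = [
    (([3, 1, 2],), [1, 2, 3]),
    (([],), []),
    (([1, 1],), [1, 1]),
]


def good_sort(xs):
    return sorted(xs)


def buggy_sort(xs):
    return list(xs)  # forgets to sort; only passes already-sorted inputs


if __name__ == "__main__":
    print(verifiable_reward(good_sort, TESTS))
    print(verifiable_reward(buggy_sort, TESTS))
```

Returning a fraction rather than a pass/fail flag is one design choice among several; it gives partial credit, which can matter when the goal is to recover the last few percentage points of accuracy.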
### The Importance of Synthetic Training Examples

In the future, synthetic training examples will become crucial because they can surpass human coding capabilities. A simple example: take a piece of code, have the model explain it in English, give the English back to the model to write code, then compare the functionality of the two versions. This kind of approach forces the model not only to write code but also to understand it. This will be a major trend going forward.

## Conclusion & Thanks

The conversation ended in a lively atmosphere. The three researchers agreed that the evolution of AI reasoning, multimodal generation, and coding has only just begun. Thanks to AGI House for hosting an exchange that allowed cutting-edge ideas to collide.

Source: Inside Google DeepMind: Reasoning, Omni, and Shipping Frontier AI (https://youtu.be/ZVYq7uNhRCk)
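
The round-trip check described under "The Importance of Synthetic Training Examples" (code to English, English back to code, then compare functionality) can be sketched roughly as follows. This is an illustrative sketch only: `explain` and `write_code` are hypothetical stand-ins for model calls and are stubbed with fixed strings here, and comparing outputs on a handful of sample inputs is an assumed, simplified notion of "same functionality."

```python
from typing import Callable, Iterable


def load_function(source: str, name: str) -> Callable:
    """Execute source text and return the function called `name`.

    Note: exec on untrusted, model-written code should run in a sandbox.
    """
    namespace: dict = {}
    exec(source, namespace)
    return namespace[name]


def same_behavior(f: Callable, g: Callable, inputs: Iterable) -> bool:
    """Compare two functions on sample inputs (a proxy for equivalence)."""
    return all(f(x) == g(x) for x in inputs)


def round_trip_ok(
    source: str,
    func_name: str,
    explain: Callable[[str], str],      # hypothetical: code -> English
    write_code: Callable[[str], str],   # hypothetical: English -> code
    inputs: Iterable,
) -> bool:
    """Code -> English -> code, then check both versions agree."""
    description = explain(source)
    regenerated = write_code(description)
    original_fn = load_function(source, func_name)
    regenerated_fn = load_function(regenerated, func_name)
    return same_behavior(original_fn, regenerated_fn, list(inputs))


ORIGINAL = "def square_sum(n):\n    return sum(i * i for i in range(n + 1))\n"


# Stubs standing in for a model (assumptions, not a real API).
def stub_explain(src: str) -> str:
    return "Return the sum of the squares of the integers from 0 to n."


def stub_write_code(desc: str) -> str:
    # A different but equivalent implementation (closed-form formula).
    return "def square_sum(n):\n    return n * (n + 1) * (2 * n + 1) // 6\n"


def stub_write_code_wrong(desc: str) -> str:
    # Off-by-one: leaves out n itself.
    return "def square_sum(n):\n    return sum(i * i for i in range(n))\n"


if __name__ == "__main__":
    print(round_trip_ok(ORIGINAL, "square_sum", stub_explain, stub_write_code, range(20)))
    print(round_trip_ok(ORIGINAL, "square_sum", stub_explain, stub_write_code_wrong, range(20)))
```

A round trip that passes could be kept as a synthetic training example, while a failure indicates that either the explanation or the regenerated code lost information.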
