TL;DR: Google DeepMind researcher Nenad Tomašev explores the core differences between AI agents and large language models, their current capabilities and limitations, and how millions of agents trading and collaborating with each other could give rise to a new economy and open a new path toward AGI.
## Agents vs. Large Language Models: Conceptual and Experiential Differences
Agents are not a new concept. Long before the emergence of large language models, the AI research community had been exploring them—for example, agents that act in 3D simulated environments, collect objects, and emphasize "embodied intelligence." Now, the main difference between language models and agents is that agents observe the state of the world and perform actions, while language models only produce continuations in response to prompts or queries. Although modern agents still use large language models as their backbone, the key is that we build a framework around the model to automatically execute a chain of actions, giving the system more autonomy.
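The observe-then-act framing above can be sketched as a small loop wrapped around a model. This is a minimal illustration, not how any particular product works: `scripted_model` is a hypothetical stand-in for a real LLM call, and the inbox "environment" is invented for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    name: str
    argument: str


def scripted_model(observation: str, history: list[str]) -> Action:
    """Hypothetical stand-in for an LLM call: maps an observation to the next action."""
    if observation == "inbox empty":
        return Action("finish", "nothing left to do")
    return Action("archive", observation)


def run_agent(
    model: Callable[[str, list[str]], Action],
    observe: Callable[[], str],
    act: Callable[[Action], None],
    max_steps: int = 10,
) -> list[str]:
    """Framework around the model: observe the world, choose an action, execute it, repeat."""
    history: list[str] = []
    for _ in range(max_steps):
        observation = observe()
        action = model(observation, history)
        history.append(f"{action.name}: {action.argument}")
        if action.name == "finish":
            break
        act(action)
    return history


# Toy environment (an assumption for illustration): an inbox the agent clears.
inbox = ["newsletter", "invoice"]


def observe() -> str:
    return inbox[0] if inbox else "inbox empty"


def act(action: Action) -> None:
    if action.name == "archive":
        inbox.pop(0)


history = run_agent(scripted_model, observe, act)
print(history)
```

A bare language model corresponds to calling `scripted_model` once; the agent is the loop that keeps feeding new observations back in and executing the chosen actions. The `max_steps` cap is one simple way to bound that autonomy.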
For users, the interface to interact with an agent still resembles a language model conversation, but the user's role shifts from being a questioner to a decision-maker and approver. For example, when planning a wedding, an LLM will only list a set of caterers—but you need to send the emails yourself. An agent, on the other hand, can directly access your Gmail, draft, and send the emails (of course, requiring your approval). This way, you can step away and do other things while waiting for the agent to complete the task.
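The approver role described above can be sketched as a gate between drafting and sending. The function name and callbacks below are hypothetical; no real Gmail API is called, and `send` is just a callable supplied by the caller.

```python
from typing import Callable


def send_with_approval(
    draft: str,
    approve: Callable[[str], bool],
    send: Callable[[str], None],
) -> bool:
    """Send the draft only if the human approver accepts it; return whether it was sent."""
    if approve(draft):
        send(draft)
        return True
    return False


# Illustration: an in-memory outbox stands in for a real mail service.
outbox: list[str] = []
approved = send_with_approval(
    "Hello, are you available to cater on our date?",
    approve=lambda text: True,   # the user clicks "approve"
    send=outbox.append,
)
rejected = send_with_approval(
    "Unfinished draft",
    approve=lambda text: False,  # the user declines
    send=outbox.append,
)
print(approved, rejected, outbox)
```

The design point is that the irreversible step (`send`) is only reachable through the approval check, so the agent can prepare work unattended without acting unattended.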
## Current Capabilities and Limitations: Coding Is a Strength, But 100% Accuracy Is Not Achievable
Today, the strongest use case for agents is in coding, because many formal processes can be translated into code. Coding tools (including Google's internal tools and external public tools) have significantly accelerated software development, shifting human effort toward creativity and design. But even in coding, agents still cannot guarantee 100% correctness. Every action has a failure rate; the more complex the action, the higher the expected failure rate. There is a long-standing risk of "automation bias": after an agent correctly executes several steps in a row, users may let their guard down and stop verifying, leading to major errors. Therefore, keeping humans alert and in the loop is not only a design intention but also a safety baseline.
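A rough calculation shows why per-action failure rates matter over long chains. The sketch below assumes every step succeeds independently with the same probability, which is a simplification not stated in the talk; real steps vary in difficulty and their failures can be correlated.

```python
def chain_success_probability(step_success: float, steps: int) -> float:
    """Probability that all steps succeed, assuming independent and identical steps."""
    if not 0.0 <= step_success <= 1.0:
        raise ValueError("step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return step_success ** steps


# Illustrative numbers only: a 99% per-step success rate over longer and longer chains.
for steps in (1, 10, 50):
    print(steps, round(chain_success_probability(0.99, steps), 3))
```

Under these assumptions, a 99% per-step success rate falls to roughly 0.61 for the whole chain after 50 steps, which is why a run of correct steps is not evidence that verification can stop.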
## Long-term Vision: New Economy, Scientific Breakthroughs, and Shifting Human Roles
When millions of AI agents not only work for humans but also trade, negotiate, and delegate tasks among themselves, this could give rise to a "new economy" and become one path toward AGI. But more importantly, it could drive scientific progress: agents in autonomous labs can schedule and run physical experiments, dramatically accelerating discovery. However, the closed loop in science is harder than in software—software can be automatically verified by writing tests, but science requires feedback from physical experiments, and there are risks of hardware damage and safety hazards (e.g., battery designs overheating). Hence reliable safety protocols are needed.
Current-generation systems excel at "compositional closure": trained on human data, they can replicate, combine existing skills, and bridge small gaps. But they have not yet made transformative scientific discoveries that humans wouldn't think of. Until AGI arrives, there is still huge room for human contribution.
## Why Did Agents Take So Long to Materialize?
Systems historically called agents (e.g., trading algorithms, or systems that optimize data center operations) have been deployed for a long time, but they lacked language capability, could not interact with humans, and were narrow agents designed for single tasks. The breakthrough now is that agents are built on language models, so humans can talk to them, guide them, and learn from them. The barriers to widespread adoption come from two sides. First, the models themselves still occasionally hallucinate, and in an agent that acts on its own output, a hallucination can lead to catastrophic consequences. Second, we need better coordination, orchestration, and management capabilities for agent teams. This is similar to managing human teams, except that agents make errors humans would not.
Trust needs to be earned through long-term reputation tracking. Mechanisms should be built into the framework: do not trust an agent that repeatedly proves unreliable, and do not trust blindly even one that has proven reliable. The realistic attitude is to acknowledge that hallucinations exist and to design the workflow to account for them.
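One way such a mechanism might look is a per-agent track record that drives how much review its output gets. Everything here is an assumption for illustration: the class, the Laplace-smoothed score, and the thresholds are not from the talk.

```python
from dataclasses import dataclass


@dataclass
class Reputation:
    successes: int = 0
    failures: int = 0

    def record(self, ok: bool) -> None:
        """Log the verified outcome of one task."""
        if ok:
            self.successes += 1
        else:
            self.failures += 1

    @property
    def score(self) -> float:
        """Laplace-smoothed success rate; a new agent starts at 0.5."""
        return (self.successes + 1) / (self.successes + self.failures + 2)


def review_policy(rep: Reputation, block_below: float = 0.5, spot_check_at: float = 0.9) -> str:
    """Map reputation to a review level; no level skips review entirely."""
    if rep.score < block_below:
        return "block"
    if rep.score < spot_check_at:
        return "full_review"
    return "spot_check"


new_agent = Reputation()

unreliable = Reputation()
for _ in range(3):
    unreliable.record(False)

reliable = Reputation()
for _ in range(20):
    reliable.record(True)

print(review_policy(new_agent), review_policy(unreliable), review_policy(reliable))
```

Note that even the highest tier is `spot_check` rather than unconditional trust, mirroring the point that a reliable record should reduce oversight but not remove it.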
## Future-Oriented Management Skills and Safety Frameworks
As agent capabilities increase, we need to cultivate personal management skills to handle these workflows: manage agent teams like managing human teams, but understand their unique error patterns. At the same time, safety guardrails must be established for high-risk scenarios like science, e.g., action approval, real-time verification, and reputation-based trust models. Ultimately, AI is entering domains like mathematics and science where it has never been before; the transformation is rapid, and the window for human adjustment is shorter than during the Industrial Revolution. Therefore, every step must be taken with caution.
Source: https://www.youtube.com/watch?v=V04bm-3d6EQ