TL;DR: Yann LeCun believes LLMs are not the path to human-level intelligence. He left Meta to found the AI company AMI, focusing on world models based on the Joint Embedding Predictive Architecture (JEPA) to understand the physical world and achieve planning capabilities.
## The Story Behind Leaving Meta
Yann LeCun led Meta's fundamental AI research lab, FAIR, for many years, but the company environment gradually became unsuitable for his world-model project. In early 2023, Meta entered the LLM space with Llama, developed at FAIR, and later formed the GenAI organization to productize it. Disappointed with progress, Mark Zuckerberg reorganized and poured almost all resources into catching up in the LLM race. While Zuckerberg himself and CTO Andrew Bosworth, among other executives, were interested in JEPA and the world-model projects, others at the company were focused entirely on LLMs and explicitly told LeCun that Meta was no longer the right place to pursue them. By the end of 2024, as his team made key progress, LeCun concluded that a transition from research to practical development was necessary, and that most of the likely applications (e.g., industrial manufacturing) were outside Meta's focus. So he left Meta to found AMI Labs.
## Skepticism About the LLM Paradigm
LeCun emphasizes: "There's nothing wrong with LLMs. They're great for what they do, but they are not the path to human-level or even animal-level intelligence." In his view, current LLMs excel at processing human language (natural language, code, math, etc.), but the real world is far more complex: high-dimensional, continuous, noisy, and messy. Language models cannot predict the consequences of their own actions and have no real planning capability, because their "reasoning" is just token-by-token prediction, not search and optimization.
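The contrast LeCun draws can be sketched with a toy decoding loop (a hypothetical illustration, not any real LLM): each token is chosen and committed before any later token is considered, so there is no search over future consequences and no backtracking.

```python
import random

VOCAB = 8  # toy vocabulary size (illustrative assumption)

def next_token_scores(prefix):
    """Stand-in for an LLM forward pass: deterministic toy scores."""
    rng = random.Random(sum(prefix))
    return [rng.random() for _ in range(VOCAB)]

def greedy_decode(prompt, steps):
    seq = list(prompt)
    for _ in range(steps):
        scores = next_token_scores(seq)
        # Commit to the single best next token immediately: no lookahead
        # over where this choice leads, no revision of earlier tokens.
        seq.append(max(range(VOCAB), key=scores.__getitem__))
    return seq

out = greedy_decode([1, 2, 3], steps=5)
```

The point of the sketch is structural: whatever the scoring function is, the loop only ever extends the sequence one step at a time.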
## Two Key Features of a World Model
LeCun proposes that an intelligent system must have two core abilities:
1. **Predicting the consequences of its own actions**: This is fundamental for any intelligent system capable of acting. For example, pushing a water bottle: if you push the bottom, it slides; if you push the top, it might tip over. The prediction doesn't need to be pixel-perfect but should happen at an abstract representation level.
2. **Planning via search**: Instead of autoregressively predicting actions one by one, the system finds a series of actions that achieve a goal through search and optimization.
This is closely related to the non-generative architecture he advocates — the Joint Embedding Predictive Architecture (JEPA).
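The two abilities above can be sketched together in a toy example (the `world_model` dynamics and the random-shooting search here are illustrative stand-ins, not LeCun's or AMI's actual method): a model predicts the consequence of each action, and planning is a search over whole action sequences for one that reaches a goal.

```python
import random

def world_model(state, action):
    """Toy dynamics: the state drifts and the action nudges it.
    A stand-in for a learned predictor of action consequences."""
    return 0.9 * state + action

def plan(state, goal, horizon=5, n_candidates=256, seed=42):
    """Random-shooting search: sample candidate action sequences,
    roll the model forward, and keep the sequence whose final state
    lands closest to the goal."""
    rng = random.Random(seed)
    best_cost, best_seq = float("inf"), None
    for _ in range(n_candidates):
        actions = [rng.gauss(0.0, 1.0) for _ in range(horizon)]
        s = state
        for a in actions:           # predict consequences step by step
            s = world_model(s, a)
        cost = (s - goal) ** 2      # how far from the goal we end up
        if cost < best_cost:
            best_cost, best_seq = cost, actions
    return best_seq, best_cost

seq, cost = plan(state=0.0, goal=3.0)
```

Note the contrast with autoregressive action prediction: the whole sequence is evaluated by its outcome before any action is committed.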
## The Origin of JEPA: From Autoencoders to Non-Generative Architectures
LeCun has long been interested in building world models through learning to predict. About five years ago, he observed that the architectures that successfully learn image and video representations are all non-generative, while generative architectures (such as VAEs and sparse autoencoders) largely fail; denoising autoencoders (such as MAE) were also disappointing. Around the same time, teams at FAIR in Paris and New York found that joint-embedding architectures worked better: take an image, corrupt it in some way, then run two encoders and predict the representation of the original image from the representation of the corrupted one. That is JEPA, the Joint Embedding Predictive Architecture. Building on this, projects such as DINO V1/V2/V3, VJEPA, and MIM-Encoder performed excellently in image and video representation learning.
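The corrupt-encode-predict recipe described above can be sketched as follows. This is a toy, gradient-free illustration: `encode`, `corrupt`, and `predictor` are made-up stand-ins for what are, in real JEPA systems, trained deep networks. The point it shows is that the prediction loss lives in representation space, never in pixel space.

```python
import random

rng = random.Random(0)

def encode(x):
    """Toy encoder: maps an input to a 2-d representation
    (here just mean and range, standing in for a deep network)."""
    return (sum(x) / len(x), max(x) - min(x))

def corrupt(x, drop=0.5):
    """Toy corruption: randomly mask out values,
    standing in for masking patches of an image."""
    kept = [v for v in x if rng.random() > drop]
    return kept or x[:1]  # never return an empty input

def predictor(z):
    """Toy predictor from the corrupted view's representation to
    the clean one (identity here; learned in a real system)."""
    return z

def jepa_loss(x):
    z_clean = encode(x)                     # target representation
    z_pred = predictor(encode(corrupt(x)))  # prediction from corrupted view
    # Compare in representation space -- no pixel-level reconstruction.
    return sum((a - b) ** 2 for a, b in zip(z_pred, z_clean))

x = [0.1, 0.4, 0.2, 0.9, 0.5, 0.3]
loss = jepa_loss(x)
```

In a real system this loss would be minimized by gradient descent over both encoders and the predictor, with care taken to avoid representation collapse.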
## AMI: AI for the Real World
AMI stands for "Advanced Machine Intelligence", with the subtitle "AI for the Real World". The company's core goal is to build world models that understand the physical world and have planning capabilities. LeCun argues that Vision-Language-Action (VLA) models are not a viable path: they are not reliable enough and require too much data. World models, by contrast, are systems that can predict the consequences of actions and support planning. AMI will scale the JEPA architecture to learn from real-world video, aiming to drive the transition from research to products, with industrial manufacturing among the primary application areas.
## An Architecture Inspired by Cognitive Science
LeCun acknowledges that his vision of a world model is inspired by cognitive science, especially what psychologists call "System 2": deliberate planning that predicts the consequences of actions, as opposed to the instinctive "System 1". More importantly, there is extensive empirical evidence: generative architectures (which try to predict at the pixel level) fail to learn good abstract representations, while the non-generative JEPA predicts at the abstract-representation level, which is more consistent with how the human brain actually works.
Source: https://www.youtube.com/watch?v=ngBraLDqzdI