A summary of Oriol Vinyals' discussion on Google's Gemini models, world models, multimodal AI, agents, and challenges like continual learning and true innovation.
This paper proposes a method to certify the trust horizon of latent world models with known group symmetries by calibrating a raw error-propagation curve using split-conformal prediction and leveraging equivariance to transport certificates over the entire group orbit. The approach provides finite-sample guarantees and demonstrates non-vacuous certificates on symmetric 2D and 3D substrates.
This paper studies when conservation laws can be certified in learned latent world models. It proposes bounded horizons, computed from measurable model defects, that guarantee how long rollouts stay on the level sets of physical invariants.
Microsoft's NextLat paper proposes a self-supervised training method where transformers predict their next hidden state instead of just the next token, leading to more compact world models, better planning and reasoning, and up to 3.3x faster generation.
Qwen-AgentWorld introduces language world models for agentic environments, covering seven domains with long chain-of-thought reasoning. The work includes a new benchmark, AgentWorldBench, and shows that world modeling improves downstream agent performance.
This paper introduces Causal-rCM, a unified teacher-forcing and self-forcing framework for autoregressive diffusion distillation in streaming video generation and interactive world models, achieving state-of-the-art performance with fast convergence.
Microsoft's NextLat introduces a training objective that rewards belief-state representations instead of relying solely on next-token prediction, pushing models toward compact world models for better generalization.
After 4.5 years at FAIR, a researcher joins AMI Labs to work on JEPA and world models.
This paper investigates whether LLM agents can infer hidden world models through interaction, finding that they struggle to build stable internal models as complexity increases.
This paper introduces Reward as an Agent and DynDiff-GRPO to address reward hacking and limited exploration in reinforcement learning for embodied world models, achieving significant accuracy gains.
Professor Biwei Huang proposes a four-generation theory of AI paradigms, arguing that LLMs are only the first step and that the future lies in causal world models. Aether AI, which is dedicated to building causal world models, has completed a $20 million funding round.
This paper argues that current world models lack a persistent state core. It proposes a hybrid approach that adds temporal-causal structure via η-pseudo-unitary operator dynamics to convert a pretrained GPT-2 into a time-reasoning model.
Lin Junyang, former head of Alibaba's Qianwen team, closed his AI lab's first financing round at a $2B post-money valuation, with Gao Rong and Sequoia China each investing $100M and Tencent adding $20M. The lab will focus on world models and embodied intelligence rather than general LLMs.
OdysseyML announces a $310M Series B funding round to advance world models, with backing from Natural Capital, Amazon, GV, AMD, and IQT.
Microsoft Research introduces Next-Latent Prediction (NextLat), a self-supervised method that trains transformers to predict their own next latent state, enabling compact world models for reasoning and planning and achieving up to 3.3x faster inference via self-speculative decoding.
This piece discusses the challenges facing embodied AI and robotics, including a 100,000-year data gap and a lack of shared benchmarks, and highlights startup opportunities in data loops, evaluation systems, and deployment.
A research paper proposes agentic automata learning to evaluate whether LLM agents can infer hidden world models through interaction, finding that performance drops sharply as task complexity increases and that reasoning models outperform non-reasoning ones but still struggle.
This paper surveys evaluation methods for world models and argues for a decision-making-centric framework that prioritizes counterfactual reasoning, planning, and policy optimization over visual quality. It introduces an L0–L7 evaluation ladder and a benchmark protocol to align evaluation with claimed utility.
The author launches a weekly Video Model Journal Club covering topics such as video generation, world models, physical reasoning, diffusion, and flow matching. The first in-person talk will be given by Yilun Du, on "Embodied Reasoning with World Models."
Kairos is a native world model framework for Physical AI that learns from diverse experiences using a cross-embodiment data curriculum, maintains persistent states with hybrid temporal attention, and supports efficient deployment on server and consumer hardware.