ABot-M0.5 is a new World Action Model for mobile manipulation that improves performance through temporal granularity alignment, action space disentanglement, and train-test consistency, achieving state-of-the-art results on long-horizon and fine-grained manipulation benchmarks.
Vera is a layered diffusion model for video editing that preserves source content by generating edit layers and alpha mattes, using a Mixture-of-Transformers architecture.
NVIDIA Cosmos 3 is an open omni-model for physical AI that unifies world generation, reasoning, and action generation in a single model; the model and accompanying resources are available on Hugging Face.
Cosmos 3 is a family of omnimodal world models from NVIDIA that jointly process language, image, video, audio, and action sequences using a unified Mixture-of-Transformers architecture, achieving state-of-the-art performance on understanding and generation tasks for physical AI.
EVA01 is a unified framework that integrates 3D mesh as a native modality into multimodal language models via a Mixture-of-Transformers architecture, enabling state-of-the-art text-to-3D generation and long-context multi-turn geometric editing.
Tencent releases HY-Embodied-0.5, a suite of foundation models for embodied AI agents built on a Mixture-of-Transformers (MoT) architecture, with an efficient 2B variant and a more powerful 32B variant for real-world robot control and spatial-temporal reasoning.