Researchers introduce T-Rex, a framework that unifies vision, language, and tactile sensing so robots can respond to physical contact in real time rather than relying on vision alone
Summary
Researchers introduced T-Rex, a framework that integrates vision, language, and tactile sensing, enabling robots to respond to physical contact in real time rather than relying solely on vision.
Similar Articles
'Touch dreaming' helps humanoid robots handle five tricky tasks with 90.9% higher success
Researchers from CMU and the Bosch Center for AI introduced the Humanoid Transformer with Touch Dreaming (HTD) model, which uses tactile signal prediction to improve humanoid robot manipulation, achieving a 90.9% higher average success rate than the ACT baseline across five real-world tasks.
@rohanpaul_ai: Language had a strange advantage robotics does not: Text is already a compressed, shared interface for human thought, w…
Discusses the challenges facing embodied AI and robotics, including a 100,000-year data gap and lack of shared benchmarks, and highlights startup opportunities in data loops, eval systems, and deployment.
DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation
DeVI introduces a framework that turns text-conditioned synthetic videos into physically plausible dexterous robot control via a hybrid 3D-2D tracking reward, enabling zero-shot generalization to unseen objects.
RLDX-1 Technical Report
RLDX-1 is a general-purpose robotic policy for dexterous manipulation that uses a Multi-Stream Action Transformer architecture to integrate heterogeneous modalities, outperforming existing VLA models in real-world tasks.
Robots Need More than VLA and World Models
This position paper argues that advancing robot intelligence requires integrating unstructured behavioral data through specialized interfaces for labeling, embodiment mapping, world modeling, and reward inference, rather than relying solely on scaling Vision-Language-Action (VLA) models and world models.