Tag
ACE-EGO-0 is a unified Vision-Language-Action pretraining framework that leverages egocentric human videos and robot trajectories via a reliability-aware training objective, achieving state-of-the-art results on embodied AI benchmarks.
This paper proposes CL-DMDF, a dynamic multimodal data fusion model that uses contrastive learning and a dual-dimensional attention mechanism to handle missing modalities and improve discriminative learning.