PlatonicNav is a training-free framework for embodied navigation. It grounds language goals using vision-only semantic maps and blind matching, and it generalizes across tasks and embodiments without explicit cross-modal training.
AtlasVA is a teacher-free visual skill memory framework for vision-language model agents. It combines spatial heatmaps, visual exemplars, and symbolic text skills to improve spatial decision-making in long-horizon tasks, and it outperforms baselines on several benchmarks.