Human Archive, a Silicon Valley startup, has raised $8.2 million to collect first-person video data from Indian gig workers to train robots for physical tasks. Despite rejections from major home services platforms, the company is partnering with other firms in the sector.
This paper investigates whether vision-language models can assess nursing competency from egocentric video recorded during simulation. It finds that recognition accuracy is inversely related to competency level, which suggests a pedagogically informative signal.
Ego2World converts egocentric cooking videos (HD-EPIC) into executable symbolic worlds with graph-transition rules, enabling evaluation of belief-state planning under partial observation. Experiments show that belief memory improves task completion, suggesting that belief memory should be a first-class target in embodied agent evaluation.
PhysBrain 1.0 is a technical report presenting a method that uses human egocentric video to generate physical commonsense supervision for vision-language-action models, achieving state-of-the-art results on embodied control benchmarks including ERQA, PhysBench, SimplerEnv-WidowX, LIBERO, and RoboCasa.