OmniVideo-100K introduces an automated data engine with entity-anchored scripting and clue-guided QA generation to improve audio-visual reasoning and temporal consistency, achieving significant performance gains across multiple benchmarks.
A2RBench is an automated pipeline for generating formally verifiable abstract reasoning benchmarks for LLMs. It uses cycle consistency to ensure that each task has a unique solution, and its results show that current LLMs perform significantly worse than humans on 3D reasoning tasks.
Kevin Lin, a postdoctoral fellow at Oxford University, open-sourced Violin, a video translation tool that integrates speech recognition, LLM translation, and speech synthesis into an automated pipeline. It supports multilingual translation and personalized styles, and provides three usage modes: Web, CLI, and Agent.