Tag
ThinkBooster is a unified framework for test-time compute scaling of LLM reasoning. It provides a modular Python library, a performance-efficiency benchmark, an OpenAI-compatible proxy service, and a visual debugger. Empirical results on math and coding tasks show practical gains, subject to quality-cost trade-offs.
CIPER is a unified transformer framework that jointly performs city-scale retrieval and precise 3-DoF pose estimation from cross-view images, overcoming limitations of cascade pipelines.
OmniRetrieval is a framework that unifies retrieval across heterogeneous knowledge sources (text, tables, graphs) by dispatching native queries to appropriate execution engines, outperforming single-source baselines on a benchmark of 13 datasets and 309 knowledge bases.
FashionLens is a unified fashion image retrieval framework that uses multimodal large language models with adaptive calibration and sampling, achieving state-of-the-art performance across diverse retrieval scenarios.
Aurora is an agentic video editing framework that pairs a tool-augmented vision-language model agent with a diffusion transformer to automatically resolve textual and visual underspecification in user requests, enabling unified video editing tasks like replacement, removal, style transfer, and reference-driven insertion.
Skill1 is a unified framework that trains a single policy to co-evolve skill selection, utilization, and distillation using a shared task-outcome objective. Experiments on ALFWorld and WebShop show it outperforms existing baselines in complex task environments.
UniVidX is a unified multimodal framework for video generation built on diffusion priors. The article summarizes the paper and discusses its cross-modal coherence mechanisms.
UniMesh introduces a single model that jointly handles 3D mesh generation and understanding via a Mesh Head, Chain-of-Mesh iterative editing, and a self-reflection error-correction mechanism.