captioning

Video2LoRA: Parametric Video Internalization for Vision-Language Models

Hugging Face Daily Papers · 2d ago

This paper introduces Video2LoRA, a method that predicts Low-Rank Adaptation (LoRA) weights directly from video representations, enabling efficient video processing in frozen vision-language models. It reduces the visual token load by up to 1500x and the query time to first token (TTFT) by 6-80x while maintaining performance on video summarization and captioning benchmarks.
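
To make the idea of predicting LoRA weights from a video representation concrete, here is a minimal toy sketch in plain Python. It is not the paper's architecture: the `LoRAHypernet` class, the single linear projection per factor, the dimensions, and the random initialization are all assumptions chosen for illustration. It only shows the general pattern of generating low-rank factors `A` and `B` once per video and then adding the update `B(Ax)` to the output of a frozen weight matrix at query time.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Random Gaussian matrix, standing in for trained weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def reshape(flat, rows, cols):
    """Reshape a flat list into a rows x cols matrix."""
    return [flat[i * cols:(i + 1) * cols] for i in range(rows)]


class LoRAHypernet:
    """Hypothetical generator: pooled video feature -> LoRA factors (A, B)."""

    def __init__(self, d_video, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        # One linear projection per factor (an assumption for illustration).
        self.to_a = rand_matrix(rank * d_in, d_video, rng)
        self.to_b = rand_matrix(d_out * rank, d_video, rng)

    def __call__(self, video_feature):
        a = reshape(matvec(self.to_a, video_feature), self.rank, self.d_in)
        b = reshape(matvec(self.to_b, video_feature), self.d_out, self.rank)
        return a, b


def adapted_forward(w_frozen, lora, x):
    """Compute W x + B (A x); the frozen weights W are never modified."""
    a, b = lora
    base = matvec(w_frozen, x)
    delta = matvec(b, matvec(a, x))
    return [p + q for p, q in zip(base, delta)]


rng = random.Random(1)
d_video, d_in, d_out, rank = 8, 4, 4, 2

w_frozen = rand_matrix(d_out, d_in, rng)           # stand-in for one frozen layer
hypernet = LoRAHypernet(d_video, d_in, d_out, rank)

video_feature = [rng.gauss(0.0, 1.0) for _ in range(d_video)]
lora = hypernet(video_feature)                     # generated once per video

query_token = [rng.gauss(0.0, 1.0) for _ in range(d_in)]
y = adapted_forward(w_frozen, lora, query_token)   # reused for every query
```

In this toy, the adapter is computed once from the video feature and then reused for each query vector, which is the kind of amortization that would explain a lower per-query cost. How the actual method encodes the video, which layers it adapts, and what rank it uses are not stated in the summary above.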
