Tag
This paper introduces Video2LoRA, a method that predicts Low-Rank Adaptation (LoRA) weights directly from video representations, enabling efficient video processing in frozen vision-language models. It reduces the visual token load by up to 1500x and query time to first token (TTFT) by 6-80x while maintaining performance on video summarization and captioning benchmarks.
FaceFusion is an open-source face-swapping (deepfake) platform with 28.5k stars. It runs locally and supports high-precision face swapping and lip-syncing for images, videos, and batch jobs, and it includes a complete job management system.
An agent skill that extracts slides from YouTube videos and writes notes, images, transcripts, and slides into Obsidian vaults, with an HTML artifact for navigation.
Swift Sampling is a training-free algorithm that uses Taylor expansion to identify high-information moments in long-form videos by detecting deviations from predicted feature trajectories, improving accuracy on video QA tasks with minimal computational overhead.
video-utils is a video processing utility hosted on Replicate with over 18 million runs. It is available through both the playground and the API.
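To make the Swift Sampling summary above more concrete, here is a minimal sketch of the general idea, not the paper's implementation. It assumes per-frame feature vectors are already available, approximates the Taylor expansion with second-order finite differences over the three preceding frames, and scores each frame by its Euclidean distance from that prediction. The function names (`taylor_predict`, `deviation_scores`, `select_frames`) and the top-k selection rule are hypothetical choices made for illustration.

```python
import math


def taylor_predict(prev2, prev1, curr):
    """Extrapolate the next feature vector with a second-order Taylor step.

    Assumption: derivatives are approximated by finite differences,
    f' ~ curr - prev1 and f'' ~ curr - 2*prev1 + prev2.
    """
    return [
        c + (c - p1) + 0.5 * (c - 2 * p1 + p2)
        for p2, p1, c in zip(prev2, prev1, curr)
    ]


def deviation_scores(features):
    """Score each frame by its distance from the predicted trajectory.

    The first three frames have no full history and keep a score of 0.0.
    """
    scores = [0.0] * len(features)
    for t in range(3, len(features)):
        predicted = taylor_predict(features[t - 3], features[t - 2], features[t - 1])
        scores[t] = math.dist(features[t], predicted)
    return scores


def select_frames(features, k):
    """Return the indices of the k highest-deviation frames, in temporal order."""
    scores = deviation_scores(features)
    ranked = sorted(range(len(features)), key=lambda t: scores[t], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy 2-D features that move linearly, with one abrupt jump at frame 6.
    toy = [[float(t), 2.0 * t] for t in range(10)]
    toy[6] = [11.0, 12.0]
    print(select_frames(toy, k=3))
```

Note that in this sketch a single abrupt change also raises the scores of the next few frames, because the jump contaminates their prediction history. A practical sampler would likely need some form of suppression or smoothing around detected events.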