LiconStudio/Ltx2.3-VBVR-lora-I2V
Summary
LiconStudio releases a LoRA adapter for LTX-2.3 fine-tuned on the VBVR dataset to enhance video generation with improved prompt understanding, motion dynamics, and temporal consistency for complex video reasoning tasks.
Cached at: 04/20/26, 02:45 PM
LiconStudio/Ltx2.3-VBVR-lora-I2V · Hugging Face
Source: https://huggingface.co/LiconStudio/Ltx2.3-VBVR-lora-I2V
LTX-2 VBVR LoRA - Video Reasoning
LoRA fine-tuned weights for LTX-2.3 22B on the VBVR (A Very Big Video Reasoning Suite) dataset.
Training Data
To ensure training quality, we preprocessed all 1,000,000 videos in the official dataset and sample from them randomly during training to maintain data diversity. We adopt the official parameters, batch_size=16 and rank=32; the rank is kept at 32 because an excessively large rank can cause catastrophic forgetting.
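The card states only that videos are sampled randomly from the preprocessed pool. The sketch below shows one way to do that, assuming uniform sampling without replacement within each batch of 16. The helper name `sample_batch` and the fixed seed are hypothetical and not taken from the authors' code.

```python
import random

NUM_VIDEOS = 1_000_000  # size of the preprocessed VBVR pool (from the card)
BATCH_SIZE = 16         # official batch_size (from the card)


def sample_batch(rng: random.Random, num_videos: int = NUM_VIDEOS,
                 batch_size: int = BATCH_SIZE) -> list[int]:
    """Draw one batch of distinct video indices uniformly at random.

    Hypothetical helper: the card does not describe its sampler, so uniform
    sampling without replacement inside a batch is an assumption.
    """
    # random.sample accepts a range directly, so the 1M indices are never
    # materialized as a list; the returned indices are all distinct.
    return rng.sample(range(num_videos), batch_size)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the example is reproducible
    for step in range(3):
        batch = sample_batch(rng)
        print(step, batch[:4], "...")
```

Drawing fresh indices every step keeps each batch a random mix of the 200 task categories without holding the videos themselves in memory.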
The VBVR dataset contains 200 reasoning task categories, with ~5,000 variants per task, totaling ~1M videos. Main task types include:
- Object Trajectory: Objects moving to target positions
- Physical Reasoning: Rolling balls, collisions, gravity
- Causal Relationships: Conditional triggers, chain reactions
- Spatial Relationships: Relative positions, path planning
Model Details

| Item | Details |
|---|---|
| Base Model | ltx-2.3-22b-dev |
| Training Method | LoRA Fine-tuning |
| LoRA Rank | 32 |
| Effective Batch Size | 16 |
| Mixed Precision | BF16 |
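LoRA keeps each adapted base weight W (shape d_out × d_in) frozen and learns a low-rank update BA, with B of shape d_out × r and A of shape r × d_in, so the rank r directly sets how many parameters are trained. The plain-Python sketch below counts those parameters and applies the update as y = Wx + (α/r)BAx. It is a generic illustration of LoRA, not this adapter's training code: the scaling factor α is not reported on this card, and the 4096 × 4096 layer in the usage example is a hypothetical shape rather than an actual LTX-2.3 layer dimension.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    B has shape (d_out, rank) and A has shape (rank, d_in).
    """
    return rank * (d_in + d_out)


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Plain-Python matrix-vector product (m is a list of rows)."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def lora_forward(w: list[list[float]], a: list[list[float]],
                 b: list[list[float]], x: list[float],
                 alpha: float, rank: int) -> list[float]:
    """y = W x + (alpha / rank) * B (A x); W stays frozen, only A and B train."""
    base = matvec(w, x)               # frozen base-model path
    update = matvec(b, matvec(a, x))  # low-rank path: project down to rank, then up
    scale = alpha / rank              # alpha is an assumed hyperparameter
    return [yb + scale * yu for yb, yu in zip(base, update)]


if __name__ == "__main__":
    # Hypothetical 4096 x 4096 projection; not LTX-2.3's actual layer shape.
    d = 4096
    lora = lora_trainable_params(d, d, rank=32)
    full = d * d
    print(lora, full, f"{lora / full:.2%}")
```

For that hypothetical layer, rank 32 adds 32 × (4096 + 4096) = 262,144 trainable parameters against 16,777,216 frozen ones, roughly 1.56%. In the original LoRA formulation B is initialized to zero, so the adapted model starts out producing exactly the base model's outputs.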
TODO List
Dataset Release Plan

| Dataset | Videos | Status |
|---|---|---|
| VBVR-96K | 96,000 | ✅ Released |
| VBVR-240K | 240,000 | 🔄 Processing |
| VBVR-480K | 480,000 | 📋 Planned |
LoRA Capabilities
This LoRA adapter enhances the base LTX-2.3 model for production video-generation workflows in the following ways:
- Enhanced Complex Prompt Understanding: Accurately interprets multi-object, multi-condition prompts with detailed spatial descriptions and temporal sequences, reducing prompt misinterpretation in production scenarios.
- Improved Motion Dynamics: Generates smooth, physically plausible object movements with natural acceleration, deceleration, and trajectory curves, avoiding robotic or unnatural motion patterns.
- Temporal Consistency: Maintains object appearance, lighting, and scene coherence throughout the video sequence, reducing flickering and frame-to-frame artifacts common in generated videos.
- Precise Timing Control: Enables accurate control over action duration, pacing, and synchronization between multiple moving elements based on prompt semantics.
- Multi-Object Interaction: Handles complex scenes with multiple objects interacting simultaneously, including collisions, following, avoiding, and coordinated movements.
- Camera and Framing Stability: Maintains consistent camera perspective and framing throughout the sequence, avoiding unwanted camera shake or unexpected viewpoint changes.
Training Configuration

| Config | Value |
|---|---|
| Learning Rate | 1e-4 |
| Scheduler | Cosine |
| Gradient Accumulation | 16 steps |
| Gradient Clipping | 1.0 |
| Optimizer | AdamW |
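The card lists the optimizer settings but not the training loop. The plain-Python sketch below shows how the listed cosine schedule, 16-step gradient accumulation, and global-norm clipping at 1.0 typically fit together. Assumptions: no warmup, a minimum learning rate of 0, a schedule spanning the ~6,000 reported steps, and averaging (rather than summing) the micro-batch gradients; the AdamW update itself is left out. If the effective batch of 16 comes entirely from the 16 accumulation steps, each micro-batch would hold a single video on a single device, but the card does not state the micro-batch size or GPU count.

```python
import math

BASE_LR = 1e-4        # Learning Rate from the table
ACCUM_STEPS = 16      # Gradient Accumulation from the table
MAX_GRAD_NORM = 1.0   # Gradient Clipping from the table
TOTAL_STEPS = 6000    # ~6,000 optimizer steps, from Evaluation Metrics


def cosine_lr(step: int, total_steps: int = TOTAL_STEPS,
              base_lr: float = BASE_LR, min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr to min_lr; no warmup (assumed)."""
    progress = min(step, total_steps) / total_steps  # clamp past the end
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def clip_by_global_norm(grads: list[float],
                        max_norm: float = MAX_GRAD_NORM) -> list[float]:
    """Rescale gradients so their global L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


def accumulate(micro_grads: list[list[float]]) -> list[float]:
    """Average per-parameter gradients over the accumulated micro-batches."""
    n = len(micro_grads)
    return [sum(gs) / n for gs in zip(*micro_grads)]


if __name__ == "__main__":
    # Hypothetical gradients for a 2-parameter model, one list per micro-batch.
    micro = [[0.5, -1.5] for _ in range(ACCUM_STEPS)]
    g = clip_by_global_norm(accumulate(micro))  # clip after accumulating
    lr = cosine_lr(step=3000)
    # g and lr would then be passed to the AdamW update (not shown).
    print(lr, g)
```

Clipping is applied once to the accumulated gradient, just before the optimizer step, so the 1.0 threshold bounds the norm of each full effective-batch update rather than of individual micro-batches.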
Evaluation Metrics

| Metric | Value |
|---|---|
| Training Steps | ~6,000 |
| Final Loss | ~0.008 |
| Loss Reduction | 44% (from 0.014 to 0.008) |
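A quick check of the reported numbers, writing L_0 for the starting loss (0.014), L_final for the final loss (0.008), and B_eff for the effective batch size of 16, and using the rounded values from the table:

```latex
\begin{aligned}
\text{relative loss reduction} &= \frac{L_0 - L_{\mathrm{final}}}{L_0}
  = \frac{0.014 - 0.008}{0.014} = \frac{0.006}{0.014} \approx 0.43 \\
\text{samples seen} &\approx N_{\mathrm{steps}} \times B_{\mathrm{eff}}
  = 6{,}000 \times 16 = 96{,}000
\end{aligned}
```

With these rounded endpoints the reduction comes to about 43%, so the reported 44% was presumably computed from unrounded losses. Assuming each optimizer step consumes one effective batch, the roughly 96,000 training samples amount to about 9.6% of the ~1M preprocessed videos.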
Video Demo
Training Progress Comparison
Step 0 (Base Model)
Initial model output.
Step 6000 (Fine-tuned)
After 6K steps of training.
Dataset
This model is trained on the VBVR (Video Benchmark for Video Reasoning) dataset from video-reason.com.
Contact
For questions or suggestions, please open an issue on Hugging Face or contact the author directly.
Similar Articles
fal/LTX-2.3-3DREAL-LoRA
A LoRA adapter for LTX-2.3 that converts rough 3D viewport animations (from Blender, game engines) into photorealistic video while preserving composition and camera movement.
Lightricks/LTX-2.3-22b-IC-LoRA-LipDub
This Hugging Face model page introduces an IC-LoRA trained on top of LTX-2.3-22b for lip dubbing, with a project page, paper, and inference pipeline available.
Lightricks/LTX-2.3
Lightricks released LTX-2.3, an open-weight diffusion-based audio-video foundation model with improved quality and prompt adherence, available in multiple checkpoints including distilled and LoRA variants for local execution.
Video2LoRA: Parametric Video Internalization for Vision-Language Models
This paper introduces Video2LoRA, a method that predicts Low-Rank Adaptation (LoRA) weights directly from video representations, enabling efficient video processing in frozen vision-language models. It reduces visual token load by up to 1500x and query TTFT by 6-80x while maintaining performance on video summarization and captioning benchmarks.
LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence
LLaVA-OneVision-2 introduces codec-stream tokenization and windowed attention for efficient video understanding, achieving state-of-the-art performance across multiple multimodal benchmarks including video, spatial, and tracking tasks.