Lightricks/LTX-2.3
Summary
Lightricks released LTX-2.3, an open-weight diffusion-based audio-video foundation model with improved quality and prompt adherence, available in multiple checkpoints including distilled and LoRA variants for local execution.
Cached at: 05/08/26, 06:28 PM
Source: https://huggingface.co/Lightricks/LTX-2.3
LTX-2.3 Model Card
This model card focuses on the LTX-2.3 model, which is a significant update to the LTX-2 model with improved audio and visual quality as well as enhanced prompt adherence. LTX-2 was presented in the paper "LTX-2: Efficient Joint Audio-Visual Foundation Model".
💻💻 If you want to dive right into the code, it is available here. 💾💾
LTX-2.3 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model. It brings together the core building blocks of modern video generation, with open weights and a focus on practical, local execution.
Model Checkpoints

| Name | Notes |
| --- | --- |
| ltx-2.3-22b-dev | The full model, flexible and trainable in bf16 |
| ltx-2.3-22b-distilled | The distilled version of the full model, 8 steps, CFG=1 |
| ltx-2.3-22b-distilled-1.1 | The distilled v1.1 version of the full model, 8 steps, CFG=1. A different aesthetic experience and improved audio compared to v1.0 |
| ltx-2.3-22b-distilled-lora-384 | A LoRA version of the distilled model, applicable to the full model |
| ltx-2.3-22b-distilled-lora-384-1.1 | A LoRA version of the v1.1 distilled model, applicable to the full model |
| ltx-2.3-spatial-upscaler-x2-1.1 | An x2 spatial upscaler for the ltx-2.3 latents, used in multi-stage (multiscale) pipelines for higher resolution |
| ltx-2.3-spatial-upscaler-x1.5-1.0 | An x1.5 spatial upscaler for the ltx-2.3 latents, used in multi-stage (multiscale) pipelines for higher resolution |
| ltx-2.3-temporal-upscaler-x2-1.0 | An x2 temporal upscaler for the ltx-2.3 latents, used in multi-stage (multiscale) pipelines for higher FPS |
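
For local use, a checkpoint from the list above can be fetched with the `huggingface_hub` library. The following is a minimal sketch, not the official download procedure: the helper names `find_checkpoint_files` and `download_checkpoint` are hypothetical, and the `.safetensors` suffix and the idea that each file is named after its checkpoint are assumptions that should be checked against the actual file listing of the repository.

```python
from pathlib import PurePosixPath


def find_checkpoint_files(repo_files, checkpoint_name, suffix=".safetensors"):
    """Return repo paths whose file name is exactly checkpoint_name + suffix.

    Assumption: weights are stored as "<checkpoint name>.safetensors".
    An exact match keeps "ltx-2.3-22b-distilled" from also matching
    "ltx-2.3-22b-distilled-1.1".
    """
    wanted = checkpoint_name + suffix
    return [p for p in repo_files if PurePosixPath(p).name == wanted]


def download_checkpoint(checkpoint_name, repo_id="Lightricks/LTX-2.3"):
    """Look up a checkpoint in the repo listing and download it (hypothetical helper)."""
    # Imported lazily so the pure helper above works without the library installed.
    from huggingface_hub import hf_hub_download, list_repo_files

    matches = find_checkpoint_files(list_repo_files(repo_id), checkpoint_name)
    if not matches:
        raise FileNotFoundError(f"No file for {checkpoint_name!r} found in {repo_id}")
    # hf_hub_download returns the local path of the cached file.
    return hf_hub_download(repo_id=repo_id, filename=matches[0])
```

Calling `download_checkpoint("ltx-2.3-22b-distilled")` would then return a local path if the naming assumption holds; if it does not, the raised error signals that the repository listing should be inspected directly.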
Model Details
- **Developed by:** Lightricks
- **Model type:** Diffusion-based audio-video foundation model
- **Language(s):** English
Online demo
LTX-2.3 is accessible right away via the API Playground.
Run locally
Direct use license
You can use the models (full, distilled, upscalers, and any derivatives of the models) for purposes permitted under the license.
ComfyUI
We recommend you use the built-in LTXVideo nodes that can be found in the ComfyUI Manager. For manual installation information, please refer to our documentation site.
PyTorch codebase
The LTX-2 codebase is a monorepo with several packages, from the model definition in ‘ltx-core’ to pipelines in ‘ltx-pipelines’ and training capabilities in ‘ltx-trainer’. The codebase was tested with Python >=3.12 and CUDA version >12.7, and supports PyTorch ~= 2.7.
Installation

    git clone https://github.com/Lightricks/LTX-2.git
    cd LTX-2
    # From the repository root
    uv sync
    source .venv/bin/activate

Inference
To use our model, please follow the instructions in our ltx-pipelines package.
Diffusers 🧨
LTX-2.3 support in the Diffusers Python library is coming soon!
General tips:
- Width and height must be divisible by 32. The frame count must be a multiple of 8 plus 1 (for example 9, 17, or 121).
- If the resolution is not divisible by 32, or the frame count is not a multiple of 8 plus 1, the input should be padded with -1 to a valid size and the result then cropped to the desired resolution and number of frames.
- For tips on writing effective prompts, please visit our Prompting guide.
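
The size constraints in the tips above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical and not part of the LTX-2 codebase, and rounding up to the next valid size is an assumption about how the padding target is chosen.

```python
def round_up_to_multiple(value: int, multiple: int) -> int:
    """Smallest multiple of `multiple` that is >= value."""
    return ((value + multiple - 1) // multiple) * multiple


def is_valid_shape(width: int, height: int, num_frames: int) -> bool:
    """True if width/height are divisible by 32 and num_frames is 8*k + 1."""
    return width % 32 == 0 and height % 32 == 0 and (num_frames - 1) % 8 == 0


def padded_shape(width: int, height: int, num_frames: int) -> tuple[int, int, int]:
    """Next valid (width, height, num_frames) at or above the requested shape.

    Assumption: padding rounds up, so the original content fits inside the
    padded tensor and can be cropped back out afterwards.
    """
    padded_w = round_up_to_multiple(width, 32)
    padded_h = round_up_to_multiple(height, 32)
    # Frames follow the 8*k + 1 pattern: round (frames - 1) up to a multiple of 8.
    padded_f = round_up_to_multiple(num_frames - 1, 8) + 1
    return padded_w, padded_h, padded_f
```

For a 1280x720 request with 100 frames, `padded_shape(1280, 720, 100)` gives `(1280, 736, 105)`: the width is already valid, the height rounds up to 23 × 32, and the frame count rounds up to 8 × 13 + 1. The generated output would then be cropped back to 1280x720 and 100 frames.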
Limitations
- This model is not intended or able to provide factual information.
- As a statistical model, this checkpoint might amplify existing societal biases.
- The model may fail to generate videos that match the prompts perfectly.
- Prompt following is heavily influenced by the prompting style.
- The model may generate content that is inappropriate or offensive.
- When generating audio without speech, the audio may be of lower quality.
Train the model
The base (dev) model is fully trainable.
It’s extremely easy to reproduce the LoRAs and IC-LoRAs we publish with the model by following the instructions on the LTX-2 Trainer Readme.
Training for motion, style or likeness (sound+appearance) can take less than an hour in many settings.
Citation
@article{hacohen2025ltx2,
title={LTX-2: Efficient Joint Audio-Visual Foundation Model},
author={HaCohen, Yoav and Brazowski, Benny and Chiprut, Nisan and Bitterman, Yaki and Kvochko, Andrew and Berkowitz, Avishai and Shalem, Daniel and Lifschitz, Daphna and Moshe, Dudu and Porat, Eitan and Richardson, Eitan and Guy Shiran and Itay Chachy and Jonathan Chetboun and Michael Finkelson and Michael Kupchick and Nir Zabari and Nitzan Guetta and Noa Kotler and Ofir Bibi and Ori Gordon and Poriya Panet and Roi Benita and Shahar Armon and Victor Kulikov and Yaron Inger and Yonatan Shiftan and Zeev Melumian and Zeev Farbman},
journal={arXiv preprint arXiv:2601.03233},
year={2025}
}
