TenStrip/LTX2.3-10Eros
Summary
This article introduces TenStrip/LTX2.3-10Eros, a fine-tuned AI video model on Hugging Face designed for improved image-to-video generation and prompt adherence. It provides technical details on file formats, compatibility with ComfyUI nodes, and specific prompting strategies for optimal results.
Cached at: 05/08/26, 08:54 AM
Source: https://huggingface.co/TenStrip/LTX2.3-10Eros
10 Eros
Workflows: https://huggingface.co/TenStrip/LTX2.3-10Eros_Workflows
Nodes: https://github.com/TenStrip/10S-Comfy-nodes
Reliant on https://huggingface.co/SulphurAI/Sulphur-2-base
This is a different merge attempt aimed at I2V use. It uses layer-scaled merges of different steps rather than a straight weight merge. It behaves much better than loading a LoRA and respects the prompt. Prompts should be enhanced: LTX does very little reasoning of its own once it is conditioned, so the first frame and all following motions, evolutions, and audio must be commanded explicitly. You will get nothing you don't ask for.
The BF16 file loads as a full checkpoint, with CLIP and VAEs included.
Fp8_mixed_learned is the better FP8 version and is a full checkpoint as well, quantized by S1LV3RC01N.
The Kijai split files are for the 10Eros FP8 Transformer version, which has a different structure and variance. That one goes inside diffusion_models: https://huggingface.co/Kijai/LTX2.3_comfy/tree/main
Warning: larger distilled LoRAs will harm the model's fine-tune. Try the cond_safe ones: https://huggingface.co/TenStrip/LTX2.3_Distilled_Lora_1.1_Experiments/tree/main
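The placement notes above can be sketched as a small helper. This is only an illustration: the folder names follow the usual ComfyUI layout (models/checkpoints, models/diffusion_models, models/loras), while the keyword matching and every file name used with it are assumptions, not names taken from the repository.

```python
from pathlib import PurePosixPath

# Usual ComfyUI model subfolders.
CHECKPOINTS = "models/checkpoints"
DIFFUSION_MODELS = "models/diffusion_models"
LORAS = "models/loras"


def comfy_subfolder(filename: str) -> str:
    """Guess the ComfyUI subfolder for a file from keywords in its name.

    The keyword heuristic is an assumption made for this sketch; check the
    actual file names in the repository before relying on it.
    """
    name = filename.lower()
    if "lora" in name:
        return LORAS
    if "transformer" in name:
        # Split transformer-only weights go in diffusion_models.
        return DIFFUSION_MODELS
    # The BF16 and Fp8_mixed_learned files are described as full checkpoints.
    return CHECKPOINTS


def install_path(comfy_root: str, filename: str) -> str:
    """Build the destination path for a file under a ComfyUI install root."""
    return str(PurePosixPath(comfy_root) / comfy_subfolder(filename) / filename)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for f in ("10Eros_bf16.safetensors",
              "10Eros_fp8_transformer.safetensors",
              "cond_safe_lora.safetensors"):
        print(install_path("ComfyUI", f))
```

Full checkpoints carry their own CLIP and VAEs, so they load through a checkpoint loader, whereas a file placed in diffusion_models needs the text encoder and VAEs loaded separately.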
For prompt enhancement, try this foreword in Grok or an uncensored LLM:
Generate a video scene script with a description based on the attached image for an LLM that has a tokenizer that uses interleaved attention to support long-context understanding that is fed into a multimodal video model. Strict specification, follow up to the word: No timestamps. No unnecessary embellishment. Output only plain English text and make it a copy box.
First, describe the image's initial scene in concise natural language; subject(s), subject(s) appearance, subject(s) composition and pose, background, and context.
Next, formulate a naturally evolving scenario that would take place describing every moving body part, composition change, and manipulation from the uploaded initial frame that would be reflected in the video model's post-latent evolution output. If the image is explicit or sexual in nature, use full anatomical terminology and spice it up slightly with visually representable erotic themes.
Center the prompt around this basic idea: [ concept ]
Interweave this dialogue or sound concept into the scene with descriptions of voice tone followed by the lines delivered in quotations, in a temporal sequence between or during motions. Dialogue should be concise and non-rambling as it will take away from video quality: [ dialogue ]
Inside that prompt describe only notable audio and audio cues, both normal and explicit; background noise as well as foley and natural sounds, in a temporal sequence paired with coinciding motions. In the case of absent dialogue or soundscapes, and only if background music is fitting, describe a fitting genre and melodic tone with matching mood.
Output only text following above instruction. Follow-up suggestions should be on the topic of expanding or changing motion or dialogue from the output text.
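The foreword has two slots, [ concept ] and [ dialogue ], to fill before it is sent to the LLM. A minimal sketch of filling them programmatically follows. The function name, the "none" fallback for empty dialogue, and the shortened stand-in template are assumptions made for illustration; paste the full foreword in place of the stand-in.

```python
CONCEPT_SLOT = "[ concept ]"
DIALOGUE_SLOT = "[ dialogue ]"


def fill_foreword(template: str, concept: str, dialogue: str = "") -> str:
    """Replace the concept and dialogue slots in the foreword text.

    Raises ValueError if either slot is missing, so a mangled copy of the
    template is caught before it is sent to the LLM.
    """
    for slot in (CONCEPT_SLOT, DIALOGUE_SLOT):
        if slot not in template:
            raise ValueError(f"template is missing the {slot} slot")
    filled = template.replace(CONCEPT_SLOT, concept.strip())
    # Assumption: an empty dialogue is written out as "none".
    return filled.replace(DIALOGUE_SLOT, dialogue.strip() or "none")


if __name__ == "__main__":
    # Shortened stand-in for the full foreword.
    stand_in = (
        "Center the prompt around this basic idea: [ concept ]\n"
        "Interweave this dialogue or sound concept into the scene: [ dialogue ]"
    )
    print(fill_foreword(stand_in, "a slow walk along a beach at dusk"))
```

Keeping the foreword as a fixed template and varying only the two slots makes it easier to compare outputs across runs.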