ByteDance-Seed/Cola-DLM · Hugging Face

Summary

ByteDance releases Cola-DLM, a hierarchical continuous latent-space diffusion language model combining a Text VAE with a block-causal Diffusion Transformer, available on Hugging Face with model weights, code, and paper.

Source: https://huggingface.co/ByteDance-Seed/Cola-DLM

Cached at: 05/15/26, 01:01 PM

**Cola DLM** (`Co`ntinuous `La`tent `D`iffusion `L`anguage `M`odel) is a hierarchical continuous latent-space diffusion language model. It combines a Text VAE with a block-causal Diffusion Transformer (DiT) prior: the VAE maps text into continuous latent sequences and decodes latents back to tokens, while the DiT performs latent prior transport through Flow Matching.

This model repository contains the HuggingFace-format checkpoint for the paper **Continuous Latent Diffusion Language Model**.

Links

  • **Model repository:** https://huggingface.co/ByteDance-Seed/Cola-DLM
  • **GitHub repository:** https://github.com/ByteDance-Seed/Cola-DLM
  • **Paper:** https://arxiv.org/abs/2605.06548
  • **HuggingFace Daily Paper:** https://huggingface.co/papers/2605.06548
  • **Project page:** https://hongcanguo.github.io/Cola-DLM/
  • **Blog post:** https://hongcanguo.github.io/posts/2026-cola-dlm.html
  • **Zhihu article:** https://zhuanlan.zhihu.com/p/2038324180920313704

Model Files

The expected repository layout is:

.
├── cola_dlm/
│   ├── cola_dit/
│   │   ├── config.json
│   │   └── model.safetensors*
│   └── cola_vae/
│       ├── config.json
│       └── model.safetensors*
├── tokenizer.json
├── README.md
└── README_zh.md
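
After downloading, it can be handy to confirm that a local copy matches this layout before loading anything. The following is a minimal sketch, not part of the Cola DLM package: the helper name `missing_entries` is hypothetical, and it assumes the trailing `*` on `model.safetensors` is a glob that may stand for one file or several shards.

```python
from pathlib import Path

# Relative paths taken from the layout above. The trailing "*" on
# model.safetensors is kept as a glob (assumption: one file or shards).
EXPECTED_PATTERNS = [
    "cola_dlm/cola_dit/config.json",
    "cola_dlm/cola_dit/model.safetensors*",
    "cola_dlm/cola_vae/config.json",
    "cola_dlm/cola_vae/model.safetensors*",
    "tokenizer.json",
]


def missing_entries(root):
    """Return the expected patterns that match nothing under `root`."""
    root = Path(root)
    return [pattern for pattern in EXPECTED_PATTERNS if not any(root.glob(pattern))]
```

Calling `missing_entries("hf_models")` on a complete download should return an empty list; any pattern it does return points at a file that is absent.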

The checkpoint consists of two cooperating modules:

  • ColaDiTModel: a block-causal 1-D Diffusion Transformer prior over continuous text latents.
  • ColaTextVAEModel: a Text VAE encoder and conditional decoder for text-to-latent and latent-to-text mapping.
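
To make "block-causal" concrete, here is a small sketch of the usual definition: a position may attend to every position in its own block and in earlier blocks, but not to later blocks. The block size and the way `ColaDiTModel` actually builds its mask are not stated on this page, so `block_size=2` and the function name are illustrative assumptions.

```python
def block_causal_mask(seq_len, block_size):
    """mask[i][j] is True when position i may attend to position j."""
    return [
        [(j // block_size) <= (i // block_size) for j in range(seq_len)]
        for i in range(seq_len)
    ]


# block_size=2 is an arbitrary value chosen for illustration.
mask = block_causal_mask(seq_len=6, block_size=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# xx....
# xx....
# xxxx..
# xxxx..
# xxxxxx
# xxxxxx
```

Within a block attention is bidirectional, while across blocks it is causal, which is what distinguishes this pattern from a token-level causal mask.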

Quickstart

Install the Cola DLM code package from the GitHub repository, then install the download helper:

git clone https://github.com/ByteDance-Seed/Cola-DLM.git
cd Cola-DLM
pip install -e .
pip install huggingface_hub

Download the model files:

huggingface-cli download ByteDance-Seed/Cola-DLM --local-dir hf_models
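
The same download can be scripted from Python with `huggingface_hub.snapshot_download`. This is a sketch of an equivalent call, assuming the `huggingface_hub` package installed above; the wrapper name `download_cola_dlm` is hypothetical.

```python
def download_cola_dlm(local_dir="hf_models"):
    """Download the repository snapshot and return the local folder path."""
    # Imported lazily so that defining the function does not need the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="ByteDance-Seed/Cola-DLM",
        local_dir=local_dir,
    )
```

Calling `download_cola_dlm()` requires network access and writes into `hf_models`, the directory the rest of this page loads from.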

Run a minimal Python example:

import torch
from tokenizers import Tokenizer

from cola_dlm import (
    ColaDiTModel,
    ColaTextVAEModel,
    generate_task_repaint_inference,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

dit = ColaDiTModel.from_pretrained("hf_models/cola_dlm/cola_dit").to(device)
vae = ColaTextVAEModel.from_pretrained("hf_models/cola_dlm/cola_vae").to(device)
tokenizer = Tokenizer.from_file("hf_models/tokenizer.json")

prompts = [{"question": "Question: What is the capital of France? Answer:"}]
results = generate_task_repaint_inference(
    dit=dit,
    vae=vae,
    tokenizer=tokenizer,
    prompts=prompts,
    task_name="lambada",
    device=device,
    max_new_tokens=32,
    temperature=0.0,
    guidance_scale=7.0,
    timestep_num=16,
    pad_token_id=100277,
)

print(results[0]["generate"])
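
For intuition about what a Flow Matching sampler does with a step count such as `timestep_num=16`, the toy sketch below integrates a velocity field from noise (t=0) to a target (t=1) with Euler steps. It is not the Cola DLM sampler: the real prior is a learned DiT, and reading `timestep_num` as the number of integration steps is an assumption. `euler_sample`, `toy_velocity`, and `target` are hypothetical names.

```python
import random


def euler_sample(velocity_fn, x0, timestep_num):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / timestep_num
    for step in range(timestep_num):
        t = step * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy setup with one known target latent. On the linear path
# x_t = (1 - t) * noise + t * target the velocity is target - noise,
# which equals (target - x_t) / (1 - t) when written in terms of x_t.
target = [0.5, -1.0, 2.0]


def toy_velocity(x, t):
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in target]
sample = euler_sample(toy_velocity, noise, timestep_num=16)
print(sample)  # lands on `target` up to floating-point error
```

Because the toy velocity is constant along each straight path, Euler integration is exact here; a learned velocity field is only approximately straight, which is why the number of steps becomes a quality versus speed trade-off.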

OpenAI-Compatible Serving

The companion `openai_adapter/` service in the Cola DLM code release exposes this model through an OpenAI-compatible Chat Completions endpoint:

POST /v1/chat/completions

Install the adapter dependencies from the code repository root:

pip install -e .
pip install -r openai_adapter/requirements.txt

Start the service:

export COLA_DIT_PATH=hf_models/cola_dlm/cola_dit
export COLA_VAE_PATH=hf_models/cola_dlm/cola_vae
export COLA_TOKENIZER_PATH=hf_models/tokenizer.json
export COLA_MODEL_NAME=cola-dlm
export COLA_API_KEY=change-me

uvicorn openai_adapter.server:app --host 0.0.0.0 --port 8000
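
When the service is launched from a wrapper script, a quick pre-flight check on these variables can catch a missing export early. The sketch below is not part of the adapter: the variable names are the ones exported above, but treating all five as required is an assumption, and `read_adapter_env` is a hypothetical helper.

```python
import os

# Names come from the export lines above. Treating every one of them
# as required is an assumption of this sketch.
ADAPTER_ENV_VARS = (
    "COLA_DIT_PATH",
    "COLA_VAE_PATH",
    "COLA_TOKENIZER_PATH",
    "COLA_MODEL_NAME",
    "COLA_API_KEY",
)


def read_adapter_env(environ=None):
    """Return the adapter settings as a dict, or raise if any are unset."""
    environ = os.environ if environ is None else environ
    missing = [name for name in ADAPTER_ENV_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("unset variables: " + ", ".join(missing))
    return {name: environ[name] for name in ADAPTER_ENV_VARS}
```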

Then send a request:

curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer change-me" \
  -d '{
    "model": "cola-dlm",
    "messages": [
      {
        "role": "user",
        "content": "Question: What is the capital of France? Answer:"
      }
    ],
    "temperature": 0,
    "max_tokens": 32,
    "stream": false
  }'

The adapter currently supports only non-streaming completions, so keep `"stream": false` in the request.
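
The curl request above can also be issued from Python using only the standard library. This is a sketch under the assumption that the adapter returns the usual OpenAI Chat Completions response shape; `build_chat_request` and `send_chat_request` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, content, model="cola-dlm", max_tokens=32):
    """Build the same POST request as the curl command above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "temperature": 0,
        "max_tokens": max_tokens,
        "stream": False,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_chat_request(
    "http://127.0.0.1:8000",
    "change-me",
    "Question: What is the capital of France? Answer:",
)
# Requires the service to be running:
# reply = send_chat_request(request)
# print(reply["choices"][0]["message"]["content"])  # assumed OpenAI-style shape
```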

Model Details

  • **Architecture:** Text VAE + block-causal DiT latent prior.
  • **Training objective:** two-stage training with Text VAE pretraining followed by joint Text VAE + DiT training using Flow Matching.
  • **Training-compute checkpoint:** the released weights correspond to the 2000 EFLOPs checkpoint reported in the paper’s RQ4 scaling curve.
  • **Tokenizer:** OLMo 2 tokenizer with a 100,278-entry vocabulary.
  • **Special token ids:** `pad_token_id=100277`, `eos_token_id=100257`, `im_end_token_id=100265`.
  • **Framework:** PyTorch 2.1+ and HuggingFace Transformers 4.40+.
  • **License:** Apache License 2.0.
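
The special token ids matter when post-processing raw generated ids. The sketch below cuts a sequence at the first pad, eos, or im_end id. Whether the packaged inference helper already does this is not stated here, so treat `trim_at_stop` as a hypothetical utility; the two leading ids in the example are arbitrary placeholders.

```python
# Token ids and vocabulary size as listed above.
PAD_TOKEN_ID = 100277
EOS_TOKEN_ID = 100257
IM_END_TOKEN_ID = 100265
VOCAB_SIZE = 100278

STOP_IDS = {PAD_TOKEN_ID, EOS_TOKEN_ID, IM_END_TOKEN_ID}


def trim_at_stop(token_ids):
    """Keep the ids that precede the first pad/eos/im_end id."""
    trimmed = []
    for token_id in token_ids:
        if token_id in STOP_IDS:
            break
        trimmed.append(token_id)
    return trimmed


# The pad id is the last index of the 100,278-entry vocabulary.
assert PAD_TOKEN_ID == VOCAB_SIZE - 1

# 791 and 6864 are arbitrary placeholder ids.
print(trim_at_stop([791, 6864, 100257, 100277, 100277]))  # [791, 6864]
```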

Evaluation

Reference zero-shot benchmark results from the open-source inference implementation:

| Task | Accuracy (%) |
| --- | --- |
| LAMBADA | 50.80 |
| MMLU | 19.30 |
| OBQA | 23.00 |
| HellaSwag | 10.70 |
| RACE | 19.60 |
| SIQA | 28.90 |
| SQuAD | 30.90 |
| Story Cloze | 30.77 |
| **Tasks Average** | **26.75** |

The open-source HuggingFace Transformers implementation may differ slightly from the internal implementation used in the paper, so per-task numbers can fluctuate slightly. The overall trend is consistent with the paper.
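
The reported average can be checked directly from the per-task accuracies above; it is the unweighted mean over the eight tasks. A short sketch of that arithmetic:

```python
# Per-task zero-shot accuracies (%) as reported above.
scores = {
    "LAMBADA": 50.80,
    "MMLU": 19.30,
    "OBQA": 23.00,
    "HellaSwag": 10.70,
    "RACE": 19.60,
    "SIQA": 28.90,
    "SQuAD": 30.90,
    "Story Cloze": 30.77,
}

# Unweighted mean: 213.97 / 8 = 26.74625
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 26.75
```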

Intended Use

Cola DLM is intended primarily for research on hierarchical latent-variable language models, continuous latent diffusion for text, Flow Matching priors, and benchmark-style text generation.

This checkpoint is **not instruction-tuned** and has not gone through RLHF. It should not be treated as a production chatbot or used for safety-critical decision making.

Limitations

  • The model was trained primarily on English text; other languages are not well evaluated.
  • Outputs may contain factual errors, offensive content, bias, or hallucinations.
  • Generation quality can be sensitive to prompt format and prompt length. QA-style prompts such as `"Question: ... Answer:"` are recommended for quick evaluation.
  • The model uses mutable KV caches during generation; service implementations should serialize generation inside one process unless cache handling is explicitly isolated.
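
The last point, serializing generation inside one process, can be sketched with a plain lock around the generation call. This is an illustrative pattern, not the adapter's actual implementation: `SerializedGenerator` and `fake_generate` are hypothetical, and the stand-in function only tracks how many calls overlap.

```python
import threading


class SerializedGenerator:
    """Wrap a generation callable so only one call runs at a time."""

    def __init__(self, generate_fn):
        self._generate_fn = generate_fn
        self._lock = threading.Lock()

    def __call__(self, *args, **kwargs):
        # Concurrent callers block here until the current call returns.
        with self._lock:
            return self._generate_fn(*args, **kwargs)


# Hypothetical stand-in for a model call that mutates shared state.
state = {"active": 0, "max_active": 0}


def fake_generate(prompt):
    state["active"] += 1
    state["max_active"] = max(state["max_active"], state["active"])
    result = prompt.upper()
    state["active"] -= 1
    return result


generate = SerializedGenerator(fake_generate)
threads = [
    threading.Thread(target=generate, args=(f"prompt {i}",)) for i in range(8)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(state["max_active"])  # 1
```

A lock like this only protects threads within a single process, which matches the scope of the recommendation; running several worker processes would give each its own model copy and caches.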

Safety Statement and Use Restrictions

Cola DLM is a research-oriented checkpoint for continuous latent diffusion language modeling. The released model is relatively small and has **not been instruction-tuned, RLHF-aligned, or systematically safety-aligned**. Therefore, it does not provide reliable refusal behavior, content moderation, or risk detection. Its outputs may contain inaccurate, offensive, biased, unlawful, inappropriate, or misleading content.

This model is intended only for academic research and technical experimentation. We do not encourage, support, or authorize the use of Cola DLM to generate, distribute, or assist with the following types of content:

  • Pornographic, sexually explicit, exploitative, or otherwise inappropriate content;
  • Gambling-related content, including gambling promotion, betting advice, or illegal gambling services;
  • Content related to illegal drugs or controlled substances, including instructions for manufacturing, purchasing, selling, using, or evading regulation;
  • Hate, harassment, discrimination, threats of violence, extremist, or inflammatory content;
  • Political manipulation, targeted political persuasion, political misinformation, incitement of international or intergroup conflict, or sensitive political content that may escalate social, national, or geopolitical tensions;
  • Illegal activities, regulatory evasion, cyber abuse, privacy violations, or other content that may cause real-world harm;
  • Automated advice or decision-making in high-stakes domains such as medical, legal, financial, safety-critical, or security-sensitive settings.

Users who download, deploy, fine-tune, redistribute, or build applications based on this model are responsible for implementing appropriate safety and compliance measures. Such measures may include, but are not limited to, input and output moderation, access control, logging and auditing, human review, red-teaming, and compliance checks under applicable laws and regulations.

Cola DLM should not be treated as a production-ready chatbot or a safety-reliable general-purpose assistant. Any content generated by this model does not represent the views, positions, or endorsements of the authors, affiliated institutions, or contributors.

Citation

If you use Cola DLM in your work, please cite:

@article{guo2026cola,
  title   = {Continuous Latent Diffusion Language Model},
  author  = {Guo, Hongcan and Zhao, Qinyu and Zhao, Yian and Nie, Shen and
             Zhu, Rui and Guo, Qiushan and Wang, Feng and Yang, Tao and
             Zhao, Hengshuang and Wei, Guoqiang and Zeng, Yan},
  journal = {arXiv preprint arXiv:2605.06548},
  year    = {2026},
  url     = {https://arxiv.org/abs/2605.06548},
}
