Tag
This technical report investigates draft-conditioned latent refinement for non-autoregressive text generation, showing that good latent geometry does not guarantee good decoding and emphasizing decoder recoverability as a key evaluation metric.
dLLM is an open-source library that converts any autoregressive LLM into a diffusion LLM, enabling parallel decoding and faster text generation.
This paper reformulates language generation as a stochastic optimal control problem to address limitations of autoregressive and diffusion models. It proposes a closed-loop diffusion method in latent control space using Flow Matching, reporting high-fidelity generation and efficient parallel sampling.
NemoStation/Marlin-2B is a fine-tuned model based on Qwen3.5-2B for video-text-to-text tasks, supporting video captioning and temporal grounding.
This paper introduces Trajectory-Shaped Discrete Flow Matching (TS-DFM), which replaces blind stochastic jumps with guided navigation to improve text generation efficiency and reduce computational cost. The method reports better perplexity and speed than traditional multi-step baselines without adding inference cost.
Cola DLM is a hierarchical latent diffusion language model that uses text-to-latent mapping and conditional decoding to achieve efficient, non-autoregressive text generation.
Researchers from Utah State and Vanderbilt benchmark GPT-4, Gemini 1.5 Pro, DeepSeek-V3, Llama 3.2, and BERT on three social-media tasks: authorship verification, post generation, and user attribute inference. They introduce new sampling protocols and taxonomies to reduce bias and enable reproducible benchmarks.
MedConclusion introduces a large-scale benchmark of 5.7 million PubMed structured abstracts for evaluating LLMs on biomedical conclusion generation from structured scientific evidence. The study finds that conclusion writing is behaviorally distinct from summarization and that current automatic metrics cluster strong models closely together.
The article reflects on the history of text generation, drawing parallels between modern LLMs like GPT-4 and earlier concepts from Jorge Luis Borges and Claude Shannon. It explores how Shannon's probabilistic experiments and Borges' 'Library of Babel' metaphor help clarify fundamental questions about the nature of generated text and data structure.
Google introduces Gemini Omni, a new multimodal AI model capable of processing and generating content across text, images, audio, and video from any input type.