Tag
DeepSeek released DSpark, a system in which the main model rapidly drafts a sentence and a tiny editor model fixes its coherence before verification, an advance in LLM systems engineering rather than in model architecture.
This paper investigates prompt-based learning for automatically generating highlights of academic papers, using models like GPT-2, T5, and ChatGPT, and shows that ChatGPT with few-shot prompts achieves performance comparable to or better than supervised methods without requiring task-specific training data.
NVIDIA claims a 15x speedup in text generation from a diffusion model that generates entire blocks of text at once.
GLM-5.2 is an open-weight AI model optimized for creative writing tasks, claimed to be the best in its category.
VoidPadding introduces a [VOID] token to handle padding in masked diffusion language models, allowing [EOS] to focus solely on semantic termination. This method significantly improves performance on reasoning and coding benchmarks while reducing decoding steps.
This article explores using the gzip compression algorithm as a language model, demonstrating that compression algorithms can generate text by scoring candidate continuations based on compressed length, using beam search to produce output.
The tweet describes how any compression tool, including gzip, can be adapted for language modeling, and that gzip can generate text that resembles Shakespeare. A write-up is linked.
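The idea in the two gzip summaries above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the linked write-up's code: the function names (`compressed_len`, `score`, `generate`), the character-level vocabulary, and the beam width are all hypothetical choices made for the example. A candidate continuation is scored by how many bytes gzip needs for the reference corpus plus the candidate text, and beam search keeps the candidates that compress best.

```python
import gzip


def compressed_len(text: str) -> int:
    """Number of bytes gzip needs to store `text` (smaller = more predictable)."""
    return len(gzip.compress(text.encode("utf-8"), compresslevel=9))


def score(context: str, continuation: str) -> int:
    """Extra compressed bytes added by appending `continuation` to `context`.

    Lower is better: text that repeats patterns already present in the
    context costs the compressor very little.
    """
    return compressed_len(context + continuation) - compressed_len(context)


def generate(corpus: str, prompt: str, vocab, steps: int = 20, beam_width: int = 3) -> str:
    """Extend `prompt` by `steps` symbols using beam search over `vocab`.

    Assumption: `vocab` is a list of single characters. DEFLATE only looks
    back about 32 KB, so only the tail of a large corpus influences scores.
    """
    beams = [prompt]
    for _ in range(steps):
        candidates = []
        for text in beams:
            for symbol in vocab:
                extended = text + symbol
                # Rank by total compressed size of corpus + generated text.
                candidates.append((compressed_len(corpus + extended), extended))
        candidates.sort(key=lambda pair: pair[0])
        beams = [text for _, text in candidates[:beam_width]]
    return beams[0]


if __name__ == "__main__":
    corpus = "to be or not to be that is the question " * 20
    vocab = sorted(set(corpus))
    print(generate(corpus, "to be ", vocab, steps=15))
```

Compressed length is a coarse signal (it changes in whole bytes), so many candidates tie at each step; the beam is what lets a cheaper multi-character match surface a few steps later. Output quality depends heavily on the corpus and beam width chosen here.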
Google has open-sourced DiffusionGemma, a novel diffusion-based text generation model that uses block diffusion and efficient encoder-decoder techniques, with contributions from Cornell University researchers.
Google released DiffusionGemma, an open-weight text generation model (26B parameters, 4B active) under the Apache 2.0 license, demonstrating high inference speeds via NVIDIA's NIM cloud API.
Google released DiffusionGemma, an experimental open-source diffusion model for text generation that achieves a 4x speed boost over autoregressive models and is optimized for local processing.
DiffusionGemma, a 26B MoE model based on Gemma 4, achieves over 1000 tokens per second using diffusion for text generation in 256-token blocks, fitting in 18GB VRAM with quantization, released under Apache 2.0.
DiffusionGemma is a new experimental model from Google DeepMind that uses parallel generation on a 256-token canvas, achieving up to 4x faster token generation on GPUs. This developer guide explains its architecture, bidirectional context, and includes a fine-tuning recipe for solving Sudoku.
Google introduces DiffusionGemma, an experimental 26B MoE open model that achieves up to 4x faster text generation on GPUs using text diffusion, targeting speed-critical interactive local workflows.
Google DeepMind releases DiffusionGemma, a 26B-parameter Mixture-of-Experts model that uses discrete diffusion for faster text generation, supporting multimodal inputs and a 256K token context.
This paper proposes AXON, a training-free module that improves the quality-latency trade-off of discrete diffusion language model decoding by intelligently selecting 'anchor' tokens to reveal first, using attention, uncertainty, and confidence signals to support subsequent denoising steps. Experiments on reasoning and code-generation benchmarks show AXON reduces function evaluations while maintaining or improving accuracy.
Introduces LEDE, a framework using offline reinforcement learning to dynamically select exit layers and speculation lengths for self-speculative decoding in LLMs, achieving up to 2.7x speedup over autoregressive decoding.
NVIDIA introduces Nemotron-Labs Diffusion, a family of diffusion language models that generate text in parallel and iteratively refine it, offering faster generation and the ability to revise previous tokens.
This paper introduces TokenDrift, a drifting objective that refines discrete diffusion language models by lifting categorical predictions to a continuous semantic space for anti-symmetric drifting, significantly improving generation quality under a fixed number of denoising steps.
This paper presents MiniGPT, a compact from-scratch implementation of GPT-style autoregressive language modeling in PyTorch, built after studying nanoGPT. It evaluates the model on the Tiny Shakespeare dataset using character-level tokenization, achieving a validation loss of 1.4780 with a 10.77M-parameter configuration.
This paper introduces Dynamic Chunking for Diffusion Language Models (DCDM), which replaces fixed positional blocks in block discrete diffusion with content-defined semantic chunks using a differentiable Chunking Attention mechanism, achieving consistent improvements across scales up to 1.5B parameters.