autoregressive-decoding

Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction

Hugging Face Daily Papers · 2d ago

Introduces Future-L1, an interleaved latent visual reasoning framework that improves video event prediction by maintaining visual semantics in latent space. Achieves state-of-the-art results on FutureBench and TwiFF-Bench benchmarks.

KVarN: Variance-Normalized KV-Cache Quantization Mitigates Error Accumulation in Reasoning Tasks

Hugging Face Daily Papers · 4d ago

KVarN is a calibration-free KV-cache quantizer that uses Hadamard rotation and dual-scaling variance normalization to reduce error accumulation during autoregressive decoding in large language models. It reports state-of-the-art results at 2-bit precision on reasoning benchmarks.

@NVIDIAAI: Most language models only generate one token at a time. We just released Nemotron-Labs-Diffusion, a family of diffusion…

X AI KOLs Following · 2026-05-19

NVIDIA released Nemotron-Labs-Diffusion, a family of diffusion language models that generate multiple tokens in parallel, enabling faster inference and better GPU utilization, with sizes from 3B to 14B including vision-language variants.

BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion

arXiv cs.CL · 2026-05-13

This paper introduces BitLM, a language model that uses bitwise continuous diffusion to generate multiple tokens in parallel, aiming to overcome the sequential bottleneck of traditional autoregressive generation while preserving causal structure.
