autoregressive

#autoregressive

Speculative Refinement: A Hybrid Autoregressive Diffusion Decoding Strategy and Its Behavior Across Benchmarks

arXiv cs.AI · 20h ago

Introduces Speculative Refinement (SpecRef), a training-free hybrid decoding strategy that warm-starts a masked diffusion language model from an autoregressive draft using entropy-guided selective masking. Evaluated across six benchmarks, it reveals that code benchmarks conflate structural discovery with logical correctness, identifies a refinement tension phenomenon, and shows that different evaluation protocols can produce different model rankings.

#autoregressive

MultiHashFormer: Hash-based Generative Language Models

arXiv cs.CL · 20h ago

MultiHashFormer is a hash-based generative language model that represents each token as a unique hash signature, enabling parameter-efficient autoregression. It outperforms standard Transformer LMs at 100M, 1B, and 3B scales and supports multilingual vocabulary expansion without increasing parameters.

#autoregressive

Nemotron-TwoTower: Diffusion Language Modeling with Pretrained Autoregressive Context

arXiv cs.CL · 3d ago

The paper proposes Nemotron-TwoTower, a diffusion language model that decouples context representation and denoising using a frozen autoregressive tower and a trainable diffusion denoiser, achieving 98.7% of baseline quality with 2.42x throughput.

#autoregressive

Parallel Rollout Approximation for Pixel-Space Autoregressive Image Generation

Hugging Face Daily Papers · 4d ago

Parallel Rollout Approximation (PRA) improves pixel-space autoregressive image generation by using low-dimensional intermediate states and parallel training, achieving new state-of-the-art results on ImageNet-1K generation.

#autoregressive

Demystifying Training-Time Augmentation for Data-Constrained Language Model Pretraining

Hugging Face Daily Papers · 2026-06-19

This paper investigates training-time data augmentation techniques to mitigate overfitting in autoregressive language model pretraining under data-constrained, compute-abundant regimes, finding that combining token-level noise, sequence permutations, and target offset prediction improves validation loss.

#autoregressive

Shattering the Autoregressive Curse: Dynamic Epistemic Entropy Orchestrated Erasable Reinforcement Learning for LLMs

arXiv cs.AI · 2026-06-17

This paper proposes E³RL, a reinforcement learning method that uses dynamic epistemic entropy thresholds to enable LLMs to excise local logical defects during generation, overcoming the autoregressive curse in long-horizon reasoning and achieving state-of-the-art results on mathematical reasoning benchmarks like AIME.

#autoregressive

Discrete Autoregressive Transformer for Generative Mechanism Synthesis

arXiv cs.LG · 2026-06-17

This paper presents a discrete autoregressive transformer that generates planar mechanisms from target coupler curves, using variational autoencoder latents and tokenized joint coordinates to achieve diverse, accurate designs across multiple topologies.

#autoregressive

MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model

Hugging Face Daily Papers · 2026-06-16

MaineCoon is a 22B-parameter real-time audio-visual autoregressive model for social world modeling, capable of streaming generation at up to 47.5 FPS on a single GPU, introducing novel training techniques and an agentic inference framework.

#autoregressive

Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification

Hugging Face Daily Papers · 2026-06-16

UniAR presents a unified autoregressive framework that uses a single discrete visual tokenizer to bridge visual understanding and generation, achieving state-of-the-art results in image generation and editing.

#autoregressive

Speaking the Language of Science: Toward a General-Purpose Generative Foundation Model for the Natural Sciences

Hugging Face Daily Papers · 2026-06-15

LOGOS is a scientific generative language model that encodes diverse scientific objects and spatial interactions as token sequences, enabling a unified autoregressive framework for tasks across natural sciences. Models at 1B, 3B, and 8B parameters show consistent performance scaling and are released to facilitate research.

#autoregressive

FadeMem: Distance-Aware Memory Consolidation for Autoregressive Video Diffusion

Hugging Face Daily Papers · 2026-06-09

FadeMem introduces a distance-aware key-value memory consolidation mechanism that organizes historical video data into a temporal hierarchy, improving long-video generation under fixed cache constraints.

#autoregressive

Data-Constrained Language Model Pretraining: Improved Regularization and Scaling Laws

arXiv cs.LG · 2026-06-08

This paper studies data-constrained language model pretraining, proposing masked-input regularization (MIR) to improve validation loss and downstream performance, and SoftQ, a scaling law that better captures model-data interaction under repeated data.

#autoregressive

Stream3D-VLM: Online 3D Spatial Understanding with Incremental Geometry Priors

Hugging Face Daily Papers · 2026-06-05

Stream3D-VLM is an online 3D vision-language model that enables real-time spatial understanding from streaming video by incrementally integrating geometry priors and using geometry-adaptive voxel compression, outperforming existing models on 3D spatial understanding tasks.

#autoregressive

Streaming Video Generation with Streaming Force Control

Hugging Face Daily Papers · 2026-06-05

StreamForce is a causal, unified video generation model that provides real-time, physically grounded responses to time-varying forces through a distillation pipeline and autoregressive architecture, achieving state-of-the-art performance in force adherence and motion realism.

#autoregressive

dots.tts Technical Report

Hugging Face Daily Papers · 2026-06-05

dots.tts presents a 2B-parameter continuous autoregressive TTS model trained on multilingual data, achieving state-of-the-art performance on benchmarks like Seed-TTS-Eval with low-latency streaming via CFG-aware MeanFlow distillation. The model, code, and checkpoints are released under Apache 2.0.

#autoregressive

Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation

Hugging Face Daily Papers · 2026-06-03

Echo-Infinity introduces a learnable evolving memory mechanism for autoregressive video generation, enabling real-time infinite video generation with constant memory cost and state-of-the-art performance.

#autoregressive

MeshWeaver: Sparse-Voxel-Guided Surface Weaving for Autoregressive Mesh Generation

Hugging Face Daily Papers · 2026-06-03

MeshWeaver presents an autoregressive mesh generation framework that directly predicts vertices using a multi-level sparse-voxel encoder, achieving state-of-the-art compression and geometric fidelity for high-poly meshes.

#autoregressive

@NielsRogge: NEPA has now been added here: Check the evals at the bottom to compare to other models

X AI KOLs Following · 2026-06-02

NEPA is a new method for visual self-supervised learning and generative pretraining that predicts the next embedding autoregressively, and has been added to a benchmark for evaluation.

#autoregressive

Steady-Forcing: Balancing Spatial Persistence and Motion Continuity in Long-Horizon Nature Video Diffusion

Hugging Face Daily Papers · 2026-06-02

Steady-Forcing proposes a memory and training framework to balance spatial stability and motion continuity in long-horizon nature video generation, improving background consistency while sustaining fluid dynamics over multi-minute rollouts.

#autoregressive

AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation

Hugging Face Daily Papers · 2026-06-02

AAD-1 introduces asymmetric adversarial distillation with phased training to achieve one-step autoregressive video generation, outperforming prior methods on VBench.
