diffusion

#diffusion

NVIDIA has released Nemotron-TwoTower-30B-A3B-Base-BF16, an unusual diffusion-based language model built from the Nemotron 3 Nano 30B-A3B backbone.

Reddit r/LocalLLaMA · 2h ago

NVIDIA released Nemotron-TwoTower-30B-A3B-Base-BF16, a diffusion-based language model that uses block-wise autoregressive diffusion to generate text by iterative denoising of token blocks, achieving 2.42× the generation throughput of the autoregressive baseline while retaining 98.7% of benchmark quality.
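The idea of generating text block by block, with each block refined over a few denoising steps, can be illustrated with a toy sketch. This is not NVIDIA's implementation: it assumes a masked-token style of denoising, and `MASK`, `toy_predict`, and `generate_blockwise` are hypothetical names introduced only for illustration. The random "denoiser" stands in for a real model.

```python
import random
from typing import Callable, List, Tuple

MASK = "<mask>"  # hypothetical placeholder for a not-yet-denoised position

Prediction = List[Tuple[str, float]]  # (token, confidence) per block position


def toy_predict(context: List[str], block: List[str], rng: random.Random) -> Prediction:
    """Hypothetical stand-in for a denoiser: random token and confidence per position."""
    vocab = ["a", "b", "c", "d"]
    return [(rng.choice(vocab), rng.random()) for _ in block]


def generate_blockwise(
    num_blocks: int,
    block_size: int,
    steps: int,
    predict: Callable[[List[str], List[str], random.Random], Prediction] = toy_predict,
    seed: int = 0,
) -> List[str]:
    """Generate blocks left to right; fill each block over `steps` denoising passes."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    rng = random.Random(seed)
    output: List[str] = []
    for _ in range(num_blocks):  # autoregressive across blocks
        block = [MASK] * block_size
        for step in range(steps):  # iterative denoising within a block
            masked = [i for i, tok in enumerate(block) if tok == MASK]
            if not masked:
                break
            preds = predict(output, block, rng)
            # Commit an even share of the remaining positions, most confident first.
            share = -(-len(masked) // (steps - step))  # ceiling division
            chosen = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:share]
            for i in chosen:
                block[i] = preds[i][0]
        output.extend(block)  # a finished block becomes context for the next
    return output


tokens = generate_blockwise(num_blocks=3, block_size=4, steps=2)
```

In this sketch each block costs `steps` model calls rather than one call per token, which is the kind of trade-off that can raise throughput when `steps` is smaller than the block size.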

@askalphaxiv: "Atomistic Language Models Understand and Generate Materials" Most materials AI still treats crystals and language sepa…

X AI KOLs Timeline · 17h ago

This paper introduces an atomistic language model that integrates a 3D atom encoder, a Qwen LLM, and a diffusion-based crystal generator to handle multimodal materials data natively, achieving state-of-the-art crystal structure prediction and de novo generation.

Accelerating Disaggregated RL for Visual Generative LLMs with Diffusion-Based Parallelism and Trainer-Assisted Generation

arXiv cs.AI · yesterday

This paper introduces DigenRL, a disaggregated RL framework for diffusion-based generative LLMs that uses generation-axis pipeline parallelism and trainer-assisted generation to improve throughput by 1.56–2.10× over existing systems.

Approximate Structured Diffusion for Sequence Labelling

arXiv cs.CL · 2026-06-18

This paper introduces Approximate Structured Diffusion, a method that combines conditional random fields (CRFs) with discrete diffusion for sequence labelling. It uses a CRF conditioned on noisy label sequences and approximate mean-field inference, achieving a 16.5% error reduction on POS tagging.

JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising

Hugging Face Daily Papers · 2026-06-18

JanusMesh is a fast, training-free framework that generates text-driven 3D visual illusions (a single mesh that reveals different semantics from different viewing angles) by decoupling generation into cross-space dual-branch denoising and view-conditioned texture synthesis. It achieves high realism in 3-5 minutes.

@ZhengyangGeng: You can always trust Kaiming's quality bar. Writing, code, data, recipe, ckpt... https://github.com/PeppaKing8/minit2i-…

X AI KOLs Timeline · 2026-06-17

MiniT2I is a minimalist direct-RGB text-to-image generator using a pixel-space MM-JiT denoiser with flow matching and frozen FLAN-T5-Large text tokens, with open-source JAX/Flax and PyTorch implementations released along with checkpoints.

Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance

Hugging Face Daily Papers · 2026-06-17

Moebius is a 0.22B-parameter image inpainting framework that rivals 10B-level models such as FLUX.1-Fill-Dev while running over 15× faster at inference, using novel local-global interaction blocks and adaptive distillation strategies.

@xichen_pan: Modern text-to-image models are increasingly powered by large pretrained LLMs. But there is a curious mismatch: the LLM…

X AI KOLs Following · 2026-06-16

RepFusion introduces a method to use pretrained multimodal LLMs as noisy representation encoders in diffusion transformers for text-to-image generation, outperforming baselines with similar compute.

@DengHokin: I am super excited to share that I launch a weekly Video Model Journal Club. Every week we pick one paper and go deep, …

X AI KOLs Timeline · 2026-06-16

The author launches a weekly Video Model Journal Club covering video generation, world models, physical reasoning, diffusion, flow matching, etc. The first in-person talk will be by Yilun Du on Embodied Reasoning with World Models.

SP^3: Spherical Priors for Plug-and-Play Restoration

Hugging Face Daily Papers · 2026-06-15

This paper introduces SP³, a method using Spherical Encoder priors for Plug-and-Play image restoration, achieving perceptual quality comparable to zero-shot diffusion priors while being 3–630× faster across tasks.

MoVerse: Real-Time Video World Modeling with Panoramic Gaussian Scaffold

Hugging Face Daily Papers · 2026-06-11

MoVerse generates real-time interactive video from single images by creating 360° panoramas and 3D Gaussian scaffolds, enabling efficient rendering through diffusion-based techniques.

VideoMDM: Towards 3D Human Motion Generation From 2D Supervision

Hugging Face Daily Papers · 2026-06-11

VideoMDM trains 3D human motion priors from 2D poses using a diffusion framework with 2D reprojection loss and 3D motion regularizers, achieving near-3D supervised performance without requiring 3D ground truth.

DiffusionGemma

Simon Willison's Blog · 2026-06-10

Google released DiffusionGemma, an open-weight text generation model (26B parameters, 4B active) under the Apache 2.0 license, demonstrating high inference speeds via NVIDIA's NIM cloud API.

@_philschmid: Gemma goes diffusion! DiffusionGemma with up to 1000+ tokens per second! - Built on Gemma 4 as a 26B MoE model. - 3.8B …

X AI KOLs Following · 2026-06-10

DiffusionGemma, a 26B MoE model based on Gemma 4, achieves over 1000 tokens per second using diffusion for text generation in 256-token blocks, fitting in 18GB VRAM with quantization, released under Apache 2.0.

@svlevine: Diffusion (or flow) makes for excellent policies, but training them with RL is notoriously hard: BPTT is unstable, RL o…

X AI KOLs Following · 2026-06-10

A new paper shows how to optimize flow-matching actors for reinforcement learning by approximating the Jacobian of the flow denoising process with the identity matrix, which makes training feasible.

google/diffusiongemma-26B-A4B-it

Hugging Face Models Trending · 2026-06-09

Google DeepMind releases DiffusionGemma, a 26B-parameter Mixture-of-Experts model that uses discrete diffusion for faster text generation, supporting multimodal inputs and a 256K token context.

Why are cells small?

Hacker News Top · 2026-06-08

An essay explaining the physical constraints on cell size, focusing on surface-area-to-volume ratio and diffusion limits that make cells small.
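The surface-area-to-volume argument can be illustrated with a standard textbook calculation. It assumes an idealized spherical cell of radius $r$ and a molecule with diffusion coefficient $D$; the symbols and the spherical shape are assumptions made here for illustration, not taken from the essay.

```latex
\[
\frac{A}{V} = \frac{4\pi r^{2}}{\tfrac{4}{3}\pi r^{3}} = \frac{3}{r},
\qquad
\langle x^{2} \rangle = 6Dt
\;\Longrightarrow\;
t \approx \frac{r^{2}}{6D}
\]
```

Under these assumptions, doubling the radius halves the membrane area available per unit of volume and roughly quadruples the time a molecule needs to diffuse across the cell.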

SwiftVR: Real-Time One-Step Generative Video Restoration

Hugging Face Daily Papers · 2026-06-08

SwiftVR is a real-time one-step generative video restoration framework that achieves high frame rates on consumer GPUs using efficient attention mechanisms and a lightweight restoration-aware autoencoder.

MaskAlign: Token-Subset Representation Alignment for Efficient Diffusion Training

Hugging Face Daily Papers · 2026-06-07

MaskAlign proposes a token-subset representation alignment method that improves diffusion transformer training by reducing reliance on complete token sets and maintaining stable alignment under perturbations.

Balancing Image Compression and Generation with Bootstrapped Tokenization

arXiv cs.LG · 2026-06-05

This paper introduces SelfBootTok, a self-bootstrapped tokenization method that separates global and local information, reducing generator computation by ~40% and achieving a new state-of-the-art gFID of 1.56 with only 64 tokens.
