ZipSplat: Fewer Gaussians, Better Splats

Hugging Face Daily Papers 06/03/26, 12:00 AM Papers

3d-gaussian-splatting novel-view-synthesis feed-forward token-based scene-reconstruction pose-free clustering

Summary

ZipSplat is a token-based feed-forward 3D Gaussian Splatting model that uses k-means clustering to decouple Gaussian placement from the pixel grid, achieving ~6x fewer Gaussians while setting new state-of-the-art results on DL3DV and RealEstate10K without requiring ground-truth poses or intrinsics.

Feed-forward 3D Gaussian Splatting methods reconstruct a scene from posed or pose-free images in a single forward pass, yet current approaches predict one Gaussian per input pixel, tying the representation budget to camera resolution rather than scene complexity. A flat wall and a richly textured object thus produce equally many Gaussians despite very different geometric needs. We propose ZipSplat, a token-based feed-forward model that decouples Gaussian placement from the pixel grid. A multi-view backbone extracts dense visual tokens, and k-means clustering compresses them into a compact set of scene tokens. Cross- and self-attention refine these tokens, and a lightweight MLP decodes each into a group of Gaussians with unconstrained 3D positions. Because clustering is applied at inference, a single trained model spans the quality-efficiency curve without retraining. ZipSplat operates without ground-truth poses or intrinsics, yet sets a new state of the art on DL3DV and RealEstate10K with ~6x fewer Gaussians than pixel-aligned methods, surpassing the best pose-free baseline by 2.1 dB and 1.2 dB PSNR, respectively. It further generalizes zero-shot to Mip-NeRF360 and ScanNet++, outperforming all comparable baselines. Our project page is at https://veichta.com/zipsplat.
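The token-compression idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it uses plain Lloyd's k-means in pure Python on toy 2D "tokens", and the function names (`kmeans`, `pixel_aligned_budget`, `token_budget`) and all numbers are hypothetical, chosen only to show how the Gaussian count stops depending on image resolution once tokens are clustered.

```python
import random


def kmeans(tokens, k, iters=10, seed=0):
    """Plain Lloyd's k-means over equal-length vectors; returns k centroids.

    Assumption: stands in for the clustering step that turns dense visual
    tokens into a compact set of scene tokens.
    """
    rng = random.Random(seed)
    # Initialise centroids from k distinct input tokens.
    centroids = [list(t) for t in rng.sample(tokens, k)]
    for _ in range(iters):
        # Assignment step: each token goes to its nearest centroid.
        groups = [[] for _ in range(k)]
        for t in tokens:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(t, centroids[c])),
            )
            groups[nearest].append(t)
        # Update step: move each non-empty centroid to the mean of its group.
        for c, members in enumerate(groups):
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def pixel_aligned_budget(num_views, height, width):
    """One Gaussian per input pixel: the count scales with resolution."""
    return num_views * height * width


def token_budget(num_scene_tokens, gaussians_per_token):
    """Each scene token decodes into a fixed-size group of Gaussians."""
    return num_scene_tokens * gaussians_per_token


# Toy dense tokens: two well-separated groups in a 2D feature space.
dense_tokens = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0],
]
scene_tokens = kmeans(dense_tokens, k=2)
print(sorted(scene_tokens))

# Hypothetical sizes, not the paper's settings.
dense = pixel_aligned_budget(num_views=4, height=64, width=64)
compact = token_budget(num_scene_tokens=512, gaussians_per_token=8)
print(dense, compact, dense / compact)
```

Because `k` is an argument of the clustering step and not a learned quantity, the same sketch can be rerun with a different `k` to trade Gaussian count against fidelity, which mirrors the abstract's point that clustering at inference lets one trained model span the quality-efficiency curve.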

Similar Articles

GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens

Hugging Face Daily Papers

GlobalSplat introduces an efficient feed-forward framework for 3D Gaussian splatting that achieves compact and consistent scene reconstruction using global scene tokens, reducing computational overhead and inference time to under 78ms. The method uses a coarse-to-fine training approach to prevent representation bloat while maintaining competitive novel-view synthesis performance with significantly fewer Gaussians (16K) compared to dense baselines.

Gaussian Point Splatting

Hacker News Top

Researchers propose Gaussian Point Splatting, a stochastic rendering method using pixel-sized opaque points and 64-bit GPU atomics that scales to hundreds of millions of Gaussians in real time. The method, accepted at SIGGRAPH 2026, employs hierarchical culling and parallel programming primitives to achieve even workload distribution with only minor noise differences compared to original Gaussian splatting.

Gaussian Splat of a Strawberry

Hacker News Top

A Gaussian splat of a strawberry, created from 90 perspectives with focus stacking, using slang-splat for training and SuperSplat for viewing.

VidSplat: Gaussian Splatting Reconstruction with Geometry-Guided Video Diffusion Priors

Hugging Face Daily Papers

VidSplat is a training-free generative reconstruction framework that uses video diffusion priors to recover complete 3D scenes from sparse inputs by synthesizing novel views.

SplatWeaver: Learning to Allocate Gaussian Primitives for Generalizable Novel View Synthesis

Hugging Face Daily Papers

SplatWeaver is a feed-forward novel view synthesis framework that dynamically allocates 3D Gaussian primitives based on spatial complexity, improving rendering quality and efficiency over fixed-allocation methods. It leverages cardinality Gaussian experts and a pixel-level routing scheme guided by high-frequency priors to adaptively distribute primitives across complex and smooth scene regions.
