SpatialAvatar-0: High-Quality 4D Head Avatar with Multi-Stage Reconstruction

Hugging Face Daily Papers 06/14/26, 12:00 AM Papers

Summary

SpatialAvatar-0 introduces a multi-stage reconstruction method for high-quality 4D head avatars using a shared FLAME-mesh-bound Gaussian representation, achieving superior performance across benchmarks with reduced iterations.

High-quality 4D head avatars from one or a few source portraits are central to telepresence, AR/VR, and digital-human interaction. 3D Gaussian Splatting (3DGS) has emerged as the dominant representation, with two complementary regimes (generalizable feed-forward predictors and per-subject refiners) maturing in parallel. However, existing feed-forward predictors are trained on a single dataset family with a hard-coded source count, inheriting the corresponding domain bias. Per-subject refiners require 300K to 600K iterations and rely on adaptive densification that destroys upstream Gaussian layouts, preventing the two regimes from sharing a representation end-to-end. To bridge both regimes, we propose SpatialAvatar-0 on a shared FLAME-mesh-bound Gaussian representation: a feed-forward generator with a parameter-free K-source mean-pool and a two-phase schedule, from monocular-temporal to multi-view-spatial, that anchors against identity-prior collapse onto the smaller multi-view set. We further introduce a 10K-iteration layout-preserving per-subject refinement loop that freezes the FLAME binding and the Gaussian count and replaces densification with a three-component anti-spike regularization. In cross-domain zero-shot evaluation on VFHQ/HDTF, we surpass the in-domain leader GAGAvatar by +1.5 dB PSNR despite never training on either test domain. On the SplattingAvatar monocular benchmark, we lead every reported metric, surpassing the 300K-iteration GeoAvatar by +1.3 dB PSNR with a per-subject schedule up to 60x shorter than common SOTA baselines. Website: https://spatialwalk.github.io/SpatialAvatar-0.
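The abstract does not describe how the K-source mean-pool is implemented. One plausible reading is an element-wise average of per-source feature vectors, which has no learned weights and accepts any number of sources K. The sketch below illustrates that reading only; the function name `mean_pool_sources` and the plain-list feature vectors are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def mean_pool_sources(features: Sequence[Sequence[float]]) -> List[float]:
    """Average K per-source feature vectors element-wise.

    Assumption: each source portrait has already been encoded into a
    fixed-length feature vector. There are no learned parameters here,
    so the same function handles K = 1, 2, 3, ... without retraining.
    """
    if not features:
        raise ValueError("need at least one source")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all source features must share one dimension")
    k = len(features)
    return [sum(f[i] for f in features) / k for i in range(dim)]


# One source: pooling returns the features unchanged.
one = mean_pool_sources([[1.0, 2.0]])

# Three sources: each output entry is the mean over sources.
three = mean_pool_sources([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

Because the average does not depend on K, a predictor built this way would not need a hard-coded source count, which is the limitation the abstract attributes to earlier feed-forward predictors.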

Cached at: 06/22/26, 09:30 AM

Source: https://huggingface.co/papers/2606.15659

Abstract

SpatialAvatar-0 enables high-quality 4D head avatar generation by combining feed-forward prediction with per-subject refinement through a shared Gaussian representation, achieving superior performance across multiple benchmarks.

High-quality 4D head avatars from one or a few source portraits are central to telepresence, AR/VR, and digital-human interaction. 3D Gaussian Splatting (3DGS) has emerged as the dominant representation, with two complementary regimes (generalizable feed-forward predictors and per-subject refiners) maturing in parallel. However, existing feed-forward predictors are trained on a single dataset family with a hard-coded source count, inheriting the corresponding domain bias. Per-subject refiners require 300K to 600K iterations and rely on adaptive densification that destroys upstream Gaussian layouts, preventing the two regimes from sharing a representation end-to-end. To bridge both regimes, we propose SpatialAvatar-0 on a shared FLAME-mesh-bound Gaussian representation: a feed-forward generator with a parameter-free K-source mean-pool and a two-phase schedule, from monocular-temporal to multi-view-spatial, that anchors against identity-prior collapse onto the smaller multi-view set. We further introduce a 10K-iteration layout-preserving per-subject refinement loop that freezes the FLAME binding and the Gaussian count and replaces densification with a three-component anti-spike regularization. In cross-domain zero-shot evaluation on VFHQ/HDTF, we surpass the in-domain leader GAGAvatar by +1.5 dB PSNR despite never training on either test domain. On the SplattingAvatar monocular benchmark, we lead every reported metric, surpassing the 300K-iteration GeoAvatar by +1.3 dB PSNR with a per-subject schedule up to 60x shorter than common SOTA baselines. Website: https://spatialwalk.github.io/SpatialAvatar-0.
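To put the reported numbers in context, the sketch below uses the standard PSNR definition, PSNR = 10 log10(MAX^2 / MSE), to convert a dB gain into the equivalent reduction in mean squared error, and checks the iteration ratio implied by the quoted schedules. The helper names are illustrative. The paper's exact evaluation protocol (cropping, masking, value range) is not given in the abstract, and the abstract does not say which baseline the 60x figure refers to, so the ratios are computed only from the iteration counts it quotes.

```python
import math


def psnr(mse: float, max_val: float = 1.0) -> float:
    """Standard PSNR in dB for a given mean squared error."""
    if mse <= 0:
        raise ValueError("mse must be positive")
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """MSE of the better method divided by MSE of the baseline."""
    return 10.0 ** (-gain_db / 10.0)


# A +1.5 dB PSNR gain corresponds to roughly 29% lower MSE.
ratio_15 = mse_ratio_for_gain(1.5)   # about 0.708

# A +1.3 dB PSNR gain corresponds to roughly 26% lower MSE.
ratio_13 = mse_ratio_for_gain(1.3)   # about 0.741

# Iteration ratios against the 10K-iteration refinement loop,
# using the 300K to 600K range quoted for per-subject refiners.
speedup_high = 600_000 / 10_000      # 60.0
speedup_low = 300_000 / 10_000       # 30.0
```

Under this reading, "up to 60x" matches the upper end of the 300K to 600K range, while the comparison against the 300K-iteration GeoAvatar corresponds to a 30x shorter schedule.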

Similar Articles

FFAvatar: Few-Shot, Feed-Forward, and Generalizable Avatar Reconstruction

Multi-view Consistent 3D Gaussian Head Avatars 'without' Multi-view Generation

Avatar V: Scaling Video-Reference Avatar Video Generation

Fast 4D Mesh Generation by Spatio-Temporal Attention Chains

Lift4D: Harmonizing Single-View 3D Estimation for 4D Reconstruction In-the-Wild
