FFAvatar: Few-Shot, Feed-Forward, and Generalizable Avatar Reconstruction

Hugging Face Daily Papers 05/14/26, 12:00 AM Papers

few-shot avatar-reconstruction 3d-gaussian feed-forward flame-parameters multi-view-fusion real-time

Summary

FFAvatar proposes a feed-forward framework for reconstructing high-quality, animatable 3D Gaussian head avatars from few unposed images in seconds, achieving a 5.5 PSNR improvement over state-of-the-art on the NeRSemble benchmark.
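To put the reported gain in perspective, here is a minimal sketch of the standard PSNR definition and of what a fixed PSNR gain implies for mean squared error. It assumes the 5.5 figure is in decibels (the usual unit for PSNR) and that both methods are scored with the same peak value. The function names are illustrative and do not come from the paper.

```python
import math

def psnr(mse: float, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    if mse <= 0:
        raise ValueError("mse must be positive")
    return 10.0 * math.log10(peak ** 2 / mse)

def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE shrinks when PSNR rises by gain_db (same peak)."""
    return 10.0 ** (gain_db / 10.0)

# Assumption: the reported 5.5 gain is measured in dB.
ratio = mse_ratio_for_gain(5.5)
print(f"A 5.5 dB PSNR gain corresponds to roughly {ratio:.2f}x lower MSE")
```

Under these assumptions, a 5.5 dB gain means the mean squared pixel error is about 3.5 times smaller.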

Avatar reconstruction has traditionally relied on per-subject optimization that requires hours of computation or on expensive preprocessing that limits scalability. We introduce FFAvatar, a generalizable feed-forward framework that reconstructs high-quality, animatable 3D Gaussian head avatars from few-shot unposed portrait images in seconds. FFAvatar fuses information from multiple source images into a unified canonical Gaussian representation through Multi-View Query-Former, which is animated via FLAME parameters predicted end-to-end directly from pixels, eliminating the overhead of offline FLAME extraction. We further propose a three-stage training curriculum that achieves both broad generalization and high-fidelity reconstruction: (i) scalable pretraining on extensive monocular video data with over 1M identities to learn strong generalizable priors; (ii) multi-view fine-tuning on a small but high-quality dataset of 360-degree captures to enhance geometric fidelity and extreme-view awareness; and (iii) optional personalization that adapts to specific identities for maximum fidelity within 500 optimization steps. Extensive experiments demonstrate that FFAvatar sets a new standard for identity preservation, geometric consistency, and animation fidelity. On the NeRSemble benchmark, it outperforms the state-of-the-art LAM by a substantial 5.5 PSNR gain. Furthermore, FFAvatar enables real-time deployment, reconstructing avatars in 2 seconds without personalization and 10 seconds with personalization, while supporting 49 FPS animation on a single NVIDIA A100 GPU.
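The abstract does not specify the internals of the Multi-View Query-Former, so the following is only a generic illustration of query-based fusion: a fixed set of query vectors cross-attends over the tokens from all source images, producing an output whose size does not depend on how many views are supplied. The single-head attention, the use of tokens as both keys and values, and every name here (`cross_attend`, `fuse_views`) are assumptions for illustration, not the paper's architecture.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, tokens):
    """Single-head cross-attention; tokens serve as both keys and values."""
    d = len(queries[0])
    fused = []
    for q in queries:
        # Scaled dot-product score between this query and every token.
        scores = [sum(qi * ti for qi, ti in zip(q, t)) / math.sqrt(d)
                  for t in tokens]
        weights = softmax(scores)
        # Weighted average of the token vectors.
        fused.append([sum(w * t[k] for w, t in zip(weights, tokens))
                      for k in range(d)])
    return fused

def fuse_views(queries, per_view_tokens):
    """Pool tokens from every view, then let the queries attend over them."""
    all_tokens = [t for view in per_view_tokens for t in view]
    return cross_attend(queries, all_tokens)

# Hypothetical toy input: 2 queries, 3 views with 2 tokens each, dimension 2.
queries = [[1.0, 0.0], [0.0, 1.0]]
views = [[[0.2, 0.1], [0.4, 0.3]],
         [[0.9, 0.5], [0.1, 0.8]],
         [[0.3, 0.3], [0.6, 0.2]]]
canonical = fuse_views(queries, views)
print(len(canonical), len(canonical[0]))
```

The output always has one vector per query, which is one way a variable number of input images can map to a single fixed-size canonical representation.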

Original Article

Cached at: 05/18/26, 02:23 AM

Source: https://huggingface.co/papers/2605.15320

Abstract

FFAvatar enables fast, high-quality 3D head avatar reconstruction from few unposed images using a feed-forward approach with multi-view fusion and end-to-end FLAME parameter prediction.

Avatar reconstruction has traditionally relied on per-subject optimization that requires hours of computation or on expensive preprocessing that limits scalability. We introduce FFAvatar, a generalizable feed-forward framework that reconstructs high-quality, animatable 3D Gaussian head avatars from few-shot unposed portrait images in seconds. FFAvatar fuses information from multiple source images into a unified canonical Gaussian representation through Multi-View Query-Former, which is animated via FLAME parameters predicted end-to-end directly from pixels, eliminating the overhead of offline FLAME extraction. We further propose a three-stage training curriculum that achieves both broad generalization and high-fidelity reconstruction: (i) scalable pretraining on extensive monocular video data with over 1M identities to learn strong generalizable priors; (ii) multi-view fine-tuning on a small but high-quality dataset of 360-degree captures to enhance geometric fidelity and extreme-view awareness; and (iii) optional personalization that adapts to specific identities for maximum fidelity within 500 optimization steps. Extensive experiments demonstrate that FFAvatar sets a new standard for identity preservation, geometric consistency, and animation fidelity. On the NeRSemble benchmark, it outperforms the state-of-the-art LAM by a substantial 5.5 PSNR gain. Furthermore, FFAvatar enables real-time deployment, reconstructing avatars in 2 seconds without personalization and 10 seconds with personalization, while supporting 49 FPS animation on a single NVIDIA A100 GPU.
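The timing figures in the abstract can be turned into rough per-frame and per-step budgets. The sketch below is back-of-the-envelope arithmetic only: the per-step estimate assumes that all 500 personalization steps are run and that the entire difference between the two reconstruction times is spent on optimization, neither of which the abstract states. The function names are illustrative.

```python
def frame_budget_ms(fps: float) -> float:
    """Average time available per rendered frame, in milliseconds."""
    return 1000.0 / fps

def avg_step_ms(with_s: float, without_s: float, steps: int) -> float:
    """Average time per optimization step if the whole overhead is optimization."""
    return (with_s - without_s) * 1000.0 / steps

# Figures quoted in the abstract: 49 FPS, 2 s and 10 s reconstruction, 500 steps.
budget = frame_budget_ms(49)
step = avg_step_ms(10.0, 2.0, 500)
print(f"frame budget: {budget:.1f} ms, personalization step: about {step:.0f} ms")
```

Under these assumptions, each animated frame takes about 20.4 ms on average, and each personalization step averages about 16 ms.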

Similar Articles

Lite3R: A Model-Agnostic Framework for Efficient Feed-Forward 3D Reconstruction

Multi-view Consistent 3D Gaussian Head Avatars 'without' Multi-view Generation

FaithfulFaces: Pose-Faithful Facial Identity Preservation for Text-to-Video Generation

tencentarc/gfpgan

FRAPPE: Full Input, Residual Output Autoencoding with Projection Pursuit Encoder
