MoCam: Unified Novel View Synthesis via Structured Denoising Dynamics
Summary
MoCam is a diffusion-based framework for unified novel view synthesis that dynamically coordinates geometric and appearance priors to improve robustness against geometric errors.
Source: https://huggingface.co/papers/2605.12119
Abstract
MoCam addresses the challenge of generative novel view synthesis by dynamically coordinating geometric and appearance priors through structured denoising dynamics within a diffusion framework.
Generative novel view synthesis faces a fundamental dilemma: geometric priors provide spatial alignment but become sparse and inaccurate under view changes, while appearance priors offer visual fidelity but lack geometric correspondence. Existing methods either propagate geometric errors throughout generation or suffer from signal conflicts when fusing both statically. We introduce MoCam, which employs structured denoising dynamics to orchestrate a coordinated progression from geometry to appearance within the diffusion process. MoCam first leverages geometric priors in early stages to anchor coarse structures and tolerate their incompleteness, then switches to appearance priors in later stages to actively correct geometric errors and refine details. This design naturally unifies static and dynamic view synthesis by temporally decoupling geometric alignment and appearance refinement within the diffusion process. Experiments demonstrate that MoCam significantly outperforms prior methods, particularly when point clouds contain severe holes or distortions, achieving robust geometry-appearance disentanglement.
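The abstract describes a timestep-scheduled handoff inside the reverse diffusion loop: geometric conditioning early to anchor coarse structure, appearance conditioning later to refine details. The sketch below illustrates that idea only; the function names, the `switch_ratio` hard switch, and the DDIM-style update are assumptions for illustration, not MoCam's released code.

```python
import torch

def sample_with_scheduled_priors(denoiser, x_t, alphas_cumprod,
                                 geom_cond, app_cond, switch_ratio=0.5):
    """Reverse diffusion that conditions on a geometric prior in early (noisy)
    steps and an appearance prior in later steps, mirroring the
    geometry-to-appearance progression described in the abstract."""
    T = len(alphas_cumprod)
    for i, t in enumerate(reversed(range(T))):
        progress = i / max(T - 1, 1)  # 0 at the noisiest step, 1 at the final step
        # Hypothetical hard switch between priors; the paper may blend them more smoothly.
        cond = geom_cond if progress < switch_ratio else app_cond
        with torch.no_grad():
            eps = denoiser(x_t, torch.tensor([t]), cond)  # predicted noise under the chosen prior
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        x0_hat = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()      # estimate of the clean sample
        x_t = a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps  # deterministic DDIM-style update
    return x_t
```

Under this reading, geometric errors introduced by the early conditioning can still be corrected, because the later appearance-conditioned steps re-estimate the clean sample at every iteration.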
Get this paper in your agent:
hf papers read 2605.12119
Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Similar Articles
UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
UniVidX introduces a unified multimodal framework for versatile video generation built on diffusion priors, with mechanisms for cross-modal coherence.
AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model
AnyRecon proposes a scalable framework for 3D reconstruction from arbitrary sparse inputs using a video diffusion model with persistent scene memory and geometry-aware conditioning.
sensenova/SenseNova-U1-8B-MoT
SenseNova U1 is a new series of native multimodal models that unify understanding and generation within a single architecture using the NEO-Unify framework, eliminating the need for separate visual encoders or VAEs.
MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons
MoCapAnything V2 introduces a fully end-to-end framework for arbitrary-skeleton motion capture from monocular video, jointly optimizing video-to-pose and pose-to-rotation predictions to resolve rotation ambiguity.
ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis
ReImagine introduces an image-first approach to controllable high-quality human video generation, combining SMPL-X motion guidance with video diffusion models to decouple appearance from temporal consistency.