MoCam: Unified Novel View Synthesis via Structured Denoising Dynamics

Hugging Face Daily Papers

Summary

MoCam is a research paper introducing a diffusion-based framework for unified novel view synthesis that dynamically coordinates geometric and appearance priors to improve robustness against geometric errors.

Generative novel view synthesis faces a fundamental dilemma: geometric priors provide spatial alignment but become sparse and inaccurate under view changes, while appearance priors offer visual fidelity but lack geometric correspondence. Existing methods either propagate geometric errors throughout generation or suffer from signal conflicts when fusing both statically. We introduce MoCam, which employs structured denoising dynamics to orchestrate a coordinated progression from geometry to appearance within the diffusion process. MoCam first leverages geometric priors in early stages to anchor coarse structures and tolerate their incompleteness, then switches to appearance priors in later stages to actively correct geometric errors and refine details. This design naturally unifies static and dynamic view synthesis by temporally decoupling geometric alignment and appearance refinement within the diffusion process. Experiments demonstrate that MoCam significantly outperforms prior methods, particularly when point clouds contain severe holes or distortions, achieving robust geometry-appearance disentanglement.
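The staged geometry-to-appearance progression can be sketched as a timestep-dependent blend of the two conditioning signals inside a diffusion sampling loop. Everything below is an illustrative assumption: the function names, the hard switch at the halfway point, and the toy relaxation "denoising" update are stand-ins for the paper's actual schedule and network, which the abstract does not specify.

```python
import numpy as np

def prior_weights(t, T, switch_frac=0.5):
    """Blend weights for geometric vs. appearance conditioning at
    denoising step t (t=T is the noisiest step, t=0 the cleanest).
    Early steps (high t) rely on the geometric prior to anchor coarse
    structure; late steps (low t) switch to the appearance prior to
    correct geometric errors and refine detail. The hard switch at
    switch_frac is an illustrative choice, not the paper's schedule."""
    frac = t / T  # 1.0 at the start of denoising, 0.0 at the end
    w_geo = 1.0 if frac >= switch_frac else 0.0
    return w_geo, 1.0 - w_geo

def denoise(x_T, geo_cond, app_cond, T=50):
    """Toy sampling loop: each step is conditioned on a weighted mix of
    the two priors. The update is a simple relaxation toward the active
    prior, a placeholder for a real denoising network."""
    x = x_T
    for t in range(T, 0, -1):
        w_geo, w_app = prior_weights(t, T)
        target = w_geo * geo_cond + w_app * app_cond
        x = x + 0.1 * (target - x)  # placeholder denoising update
    return x

# With hypothetical conditioning signals, the early steps pull the
# sample toward the geometric prior and the late steps toward the
# appearance prior, so the final sample ends up near app_cond.
x = denoise(np.zeros(4), geo_cond=np.ones(4), app_cond=2 * np.ones(4))
```

The point of the schedule is that late-stage appearance conditioning gets the last word, which is how the method can override errors inherited from sparse or distorted geometry.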

Cached at: 05/13/26, 04:12 AM


Source: https://huggingface.co/papers/2605.12119

Abstract

MoCam addresses the challenge of generative novel view synthesis by dynamically coordinating geometric and appearance priors through structured denoising dynamics within a diffusion framework.



Get this paper in your agent:

hf papers read 2605.12119

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

