MoCam: Unified Novel View Synthesis via Structured Denoising Dynamics
Summary
MoCam is a diffusion-based framework for unified novel view synthesis that dynamically coordinates geometric and appearance priors to improve robustness against geometric errors.
Source: https://huggingface.co/papers/2605.12119
Abstract
MoCam addresses the challenge of generative novel view synthesis by dynamically coordinating geometric and appearance priors through structured denoising dynamics within a diffusion framework.
Generative novel view synthesis faces a fundamental dilemma: geometric priors provide spatial alignment but become sparse and inaccurate under view changes, while appearance priors offer visual fidelity but lack geometric correspondence. Existing methods either propagate geometric errors throughout generation or suffer from signal conflicts when fusing both statically. We introduce MoCam, which employs structured denoising dynamics to orchestrate a coordinated progression from geometry to appearance within the diffusion process. MoCam first leverages geometric priors in early stages to anchor coarse structures and tolerate their incompleteness, then switches to appearance priors in later stages to actively correct geometric errors and refine details. This design naturally unifies static and dynamic view synthesis by temporally decoupling geometric alignment and appearance refinement within the diffusion process. Experiments demonstrate that MoCam significantly outperforms prior methods, particularly when point clouds contain severe holes or distortions, achieving robust geometry-appearance disentanglement.
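The abstract describes a timestep-scheduled handoff inside the reverse diffusion loop: geometric conditioning early to anchor coarse structure, appearance conditioning later to refine details. The sketch below illustrates that idea only; the function names, the `switch_ratio` hard switch, and the DDIM-style update are assumptions for illustration, not MoCam's released code.

```python
import torch

def sample_with_scheduled_priors(denoiser, x_t, alphas_cumprod,
                                 geom_cond, app_cond, switch_ratio=0.5):
    """Reverse diffusion that conditions on a geometric prior in early (noisy)
    steps and an appearance prior in later steps, mirroring the
    geometry-to-appearance progression described in the abstract."""
    T = len(alphas_cumprod)
    for i, t in enumerate(reversed(range(T))):
        progress = i / max(T - 1, 1)  # 0 at the noisiest step, 1 at the final step
        # Hypothetical hard switch between priors; the paper may blend them more smoothly.
        cond = geom_cond if progress < switch_ratio else app_cond
        with torch.no_grad():
            eps = denoiser(x_t, torch.tensor([t]), cond)  # predicted noise under the chosen prior
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        x0_hat = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()      # estimate of the clean sample
        x_t = a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps  # deterministic DDIM-style update
    return x_t
```

Under this reading, geometric errors introduced by the early conditioning can still be corrected, because the later appearance-conditioned steps re-estimate the clean sample at every iteration.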
Get this paper in your agent:
hf papers read 2605.12119
Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Similar Articles
UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
UniVidX introduces a unified multimodal framework for versatile video generation built on diffusion priors, with mechanisms for cross-modal coherence.
AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model
AnyRecon proposes a scalable framework for 3D reconstruction from arbitrary sparse inputs using a video diffusion model with persistent scene memory and geometry-aware conditioning.
sensenova/SenseNova-U1-8B-MoT
SenseNova U1 is a new series of native multimodal models that unify understanding and generation within a single architecture using the NEO-Unify framework, eliminating the need for separate visual encoders or VAEs.
MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons
MoCapAnything V2 introduces a fully end-to-end framework for arbitrary-skeleton motion capture from monocular video, jointly optimizing video-to-pose and pose-to-rotation predictions to resolve rotation ambiguity.
ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis
ReImagine introduces an image-first approach to controllable high-quality human video generation, combining SMPL-X motion guidance with video diffusion models to decouple appearance from temporal consistency.