JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising
Summary
JanusMesh is a fast, training-free framework that generates text-driven 3D visual illusions—a single mesh revealing different semantics from different viewing angles—by decoupling generation into cross-space dual-branch denoising and view-conditioned texture synthesis, achieving high realism in just 3-5 minutes.
Cached at: 06/20/26, 02:28 PM
Source: https://huggingface.co/papers/2606.20563
Abstract
A fast, training-free framework generates text-driven 3D visual illusions by decoupling generation into cross-space dual-branch denoising and view-conditioned texture synthesis for seamless geometric fusion and semantic coherence.
Creating 3D visual illusions, a single 3D mesh that reveals entirely different semantics from various viewing angles, is a fascinating but tough challenge. Existing optimization-based methods are slow and can produce oversaturated colors. In contrast, naive stitching approaches fail to produce geometrically coherent objects, resulting in visible unnatural seams and semantic leaks. In this paper, we present a fast and training-free framework for generating text-driven 3D visual illusions. Our approach decouples the generation into two stages. First, we propose a cross-space dual-branch denoising process. This process dynamically decodes 3D latents into voxel space for CLIP-guided orientation alignment and Signed Distance Field (SDF) blending, which ensures seamless geometric fusion. Second, we introduce a view-conditioned texture synthesis module that projects and aggregates view-specific 2D diffusion priors onto the fused geometry. Extensive experiments demonstrate that our method generates highly realistic, dual-semantic 3D illusions in just 3-5 minutes. It significantly outperforms existing methods in geometric integrity, semantic recognizability, and efficiency. Project page: https://siang1105.github.io/JanusMesh.github.io/
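To make the SDF blending step more concrete, here is a minimal sketch of fusing two signed distance fields on a voxel grid. The abstract does not state the blending rule, so this assumes a polynomial smooth union, a common SDF blending operator, and it is not necessarily what JanusMesh uses. The two analytic spheres stand in for the per-branch geometry decoded from 3D latents, and every function name and parameter here (`sphere_sdf`, `smooth_union`, `voxel_centers`, `blend_on_grid`, `k`, `n`) is hypothetical.

```python
import math

def sphere_sdf(center, radius):
    """Return a signed-distance function for a sphere (negative inside)."""
    def sdf(p):
        return math.dist(p, center) - radius
    return sdf

def smooth_union(d1, d2, k=0.1):
    """Polynomial smooth minimum of two signed distances.

    Assumed blending rule, not taken from the paper. Where the two
    distances differ by at least k it equals min(d1, d2); closer than
    that, it rounds off the crease where the two surfaces meet.
    """
    h = max(k - abs(d1 - d2), 0.0) / k
    return min(d1, d2) - h * h * k * 0.25

def voxel_centers(n):
    """Centers of an n x n x n voxel grid covering the cube [-1, 1]^3."""
    step = 2.0 / n
    axis = [-1.0 + (i + 0.5) * step for i in range(n)]
    return [(x, y, z) for x in axis for y in axis for z in axis]

def blend_on_grid(sdf_a, sdf_b, n=16, k=0.1):
    """Sample both fields at every voxel center and fuse them."""
    return [smooth_union(sdf_a(p), sdf_b(p), k) for p in voxel_centers(n)]

# Two overlapping stand-in shapes, one per denoising branch (illustrative only).
shape_a = sphere_sdf((-0.3, 0.0, 0.0), 0.5)
shape_b = sphere_sdf((0.3, 0.0, 0.0), 0.5)

fused = blend_on_grid(shape_a, shape_b)
occupied = sum(1 for d in fused if d < 0.0)
print(f"{occupied} of {len(fused)} voxels lie inside the fused shape")
```

A smooth union never returns a value above the hard minimum, so the fused shape contains both inputs and only adds material near their intersection. That is one simple way to avoid the visible seams attributed to naive stitching.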
Similar Articles
UniMesh: Unifying 3D Mesh Understanding and Generation
UniMesh introduces a single model that jointly handles 3D mesh generation and understanding via a Mesh Head, Chain-of-Mesh iterative editing, and a self-reflection error-correction mechanism.
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
HY-World 2.0 is a multi-modal world model framework that generates high-fidelity 3D Gaussian Splatting scenes from text, images, and videos through specialized modules for panorama generation, trajectory planning, and scene composition, achieving state-of-the-art performance among open-source approaches.
Fast 4D Mesh Generation by Spatio-Temporal Attention Chains
A training-free 4D mesh generation approach using Spatio-Temporal Attention Chains accelerates creation to 9 seconds (13x speedup) while improving temporal consistency and scaling to longer sequences, with zero-shot capabilities for tracking and camera estimation.
MeshWeaver: Sparse-Voxel-Guided Surface Weaving for Autoregressive Mesh Generation
MeshWeaver presents an autoregressive mesh generation framework that directly predicts vertices using a multi-level sparse-voxel encoder, achieving state-of-the-art compression and geometric fidelity for high-poly meshes.
Helix4D: Complex 4D Mesh Generation
Helix4D introduces a framework for high-quality dynamic 4D mesh generation from video by extending Trellis2 with cross-frame attention and a 4D temporal encoding that repurposes redundant spatial RoPE bands without adding parameters.