JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising

Hugging Face Daily Papers

Summary

JanusMesh is a fast, training-free framework that generates text-driven 3D visual illusions—a single mesh revealing different semantics from different viewing angles—by decoupling generation into cross-space dual-branch denoising and view-conditioned texture synthesis, achieving high realism in just 3-5 minutes.

Creating 3D visual illusions, a single 3D mesh that reveals entirely different semantics from various viewing angles, is a fascinating but tough challenge. Existing optimization-based methods are slow and can produce oversaturated colors. In contrast, naive stitching approaches fail to produce geometrically coherent objects. This results in visible unnatural seams and semantic leaks. In this paper, we present a fast and training-free framework for generating text-driven 3D visual illusions. Our approach decouples the generation into two stages. First, we propose a cross-space dual-branch denoising process. This process dynamically decodes 3D latents into voxel space for CLIP-guided orientation alignment and Signed Distance Field (SDF) blending, which ensures seamless geometric fusion. Second, we introduce a view-conditioned texture synthesis module that projects and aggregates view-specific 2D diffusion priors onto the fused geometry. Extensive experiments demonstrate that our method generates highly realistic, dual-semantic 3D illusions in just 3-5 minutes. It significantly outperforms existing methods in geometric integrity, semantic recognizability, and efficiency. Project page: https://siang1105.github.io/JanusMesh.github.io/
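To make the SDF blending step in the abstract more concrete, here is a minimal sketch of fusing two signed distance fields on a voxel grid. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which blending operator is used, so the polynomial smooth minimum below is a common stand-in, and the names `sphere_sdf`, `smooth_union`, and `fuse_on_grid` are hypothetical. In the paper the two fields would come from decoded 3D latents during denoising; here two analytic spheres take their place.

```python
import math

def sphere_sdf(p, center, radius):
    """Signed distance from point p to a sphere (negative inside)."""
    return math.dist(p, center) - radius

def smooth_union(d1, d2, k=0.2):
    """Polynomial smooth minimum of two distances.

    k sets the width of the blend region; outside that region the
    result equals the plain minimum, so the shapes merge without a crease.
    """
    h = min(1.0, max(0.0, 0.5 + 0.5 * (d2 - d1) / k))
    return d2 * (1.0 - h) + d1 * h - k * h * (1.0 - h)

def fuse_on_grid(sdf_a, sdf_b, res=16, extent=1.0, k=0.2):
    """Sample both SDFs at voxel centers and blend them voxel by voxel."""
    step = 2.0 * extent / res
    coords = [-extent + (i + 0.5) * step for i in range(res)]
    fused = {}
    for i, x in enumerate(coords):
        for j, y in enumerate(coords):
            for l, z in enumerate(coords):
                p = (x, y, z)
                fused[(i, j, l)] = smooth_union(sdf_a(p), sdf_b(p), k)
    return fused

# Stand-ins for the two semantic branches (assumption: analytic spheres).
shape_a = lambda p: sphere_sdf(p, (-0.3, 0.0, 0.0), 0.4)
shape_b = lambda p: sphere_sdf(p, (0.3, 0.0, 0.0), 0.4)

fused = fuse_on_grid(shape_a, shape_b)
inside = sum(1 for d in fused.values() if d < 0.0)
print(f"{inside} of {len(fused)} voxels lie inside the fused shape")
```

The smooth minimum never exceeds the hard minimum, so every voxel inside either input stays inside the fused shape. The blend only adds material near the junction, which is what removes a visible seam.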

Cached at: 06/20/26, 02:28 PM


Source: https://huggingface.co/papers/2606.20563

Abstract

A fast, training-free framework generates text-driven 3D visual illusions by decoupling generation into cross-space dual-branch denoising and view-conditioned texture synthesis for seamless geometric fusion and semantic coherence.

Creating 3D visual illusions, a single 3D mesh that reveals entirely different semantics from various viewing angles, is a fascinating but tough challenge. Existing optimization-based methods are slow and can produce oversaturated colors. In contrast, naive stitching approaches fail to produce geometrically coherent objects. This results in visible unnatural seams and semantic leaks. In this paper, we present a fast and training-free framework for generating text-driven 3D visual illusions. Our approach decouples the generation into two stages. First, we propose a cross-space dual-branch denoising process. This process dynamically decodes 3D latents into voxel space for CLIP-guided orientation alignment and Signed Distance Field (SDF) blending, which ensures seamless geometric fusion. Second, we introduce a view-conditioned texture synthesis module that projects and aggregates view-specific 2D diffusion priors onto the fused geometry. Extensive experiments demonstrate that our method generates highly realistic, dual-semantic 3D illusions in just 3-5 minutes. It significantly outperforms existing methods in geometric integrity, semantic recognizability, and efficiency. Project page: https://siang1105.github.io/JanusMesh.github.io/
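The second stage, which projects and aggregates view-specific colors onto the fused geometry, can be illustrated with a small sketch. This is an assumption-laden example rather than the paper's method: the abstract does not give the aggregation rule, so the code uses a common cosine-weighted blend in which each view contributes in proportion to how directly the surface faces its camera. The function name `aggregate_color` and the `sharpness` parameter are hypothetical.

```python
def aggregate_color(normal, views, sharpness=2.0):
    """Blend per-view colors for one surface point.

    normal: unit surface normal (x, y, z).
    views: list of (to_camera, rgb) pairs, where to_camera is the unit
           vector from the surface point toward that view's camera.
    sharpness: exponent on the cosine weight (assumed value); higher
               values favor the most frontal view more strongly.
    Returns the weighted average color, or None if no view sees the point.
    """
    total = 0.0
    acc = [0.0, 0.0, 0.0]
    for to_camera, rgb in views:
        cos = sum(n * c for n, c in zip(normal, to_camera))
        weight = max(0.0, cos) ** sharpness  # back-facing views get zero
        total += weight
        for ch in range(3):
            acc[ch] += weight * rgb[ch]
    if total == 0.0:
        return None
    return tuple(a / total for a in acc)

# Two hypothetical views: one along +z painting red, one along +x painting blue.
views = [((0.0, 0.0, 1.0), (1.0, 0.0, 0.0)),
         ((1.0, 0.0, 0.0), (0.0, 0.0, 1.0))]

front = aggregate_color((0.0, 0.0, 1.0), views)   # faces the first camera only
s = 0.5 ** 0.5
edge = aggregate_color((s, 0.0, s), views)        # halfway between both views
print(front, edge)
```

A point facing one camera takes that view's color outright, while a point on the boundary between views receives an even mix. That gradual handover is one simple way to limit the semantic leaks the abstract mentions.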


Get this paper in your agent:

hf papers read 2606.20563

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash


Similar Articles

UniMesh: Unifying 3D Mesh Understanding and Generation

Hugging Face Daily Papers

UniMesh introduces a single model that jointly handles 3D mesh generation and understanding via a Mesh Head, Chain-of-Mesh iterative editing, and a self-reflection error-correction mechanism.

Fast 4D Mesh Generation by Spatio-Temporal Attention Chains

Hugging Face Daily Papers

A training-free 4D mesh generation approach using Spatio-Temporal Attention Chains accelerates creation to 9 seconds (13x speedup) while improving temporal consistency and scaling to longer sequences, with zero-shot capabilities for tracking and camera estimation.

Helix4D: Complex 4D Mesh Generation

Hugging Face Daily Papers

Helix4D introduces a framework for high-quality dynamic 4D mesh generation from video by extending Trellis2 with cross-frame attention and a 4D temporal encoding that repurposes redundant spatial RoPE bands without adding parameters.