SENSE: Satellite-based ENergy Synthesis for Sustainable Environment
Summary
SENSE is a generative urban building energy modeling framework that synthesizes satellite imagery and energy data using diffusion models, achieving high-fidelity results with reduced labeled data requirements.
Source: https://huggingface.co/papers/2605.18101
Abstract
Urban Building Energy Modeling (UBEM) plays a critical role in achieving the United Nations’ Sustainable Development Goals 7 and 11. Although existing studies based on satellite imagery and deep learning have achieved remarkable progress, three challenges remain: first, most existing studies are inherently predictive and fail to reflect the generative nature of urban planning; second, although generative AI and diffusion models have seen explosive growth in satellite imagery, they do not generate urban functional layers (e.g., an energy layer); third, high-quality, high-resolution building energy data aligned with satellite imagery is scarce. Here we propose SENSE (Satellite-based ENergy Synthesis for Sustainable Environment), a unified generative UBEM framework that jointly synthesizes realistic urban satellite imagery and aligned high-quality building energy consumption and height maps. By conditioning on road networks and urban density metrics, SENSE, based on a controllable diffusion model, leverages the knowledge learned by large vision models to generate urban building energy consumption and height information (annotations) in the latent space. Experiments across four cities (New York City, Boston, Lyon, Busan) demonstrate that SENSE achieves high visual fidelity and strong physical consistency, satisfying the ASHRAE standard metric. They also show that SENSE can generate sufficient annotated synthetic data using less than 20% of the labeled energy data, boosting downstream prediction performance by 10% IoU. Compared with state-of-the-art (SOTA) urban energy prediction methods, SENSE significantly reduces prediction error (NMBE by 3%-11% and CVRMSE by 1%-9%). This study offers an energy-efficient urban planning and physical generation solution for urban science, energy science and building science. The dataset and code: https://huggingface.co/datasets/skl24/MUSE and https://github.com/kailaisun/GenAI4Urban-Energy/.
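The abstract reports errors as NMBE (normalized mean bias error) and CVRMSE (coefficient of variation of the root mean square error), the two calibration metrics associated with ASHRAE Guideline 14. The sketch below shows one common way to compute them. It is an illustration, not the authors' evaluation code: the function names, the sample values, the sign convention (measured minus predicted), and the degrees-of-freedom adjustment `p=1` are all assumptions, and the paper may use a different variant.

```python
import math


def nmbe(measured, predicted, p=1):
    """Normalized mean bias error in percent (assumed form: measured - predicted)."""
    n = len(measured)
    mean_measured = sum(measured) / n
    bias = sum(m - s for m, s in zip(measured, predicted))
    return 100.0 * bias / ((n - p) * mean_measured)


def cvrmse(measured, predicted, p=1):
    """Coefficient of variation of the RMSE in percent."""
    n = len(measured)
    mean_measured = sum(measured) / n
    sse = sum((m - s) ** 2 for m, s in zip(measured, predicted))
    return 100.0 * math.sqrt(sse / (n - p)) / mean_measured


# Hypothetical per-building energy values, made up for illustration only.
measured = [100.0, 120.0, 80.0, 100.0]
predicted = [90.0, 130.0, 80.0, 100.0]

print(f"NMBE   = {nmbe(measured, predicted):.2f}%")
print(f"CVRMSE = {cvrmse(measured, predicted):.2f}%")
```

In this toy case the over- and under-predictions cancel, so NMBE is zero while CVRMSE is not. That is why the two metrics are usually reported together: NMBE captures systematic bias, and CVRMSE captures the overall scatter of the errors.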
Models citing this paper: 1
#### skl24/SENSE Updated about 5 hours ago
Datasets citing this paper: 1
#### skl24/MUSE Updated about 5 hours ago • 11
Similar Articles
Ensemble Score Filtering for Real-Data Energy Consumption Forecast Correction
This paper proposes using the Ensemble Score Filter (EnSF), a score-based diffusion data assimilation method, to correct forecasts from a pretrained spatio-temporal energy consumption model using noisy partial observations. Numerical experiments show EnSF significantly improves state estimation over open-loop propagation and outperforms the Ensemble Kalman Filter under nonlinear observations.
FusionSense: Tri-Stage Near-Sensor Learning for Runtime-Adaptive Multimodal Edge Intelligence
FusionSense introduces a tri-stage near-sensor learning framework for multimodal edge intelligence that jointly reduces compute and communication by using fusion-aware filtering, achieving up to 33× energy savings and significant data-reduction gains on RGB-Depth/LiDAR tasks.
Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image
Sat3DGen introduces a geometry-first approach for generating street-level 3D scenes from a single satellite image, achieving improved geometric accuracy and photorealism through novel constraints and training strategies. The method demonstrates significant improvements over prior work on the VIGOR-OOD benchmark.
Efficient Image Synthesis with Sphere Latent Encoder
This paper proposes Sphere Latent Encoder, an efficient few-step image generation framework that performs denoising entirely in a spherical latent space, achieving high-quality 256×256 images with significantly reduced computational cost and improved FID scores on ImageNet-1K.
SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers
SEGA is a training-free method that improves high-resolution text-to-image generation by adaptively scaling attention across RoPE components based on spatial-frequency structure during denoising steps.