The Road Ahead in Autonomous Driving: The KITScenes Multimodal Dataset

Hugging Face Daily Papers

Summary

KITScenes Multimodal is a high-fidelity European autonomous driving dataset with synchronized sensors, complete 3D HD maps, and four benchmarks for spatial learning and embodied AI research.

Existing autonomous driving datasets have enabled major progress, but fall short in sensor fidelity, map completeness, or geographic diversity. We present KITScenes Multimodal, a European dataset built around high-fidelity sensors and maps. Our fully synchronized sensor suite combines high-resolution global-shutter cameras, long-range lidar beyond 400m, 4D imaging radar, and redundant GNSS/INS localization. Our HD maps are, to our knowledge, the most complete of any sensor dataset, validated through autonomous driving trials on open-source software. For the first time in a public dataset, all driving-relevant traffic elements, such as traffic lights, are mapped in 3D to a reprojection-accurate level with full topological connectivity. Recorded in cities with irregular street layouts and mixed traffic modes, our dataset complements existing datasets by broadening the available geographic diversity. We also introduce four benchmarks, each advancing spatial learning for embodied AI: online HD map construction, long-range depth estimation, novel view synthesis, and end-to-end driving. Project page: https://kitscenes.com/

Cached at: 06/05/26, 10:10 PM

Source: https://huggingface.co/papers/2606.02956

Abstract

KITScenes Multimodal dataset provides high-fidelity European driving data with comprehensive 3D maps and diverse urban environments for embodied AI research.

Existing autonomous driving datasets have enabled major progress, but fall short in sensor fidelity, map completeness, or geographic diversity. We present KITScenes Multimodal, a European dataset built around high-fidelity sensors and maps. Our fully synchronized sensor suite combines high-resolution global-shutter cameras, long-range lidar beyond 400m, 4D imaging radar, and redundant GNSS/INS localization. Our HD maps are, to our knowledge, the most complete of any sensor dataset, validated through autonomous driving trials on open-source software. For the first time in a public dataset, all driving-relevant traffic elements, such as traffic lights, are mapped in 3D to a reprojection-accurate level with full topological connectivity. Recorded in cities with irregular street layouts and mixed traffic modes, our dataset complements existing datasets by broadening the available geographic diversity. We also introduce four benchmarks, each advancing spatial learning for embodied AI: online HD map construction, long-range depth estimation, novel view synthesis, and end-to-end driving. Project page: https://kitscenes.com/
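The term "reprojection-accurate" in the abstract can be made concrete with a small sketch. The code below is illustrative only and is not taken from the paper or the dataset tooling: it assumes an ideal pinhole camera without lens distortion, hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`), a mapped 3D point already expressed in the camera frame, and a hypothetical annotated pixel position. It measures how far the projected map point lands from that annotation.

```python
import math


def project_pinhole(point_cam, fx, fy, cx, cy):
    """Project a 3D point (camera frame, z pointing forward, metres) to pixels."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def reprojection_error(point_cam, observed_px, fx, fy, cx, cy):
    """Euclidean pixel distance between the projected point and an observed pixel."""
    u, v = project_pinhole(point_cam, fx, fy, cx, cy)
    return math.hypot(u - observed_px[0], v - observed_px[1])


# Hypothetical values chosen for illustration, not dataset calibration.
traffic_light_cam = (2.0, -1.5, 40.0)  # mapped 3D position in the camera frame
observed_px = (1063.0, 469.0)          # annotated pixel position in the image
error_px = reprojection_error(
    traffic_light_cam, observed_px, fx=2000.0, fy=2000.0, cx=960.0, cy=540.0
)
print(f"reprojection error: {error_px:.1f} px")
```

With these made-up numbers the mapped point projects to (1060, 465), so the error is 5 px. A map would count as reprojection-accurate when such errors stay small across cameras and frames; the paper's actual threshold and camera model are not stated in this abstract.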

Get this paper in your agent:

hf papers read 2606.02956

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Datasets citing this paper: 1

#### KIT-MRT/KITScenes-Multimodal Updated 2 days ago • 632 • 11
