Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories
Summary
This paper introduces Data Journalist Agent (Data2Story), a multi-agent framework that automates data journalism by generating evidence-grounded, multimodal news stories while ensuring transparency and verifiability.
Cached at: 06/10/26, 05:44 AM
Source: https://huggingface.co/papers/2606.11176
Abstract
A multi-agent framework automates data journalism by generating evidence-grounded, multimodal news stories while maintaining transparency and verifiability.
Data tells stories that shape society; the data journalist’s job is to turn raw information into stories non-experts can trust. A high-quality news feature takes a newsroom team weeks: hunting for context, running statistics, choosing an angle, and designing visuals. Recent agents handle individual steps well: data-science agents close the analysis loop, while design agents synthesize beautiful websites. But can an agent serve as a data journalist end to end? We introduce Data Journalist Agent (Data2Story), a multi-agent framework that orchestrates specialized roles into a single virtual newsroom. Data2Story contributes two innovations. (i) Claims are evidence-grounded: an Inspector links every number, angle, and asset back to data, code, or an external reference. (ii) Articles are multimodally generative: rather than defaulting to plain text and static charts, Data2Story reasons about what readers will want to see, then deploys multimodal tools, such as interactive maps for geography and audio for music. We evaluate Data2Story on 18 articles, each paired with the originally published expert piece, along four axes: (a) human-agent angle coverage; (b) rubric evaluation with 53 participants across five dimensions; (c) computer-use agents as judges, a cost-saving proxy for how readers navigate interactive articles; and (d) verifiability, where a coding verifier re-executes statements against the data and checks claims against references. Data2Story produces competitive, evidence-traceable multimedia stories, with particular strength in transparency and auditability. Human articles retain an edge in editorial angle, creative design, and presentation. We position Data2Story as a collaborator for journalists, enabling more evidence-based, transparent, and verifiable reporting. Code and demos are available at https://data2story.github.io.
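The verifiability axis described in the abstract, where a verifier re-executes statements against the data, can be illustrated with a minimal sketch. This is not Data2Story's implementation: the `Claim` class, the `verify` function, the tolerance, and the toy rent dataset are all hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Claim:
    """A numeric statement paired with the code that should reproduce it (hypothetical structure)."""
    text: str                                # the sentence as it would appear in the story
    compute: Callable[[List[Row]], float]    # code that recomputes the number from the data
    stated_value: float                      # the number the story asserts
    tolerance: float = 0.01                  # allowed absolute difference (an assumption)


def verify(claim: Claim, rows: List[Row]) -> bool:
    """Re-execute the claim's computation and compare it with the stated value."""
    recomputed = claim.compute(rows)
    return abs(recomputed - claim.stated_value) <= claim.tolerance


# Toy dataset, invented for this example.
rows: List[Row] = [
    {"city": "A", "rent": 1000.0},
    {"city": "B", "rent": 1500.0},
    {"city": "C", "rent": 2000.0},
]

claims = [
    Claim("Average rent is 1500",
          lambda r: sum(x["rent"] for x in r) / len(r), 1500.0),
    Claim("Highest rent is 2500",
          lambda r: max(x["rent"] for x in r), 2500.0),
]

# Map each claim's text to whether the data supports it.
report = {c.text: verify(c, rows) for c in claims}
for text, ok in report.items():
    print(f"{'SUPPORTED' if ok else 'UNSUPPORTED'}: {text}")
```

Pairing each claim with its own computation keeps the check auditable: a reader or reviewer can rerun the same code against the same data and see why a statement passed or failed.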
Similar Articles
Towards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation
This paper presents Ptah, a multi-agent harness for generating verifiable multimodal deep research reports by interleaving textual and visual evidence through specialized agents and verification mechanisms. It introduces PtahEval for evaluation.
Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents
This paper introduces On-Policy Data Evolution (ODE) and a visual-native agent harness to improve multimodal deep search agents. By enabling reusable visual evidence and closed-loop data generation, ODE significantly boosts the performance of Qwen3-VL agents across multiple benchmarks, surpassing Gemini 2.5 Pro.
DataArc-SynData-Toolkit: A Unified Closed-Loop Framework for Multi-Path, Multimodal, and Multilingual Data Synthesis
The article introduces DataArc-SynData-Toolkit, an open-source framework designed to simplify multi-path, multimodal, and multilingual synthetic data generation. It aims to lower technical barriers and improve usability for training large language models through a unified, configuration-driven pipeline.
PresentAgent-2: Towards Generalist Multimodal Presentation Agents
PresentAgent-2 is an agentic framework that generates presentation videos from user queries by conducting research, creating multimodal slides, and producing interactive content across single, discussion, and interaction modes.
Traxia: A Framework for Verifiable, Agent-Native Scientific Publishing
Traxia introduces a framework for verifiable, agent-native scientific publishing where autonomous AI agents publish, peer-review, and collaborate with humans, addressing reproducibility and provenance issues.