
# FACTS: Table Summarization via Offline Template Generation with Agentic Workflows

Source: https://arxiv.org/html/2510.13920

Ye Yuan
McGill University
Mila - Quebec AI Institute

Mohammad Amin Shabani
RBC Borealis

Siqi Liu
RBC Borealis

###### Abstract

Query-focused table summarization requires generating natural language summaries of tabular data conditioned on a user query, enabling users to access insights beyond fact retrieval. Existing approaches face key limitations: table-to-text models require costly fine-tuning and struggle with complex reasoning, prompt-based LLM methods suffer from token-limit and efficiency issues while exposing sensitive data, and prior agentic pipelines often rely on decomposition, planning, or manual templates that lack robustness and scalability. To mitigate these issues, we introduce an agentic workflow, **FACTS**, a **F**ast, **A**ccurate, and Privacy-**C**ompliant **T**able **S**ummarization approach via Offline Template Generation. FACTS produces *offline templates*, consisting of SQL queries and Jinja2 templates, which can be rendered into natural language summaries and are reusable across multiple tables sharing the same schema. It enables fast summarization through reusable offline templates, accurate outputs with executable SQL queries, and privacy compliance by sending only table schemas to LLMs. Evaluations on widely-used benchmarks show that FACTS consistently outperforms baselines, establishing it as a practical solution for real-world query-focused table summarization. Our code is available at https://github.com/BorealisAI/FACTS.

---

## 1 Introduction

**Figure 1:** Comparison between DirectSumm (Zhang et al., 2024) (left) and our proposed FACTS framework (right). DirectSumm prompts a large language model (LLM) with the full table and query, which may produce hallucinated values, exposes all table records to external services, and requires regeneration for each new table even under the same schema and query. In contrast, FACTS generates a reusable offline template consisting of schema-aware SQL queries and a Jinja2 template. The SQL queries retrieve precise values through execution, while the Jinja2 template renders natural language summaries, ensuring accuracy, reusability, scalability, and privacy compliance.

Query-focused table summarization requires generating natural language summaries of tabular data conditioned on a user query, enabling users to access insights that go beyond fact retrieval (Zhao et al., 2023). Unlike generic table summarization (Lebret et al., 2016; Moosavi et al., 2021), which aims to capture all salient table content, query-focused summarization adapts to diverse user intents. Compared with table question answering (Pasupat and Liang, 2015; Nan et al., 2022), which typically returns short factoid answers, query-focused summarization demands richer reasoning and explanatory narratives. This distinction is especially critical in real-world domains such as finance, healthcare, and law, where professionals rely on customized summaries for decision-making. For instance, in a financial institution, analysts may request gross income summaries, one for each of the past ten years, by providing a user query as in Figure 1 (top left).

We argue that a practical solution must handle large datasets efficiently, support reusability, ensure correctness of outputs, and protect sensitive information. These four properties are essential for query-focused table summarization methods in practice. First, the method must be reusable: artifacts built once for a query should apply across tables sharing the same schema. Second, it must be scalable, handling very large tables without passing all rows to language models. Third, it must be accurate, grounding summaries in executable operations rather than free-form text generation. Fourth, it must be privacy-compliant, since regulations such as HIPAA and GDPR prohibit exposing individual-level records to external LLM services; in many cases, only the user query or the table schema may be exposed. Yet existing approaches fall short.

Table-to-text models (Liu et al., 2022b; Zhao et al., 2022; Jiang et al., 2022) require costly fine-tuning and still struggle with numerical reasoning and logical fidelity. Prompt-based methods (Zhao et al., 2023; Zhang et al., 2024) directly query powerful LLMs but suffer from token-limit and efficiency issues while exposing sensitive data from the tables. Prevalent agentic frameworks (Cheng et al., 2023; Ye et al., 2023; Zhao et al., 2024; Zhang et al., 2025) mitigate some challenges by grounding outputs in SQL or Python execution, but most rely on decomposition, natural language planning, or manual template design, which lack robustness and scalability. Returning to our previous example, an approach such as DirectSumm would require ten separate LLM generations for ten yearly tables, with all values revealed to the model, leading to inefficiency and privacy risks, as illustrated in Figure 1 (left).

To address these challenges, we introduce **FACTS**, a **F**ast, **A**ccurate, and Privacy-**C**ompliant **T**able **S**ummarization approach via Offline Template Generation. FACTS employs an agentic workflow with three stages. First, it generates schema-aware guided questions and filtering rules to clarify user query intent. Second, it synthesizes SQL queries to extract relevant information from tables. Third, it produces a Jinja2 template to render SQL outputs into natural language. Crucially, FACTS integrates an LLM Council, an ensemble of LLMs iteratively validating and refining outputs at each stage. This feedback loop ensures correctness, consistency, and usability of the generated artifacts. The final product, an offline template composed of SQL queries and a Jinja2 template, can be reused across any tables with the same schema for a given query. Returning to our example, an offline template produced by FACTS can summarize gross income across ten yearly tables, avoiding repeated LLM calls while ensuring accurate and privacy-compliant outputs (Figure 1 (right)).
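The three-stage loop with council validation can be sketched as follows. This is a minimal illustration, not the paper's implementation: `llm_generate` and `council_approves` are hypothetical stand-ins for the LLM and LLM Council calls, and the canned artifacts are invented.

```python
# Hedged sketch of a FACTS-style pipeline. `llm_generate` and
# `council_approves` are hypothetical stand-ins for real LLM calls;
# stage names and canned outputs are illustrative only.

def llm_generate(stage: str, schema: str, user_query: str) -> str:
    # Stand-in for an LLM call that sees only the schema and query.
    canned = {
        "guided_questions": "Which column holds the yearly gross income?",
        "sql": 'SELECT "year", SUM("gross_income") FROM "T" GROUP BY "year";',
        "jinja2": "{% for row in values %}{{ row }}\n{% endfor %}",
    }
    return canned[stage]

def council_approves(stage: str, artifact: str) -> bool:
    # Stand-in for the LLM Council vote: accept any non-empty artifact here.
    return bool(artifact.strip())

def build_offline_template(schema: str, user_query: str,
                           max_rounds: int = 3) -> dict:
    artifacts = {}
    # Stage 1: guided questions; Stage 2: SQL synthesis; Stage 3: Jinja2 template.
    for stage in ("guided_questions", "sql", "jinja2"):
        for _ in range(max_rounds):  # iterative validate-and-refine loop
            candidate = llm_generate(stage, schema, user_query)
            if council_approves(stage, candidate):
                artifacts[stage] = candidate
                break
    # The reusable offline template bundles the SQL queries and Jinja2 template;
    # note that only `schema`, never row values, reaches the LLM stubs above.
    return {"sql": artifacts["sql"], "jinja2": artifacts["jinja2"]}

template = build_offline_template('T("year" INT, "gross_income" REAL)',
                                  "Summarize gross income per year")
```

Once built, the returned artifact can be executed and rendered against any table with the matching schema without further model calls.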

To the best of our knowledge, FACTS introduces the first agentic framework that automates offline template generation for query-focused table summarization. We evaluate FACTS on three public benchmarks: FeTaQA (Nan et al., 2022), QTSumm (Zhao et al., 2023), and QFMTS (Zhang et al., 2024). Experimental results show that FACTS consistently outperforms representative baselines, demonstrating its practicality for real-world query-focused table summarization.

In summary, our contributions are as follows:

(1) We propose offline template generation, which produces reusable and schema-specific templates in a privacy-compliant manner, enabling scalability to large tables and efficiency across recurring queries.

(2) We design FACTS, an agentic workflow that integrates guided question generation, SQL synthesis, and Jinja2 rendering, supported by iterative feedback loops to ensure correctness.

(3) We demonstrate the practicality of FACTS through comprehensive experiments on FeTaQA, QTSumm, and QFMTS, showing promising improvements over representative baselines.

## 2 Related Work

This section reviews prior work related to our study. We first situate query-focused table summarization within the broader landscape of table summarization and question answering. We then survey existing approaches and compare these paradigms against our proposed framework.

#### Query-Focused Table Summarization

Research on table-to-text generation has primarily aimed at transforming structured tables into natural language statements or summaries (Parikh et al., 2020; Chen et al., 2020; Cheng et al., 2022b; Lebret et al., 2016; Moosavi et al., 2021; Suadaa et al., 2021). These works typically target either single-sentence descriptions or domain-specific summaries, with the main goal of improving fluency and factual consistency. However, such outputs are not tailored to a user's specific information needs. In contrast, table question answering (Pasupat and Liang, 2015; Iyer et al., 2017; Nan et al., 2022) has focused on answering precise fact-based queries, usually returning short values or entities. While table question answering captures query intent, it lacks the ability to provide longer-form reasoning or explanatory summaries. To address this gap, Zhao et al. (2023) introduced the task of query-focused table summarization, where a model generates a narrative-style summary conditioned on both the table and a user query. Compared to generic table summarization, query-focused table summarization explicitly accounts for diverse user intents, and compared to table question answering, it produces extended summaries rather than minimal answers.

#### Existing Approaches

Existing work can be broadly grouped into three categories.

**(1) Table-to-text models** adapt language models to better capture table structure and reasoning. TAPEX (Liu et al., 2022b) extends BART with large-scale synthetic SQL execution data, improving compositional reasoning. ReasTAP (Zhao et al., 2022) follows a similar idea but uses synthetic QA corpora to enhance logical understanding. OmniTab (Jiang et al., 2022) combines both natural and synthetic QA signals for more robust pretraining. FORTAP (Chen et al., 2022a) leverages spreadsheet formulas as supervision to strengthen numerical reasoning. PLOG (Liu et al., 2022a) introduces a two-stage strategy: first generating logical forms from tables, then converting them into natural language, to improve logical faithfulness in summaries.

**(2) Prompt-based models** instead rely directly on large language models (LLMs) with carefully designed prompting. ReFactor (Zhao et al., 2023) extracts query-relevant facts and concatenates them with the query to guide generation. DirectSumm (Zhang et al., 2024) produces summaries in a single step, synthesizing text directly from the table and query. Reason-then-Summ (Zhang et al., 2024) decomposes the task into two stages, first retrieving relevant facts and then composing longer summaries.

**(3) Agentic frameworks** use external tools such as SQL or Python to ensure accuracy. Binder (Cheng et al., 2023) translates the input query into executable programs, often SQL, to ground results in computation. Dater (Ye et al., 2023) decomposes complex queries into smaller sub-queries, executes them individually, and aggregates their outputs. TaPERA (Zhao et al., 2024) builds natural language plans that are converted into Python programs for execution before aggregation. SPaGe (Zhang et al., 2025) moves beyond free-form plans by introducing structured representations and graph-based execution, improving reliability in multi-table scenarios.

Table 2 in Appendix A contrasts our proposed FACTS with representative methods using four criteria. *Reusable*: artifacts applicable to new tables with the same schema; *Scalable*: ability to handle very large tables without feeding all rows; *Accurate*: correctness via executable programs; *Privacy-Compliant*: avoiding exposure of raw table content to LLMs. Most prior methods fall short on one or more dimensions: table-to-text and prompt-based models lack all four; agentic frameworks improve accuracy but sacrifice scalability and privacy; and plan-based methods, such as TaPERA and SPaGe, yield only partially reusable plans. FACTS is the only approach satisfying all four desired properties.

## 3 Methodology

**Example 1:** An offline template generated by FACTS on the QFMTS dataset (Zhang et al., 2024). The SQL query retrieves the top three accounts by savings balance, and the Jinja2 template renders the results into natural language.

**SQL Queries:**
```sql
SELECT a."name", s."balance"
FROM "ACCOUNTS" a
JOIN "SAVINGS" s ON CAST(a."custid" AS DOUBLE) = s."custid"
ORDER BY s."balance" DESC, a."name" ASC
LIMIT 3;
```

**Jinja2 Template:**
```jinja
{% if values and values|length > 0 %}
The three accounts with the highest savings balances are:
{% for row in values %}
- {{ row["name"] }} with a savings balance of {{ row["balance"] }}.
{% endfor %}
Overall, these represent the top savers by balance in the dataset.
{% else %}
No results were found for the requested top savings accounts.
{% endif %}
```
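Assuming a small SQLite database matching the ACCOUNTS/SAVINGS schema (the rows below are invented for illustration), the offline template above can be executed and rendered roughly as follows, using the third-party `jinja2` package and a condensed version of the template:

```python
import sqlite3
from jinja2 import Template  # third-party: pip install jinja2

# Invented rows matching the ACCOUNTS/SAVINGS schema from Example 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "ACCOUNTS" ("custid" TEXT, "name" TEXT);
CREATE TABLE "SAVINGS" ("custid" REAL, "balance" REAL);
INSERT INTO "ACCOUNTS" VALUES ('1','Alice'),('2','Bob'),('3','Carol'),('4','Dave');
INSERT INTO "SAVINGS" VALUES (1,500.0),(2,900.0),(3,700.0),(4,100.0);
""")

# The SQL query from the offline template, executed verbatim.
sql = '''SELECT a."name", s."balance"
FROM "ACCOUNTS" a
JOIN "SAVINGS" s ON CAST(a."custid" AS DOUBLE) = s."custid"
ORDER BY s."balance" DESC, a."name" ASC
LIMIT 3;'''
cur = conn.execute(sql)
cols = [d[0] for d in cur.description]
values = [dict(zip(cols, row)) for row in cur.fetchall()]

# Condensed version of the Jinja2 template, rendered over the SQL results.
summary = Template(
    "{% if values %}The three accounts with the highest savings balances are:\n"
    "{% for row in values %}- {{ row['name'] }} with a savings balance of "
    "{{ row['balance'] }}.\n{% endfor %}"
    "{% else %}No results were found.{% endif %}"
).render(values=values)
print(summary)
```

Re-running the same two artifacts against another database with an identical schema requires no further LLM calls, which is the source of the reusability FACTS claims.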

To avoid ambiguity, we first clarify the terminology used in this section. A **user query** denotes the natural language input provided by the user, which specifies an information need over one or more tables and may include rich contextual details. An **SQL query** refers to executable code generated by our method to retrieve the information required to satisfy the user query. A **Jinja2 template** is a rendering program that verbalizes SQL outputs into natural language. An **offline template** is the composite artifact introduced in this work, bundling one or more SQL queries together with a Jinja2 template. Unless otherwise specified, the term **schema** refers to the structural metadata of the table, e.g., column names and data types, rather than raw values. Finally, a **summary** denotes the final natural language output returned to the user after executing the SQL queries and rendering the Jinja2 template.
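To make the schema/value distinction concrete: with SQLite, the schema-only view that would be shared with an LLM can be read via `PRAGMA table_info`. This is a minimal sketch of the idea; the paper does not prescribe this particular mechanism.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "ACCOUNTS" ("custid" TEXT, "name" TEXT)')
conn.execute("INSERT INTO \"ACCOUNTS\" VALUES ('1', 'Alice')")  # raw value, never shared

def table_schema(conn: sqlite3.Connection, table: str) -> list:
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk);
    # keeping only column names and declared types excludes all row values.
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [(r[1], r[2]) for r in rows]

print(table_schema(conn, "ACCOUNTS"))  # [('custid', 'TEXT'), ('name', 'TEXT')]
```

Only this structural metadata, plus the user query, needs to leave the database boundary during template generation.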

The remainder of this section is structured as follows: Section 3.1 introduces the concept of offline templates and motivates their reusability; Section 3.2 details the LLM Council, which provides iterative validation and feedback; and Section 3.3 presents the complete FACTS framework and its three modules.

### 3.1 Offline Template
