TAROT: Task-Adaptive Refinement of LLM-prior Graphs for Few-shot Tabular Learning

arXiv cs.LG Papers

Summary

TAROT proposes a GNN-based framework that leverages LLMs to construct and refine task-adaptive semantic graphs for few-shot tabular learning, achieving state-of-the-art performance.


# TAROT: Task-Adaptive Refinement of LLM-prior Graphs for Few-shot Tabular Learning
Source: [https://arxiv.org/html/2606.11640](https://arxiv.org/html/2606.11640)
\(2026\)

###### Abstract\.

Few-shot tabular learning provides a cost-effective approach for real-world applications where annotation is costly and collecting sufficient samples for new tasks is difficult. Existing traditional and LLM-based methods have demonstrated effectiveness in few-shot scenarios. However, traditional methods need additional training on unlabeled or generated data, which incurs significant computational overhead. In addition, LLM-based methods that directly feed raw tabular data into LLMs raise privacy and compliance concerns. More importantly, both paradigms largely overlook the semantic relationships between features, which provide a structural and semantic prior for constructing a semantic graph. A semantic graph is essential for modeling meaningful feature interactions in few-shot scenarios. In this paper, we propose TAROT, a GNN-based framework that encodes the structural and semantic prior by constructing and refining a task-adaptive semantic graph from this prior, thereby improving predictive performance in few-shot tabular learning. TAROT first encodes heterogeneous tabular data into unified node semantic representations via a Unified Semantic Tabular Node Encoder (USTNE). Then, it prompts LLMs to infer the semantic relationships between features based on the task description and feature names to construct a semantic graph. To mitigate structural noise introduced by LLM hallucination, TAROT introduces Task-Adaptive Semantic Graph Refinement, which prunes spurious or task-unrelated edges and adds missing task-related ones, aligning the graph structure with the downstream objective. Finally, a GNN performs message passing over the refined graph to capture task-related semantic dependencies for prediction. Extensive experiments on various few-shot tabular learning benchmarks demonstrate the superior performance of TAROT, establishing it as a state-of-the-art approach in this domain.

Few\-shot Tabular Learning, Graph Structure Learning, Large Language Models \(LLMs\)

Published in: Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD 2026), August 9–13, 2026, Jeju Island, Republic of Korea. Copyright: CC. ISBN: 979-8-4007-2259-2/2026/08. DOI: 10.1145/3770855.3817944.

CCS Concepts: Mathematics of computing → Graph algorithms; Computing methodologies → Machine learning; Information systems → Data mining.

![Refer to caption](https://arxiv.org/html/2606.11640v1/x1.png)

Figure 1. Semantic graph construction on the Adult dataset. (a) Semantic relationships between features in tabular data. (b) Semantic graph for modeling meaningful feature interactions in few-shot scenarios.

## 1. Introduction

Table 1. Existing few-shot tabular learning methods can be broadly categorized into Traditional and LLM-based approaches. Most existing methods overlook the semantic relationships between features, and many of them are restricted to classification-only settings.

| Properties | SCARF (Bahri et al., [2021](https://arxiv.org/html/2606.11640#bib.bib22)) | TabPFN (Hollmann et al., [2023](https://arxiv.org/html/2606.11640#bib.bib35)) | STUNT (Nam et al., [2023](https://arxiv.org/html/2606.11640#bib.bib23)) | In-context (Wei et al., [2022](https://arxiv.org/html/2606.11640#bib.bib36)) | TABLET (Slack and Singh, [2023](https://arxiv.org/html/2606.11640#bib.bib24)) | TabLLM (Hegselmann et al., [2023](https://arxiv.org/html/2606.11640#bib.bib25)) | FeatLLM (Han et al., [2024](https://arxiv.org/html/2606.11640#bib.bib26)) | TAROT (Ours) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Method family | Traditional | Traditional | Traditional | LLM-based | LLM-based | LLM-based | LLM-based | Ours |
| No additional training required | ✗ (Contrastive Learning) | ✗ (Supervised Learning) | ✗ (Meta learning) | ✔ | ✔ | ✗ (Fine-tune LLM) | ✔ | ✔ |
| No LLM access sample required | ✔ | ✔ | ✔ | ✗ | ✗ | ✗ | ✗ | ✔ |
| Classification task | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Regression task | ✔ | ✗ | ✗ | ✔ | ✗ | ✗ | ✗ | ✔ |
| Semantic relationships between features | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✔ |

Given the substantial financial and temporal costs of sample annotation (Clements et al., [2020](https://arxiv.org/html/2606.11640#bib.bib65); Nam et al., [2023](https://arxiv.org/html/2606.11640#bib.bib23)) and the difficulty of collecting data for new tasks (e.g., rare or new diseases) (Mitani and Haneuse, [2020](https://arxiv.org/html/2606.11640#bib.bib66); Mondal et al., [2020](https://arxiv.org/html/2606.11640#bib.bib67)), learning from a limited number of labeled samples has emerged as a cost-effective solution for real-world deployment of machine learning models (Snell et al., [2017](https://arxiv.org/html/2606.11640#bib.bib8); Wang et al., [2020](https://arxiv.org/html/2606.11640#bib.bib9); Oreshkin et al., [2018](https://arxiv.org/html/2606.11640#bib.bib88); Wang et al., [2021](https://arxiv.org/html/2606.11640#bib.bib10)). This scenario, commonly referred to as *few-shot learning*, has recently attracted increasing attention across multiple domains, including computer vision (Chen et al., [2019](https://arxiv.org/html/2606.11640#bib.bib89); Peng et al., [2019](https://arxiv.org/html/2606.11640#bib.bib93)) and tabular learning (Nam et al., [2023](https://arxiv.org/html/2606.11640#bib.bib23); Han et al., [2024](https://arxiv.org/html/2606.11640#bib.bib26)). However, with insufficient supervisory signals, traditional supervised learning struggles to learn effective models, as its performance heavily relies on statistical convergence over large labeled datasets. This limitation is particularly pronounced in tabular learning, where labeled data are often scarce (Liu et al., [2024](https://arxiv.org/html/2606.11640#bib.bib90)) in real-world applications such as fraud detection (Cao, [2022](https://arxiv.org/html/2606.11640#bib.bib91)) and disease diagnosis (Shailaja et al., [2018](https://arxiv.org/html/2606.11640#bib.bib92)).

To tackle such limited-label issues, existing few-shot learning approaches for tabular data can be broadly classified into two categories. *Traditional methods* aim to acquire transferable representations or useful knowledge through additional training on large-scale unlabeled or synthetic tabular data. For instance, SCARF (Bahri et al., [2021](https://arxiv.org/html/2606.11640#bib.bib22)) and STUNT (Nam et al., [2023](https://arxiv.org/html/2606.11640#bib.bib23)) leverage unlabeled tabular data to learn a generalizable and an adaptable representation, respectively, while TabPFN (Hollmann et al., [2023](https://arxiv.org/html/2606.11640#bib.bib35)) is trained on large-scale generated datasets to incorporate prior knowledge over feature–label relationships, enabling rapid adaptation in few-shot scenarios. In contrast, *LLM-based methods* transform raw tabular samples into natural language representations and exploit the inherent knowledge of LLMs (Yao et al., [2024](https://arxiv.org/html/2606.11640#bib.bib94); Bubeck et al., [2023](https://arxiv.org/html/2606.11640#bib.bib96)) for in-context reasoning (Wei et al., [2022](https://arxiv.org/html/2606.11640#bib.bib36); Slack and Singh, [2023](https://arxiv.org/html/2606.11640#bib.bib24)) and feature importance estimation (Han et al., [2024](https://arxiv.org/html/2606.11640#bib.bib26)). Furthermore, some of these methods use task-specific fine-tuning to improve LLM capabilities for tabular understanding and downstream performance (Hegselmann et al., [2023](https://arxiv.org/html/2606.11640#bib.bib25)).

Despite recent advances, these approaches still suffer from limitations that hinder their effectiveness and scalability in real-world deployment, as summarized in Tab. [1](https://arxiv.org/html/2606.11640#S1.T1). Traditional methods incur substantial computational overhead when trained on large-scale unlabeled or synthetic tabular data (Hollmann et al., [2023](https://arxiv.org/html/2606.11640#bib.bib35)). LLM-based methods, on the other hand, are constrained by the context length of LLMs (Wang et al., [2024](https://arxiv.org/html/2606.11640#bib.bib77); An et al., [2024](https://arxiv.org/html/2606.11640#bib.bib78)), and sending raw data to external models raises privacy and compliance concerns (Carlini et al., [2021](https://arxiv.org/html/2606.11640#bib.bib75); Kim et al., [2023](https://arxiv.org/html/2606.11640#bib.bib76)). More importantly, both paradigms largely overlook the semantic relationships between features, shown in Fig. [1](https://arxiv.org/html/2606.11640#S0.F1)(a), which provide a structural and semantic prior for constructing a *semantic graph* such as the one in Fig. [1](https://arxiv.org/html/2606.11640#S0.F1)(b). This semantic graph enables the modeling of meaningful feature interactions, addressing the instability and susceptibility to spurious correlations that arise when feature interactions are learned directly from sparse supervision in few-shot scenarios.

However, to the best of our knowledge, no prior work has successfully integrated semantic graphs into few-shot tabular learning, primarily due to two key challenges. ❶ *Difficulty in obtaining graph structures.* The semantic relationships between features of tabular data are often not explicitly provided, which makes the corresponding semantic graph structure unavailable (Guo et al., [2021](https://arxiv.org/html/2606.11640#bib.bib68)). Meanwhile, existing graph structure learning methods (Liao and Li, [2023](https://arxiv.org/html/2606.11640#bib.bib69); Yan et al., [2023](https://arxiv.org/html/2606.11640#bib.bib18)) typically require a large amount of data to accurately infer semantic relationships between features, which limits their effectiveness in few-shot scenarios. ❷ *Structure noise in the semantic graph.* This noise arises from spurious or task-unrelated edges and from missing task-related ones. In few-shot scenarios, such noise can misguide message passing and amplify irrelevant correlations, leading to unreliable predictions (Dai et al., [2022](https://arxiv.org/html/2606.11640#bib.bib81)).

In this paper, we propose TAROT, a Task-Adaptive Refinement of LLM-prior Graphs for Few-shot Tabular Learning. TAROT is a GNN-based framework that improves prediction by explicitly modeling feature interactions through task-adaptive semantic graph construction and refinement. Our key innovation lies in constructing and refining a task-adaptive semantic graph from the structural and semantic prior under limited data. The semantic graph (i) emphasizes meaningful feature interactions and (ii) mitigates the adverse effects of structural noise. This enables more reliable message passing and improves predictive performance. Specifically, TAROT first introduces a Unified Semantic Tabular Node Encoder (USTNE) that encodes heterogeneous tabular features into unified node semantic representations using a pre-trained encoder. Next, it prompts LLMs to infer semantic relations between features based on the task objective description and feature names to construct an initial semantic graph (Challenge ❶). Then, we refine this graph in a task-adaptive manner by pruning spurious and task-unrelated edges and adding missing task-related ones, thereby removing the LLM-induced structural noise (Challenge ❷) and producing a task-adaptive semantic graph. Finally, we apply a GNN over the refined graph to model feature interactions, thereby capturing semantic dependencies that benefit downstream prediction. Our main contributions are summarized as follows:

- We propose a novel insight that leverages LLMs to provide structural prior knowledge by inducing a semantic graph from the task description and feature names, addressing the challenge that accurately inferring semantic relationships between features otherwise requires a large amount of data.
- We introduce a task-adaptive refinement mechanism that denoises semantic graphs by removing task-unrelated edges and adding missing task-related ones, enabling effective GNN-based feature interaction modeling for few-shot tabular learning.
- Extensive experiments on 11 real-world datasets show that TAROT consistently outperforms state-of-the-art baselines, while quantitative and qualitative analyses confirm the effectiveness of the generated task-adaptive semantic graphs for few-shot tabular learning.

![Refer to caption](https://arxiv.org/html/2606.11640v1/x2.png)

Figure 2. Overview of TAROT. USTNE encodes heterogeneous tabular data into unified node representations. Then, LLMs construct a semantic graph based on task descriptions and feature names, which is refined via Task-Adaptive Semantic Graph Refinement to reduce noise introduced by LLMs. A GNN finally encodes the refined graph for prediction.

## 2. Related Work

### 2.1. Few-shot Tabular Learning

Few-shot tabular learning has been proposed to address scenarios where annotations are costly and data for emerging tasks (e.g., rare or new diseases) are scarce (Nam et al., [2023](https://arxiv.org/html/2606.11640#bib.bib23)). Recent advances in this area fall broadly into two categories: *traditional few-shot tabular learning methods* and *LLM-based few-shot tabular learning methods*. Traditional approaches, such as SCARF (Bahri et al., [2021](https://arxiv.org/html/2606.11640#bib.bib22)), STUNT (Nam et al., [2023](https://arxiv.org/html/2606.11640#bib.bib23)), and TabPFN (Hollmann et al., [2023](https://arxiv.org/html/2606.11640#bib.bib35)), leverage large-scale unlabeled data or synthetic datasets for additional training, thereby capturing transferable tabular patterns that improve downstream performance under limited supervision. In parallel, LLMs, trained on massive real-world corpora, encode substantial world knowledge (Hu et al., [2025](https://arxiv.org/html/2606.11640#bib.bib86); Hou et al., [2024](https://arxiv.org/html/2606.11640#bib.bib87)) and exhibit strong reasoning capabilities (Laban et al., [2023](https://arxiv.org/html/2606.11640#bib.bib84); Wan et al., [2023](https://arxiv.org/html/2606.11640#bib.bib85)), making them increasingly attractive for tabular learning. Existing LLM-based approaches typically serialize tabular data into natural language representations and prompt LLMs to perform tabular prediction tasks. For example, TABLET (Slack and Singh, [2023](https://arxiv.org/html/2606.11640#bib.bib24)) enhances tabular reasoning by incorporating task-specific instructions into prompts, while FeatLLM (Han et al., [2024](https://arxiv.org/html/2606.11640#bib.bib26)) employs LLMs as feature engineers to automatically filter important features before training downstream classifiers. Alternatively, TabLLM (Hegselmann et al., [2023](https://arxiv.org/html/2606.11640#bib.bib25)) fine-tunes LLMs on tabular data to improve their capability in understanding and processing it. Despite their effectiveness, most existing methods overlook semantic graph structures, over which message passing can capture semantic dependencies and improve few-shot predictions.

### 2.2. Graph-based Tabular Learning

Tabular data often exhibit a semantic graph structure, i.e., semantic relationships among features. Such semantic structures model meaningful feature interactions and improve predictive performance (Yan et al., [2023](https://arxiv.org/html/2606.11640#bib.bib18)). One straightforward way to obtain a tabular graph is manual feature engineering (Seide et al., [2011](https://arxiv.org/html/2606.11640#bib.bib17)). For example, TabGNN (Guo et al., [2021](https://arxiv.org/html/2606.11640#bib.bib68)) constructs a multigraph using multiple hand-designed relational rules and performs message passing to enhance tabular prediction. However, manual graph construction typically requires domain expertise and incurs high human cost. To reduce this burden, recent work explores semantic graph structure learning, enabling models to autonomously infer semantic relations among features. For instance, TabGSL (Liao and Li, [2023](https://arxiv.org/html/2606.11640#bib.bib69)) explicitly initializes the learner-view graph with kNN and preserves the top-$k$ connections for graph construction. In addition, T2G-FORMER (Yan et al., [2023](https://arxiv.org/html/2606.11640#bib.bib18)) derives feature relationships by computing similarity scores, converting them into graph structures, and modeling them with graph transformers to capture semantic dependencies. While these approaches can learn effective semantic graphs given sufficient training data, they often struggle in few-shot scenarios where supervision is limited, making reliable semantic graph structure construction challenging. To address this, we propose TAROT, which prompts LLMs to infer semantic relations among features based on the task objective and feature names, forming an initial semantic graph. We then refine this graph in a task-adaptive manner by pruning spurious edges and adding missing task-relevant ones, resulting in a meaningful task-related semantic graph.

## 3. Preliminaries

### 3.1. Problem Definition

Given a tabular dataset $D=\{\mathbb{X},\mathbb{Y}\}$, each tabular sample $x\in\mathbb{X}$ is represented as a set of tabular features $\{x_i\}_{i=1}^{n}$, where $n$ is the number of tabular features, comprising $N_n$ numerical features and $N_c$ categorical features. The corresponding column name of each feature is $f_i\in F$. The goal is to train a model $g_\theta:\mathbb{X}\to\mathbb{Y}$, parameterized by $\theta$, that maps the row feature space $\mathbb{X}$ to the label space $\mathbb{Y}$, where $\mathbb{Y}=\{0,1\}$ for binary classification, $\mathbb{Y}=\{1,\ldots,c\}$ for multiclass classification, and $\mathbb{Y}=\mathbb{R}$ for regression. We study $k$-shot tabular learning by restricting supervision to $k$ labeled samples during training.

### 3.2. Semantic Graph of Tabular Data

###### Definition 3\.1 \(Semantic Graph of Tabular Data\)\.

Tabular data exhibits an implicit semantic graph $\mathcal{G}=(\mathcal{V},\mathcal{E},\mathbf{A},\mathbf{H})$, where each node $v_i\in\mathcal{V}$ corresponds to a *tabular feature* $x_i$, and an undirected edge $e_{ij}=\{v_i,v_j\}\in\mathcal{E}$ indicates that there exist *semantic relationships* between tabular features $x_i$ and $x_j$, where the semantics are primarily provided by the corresponding *column names* $f_i$ and $f_j$. $\mathbf{A}\in\{0,1\}^{n\times n}$ is the adjacency matrix, where $\mathbf{A}_{ij}=1$ if $\{v_i,v_j\}\in\mathcal{E}$ and $\mathbf{A}_{ij}=0$ otherwise, and $\mathbf{H}\in\mathbb{R}^{n\times d}$ contains a $d$-dimensional embedding for each node.
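As a small illustration of Definition 3.1, the sketch below builds the symmetric binary adjacency matrix $\mathbf{A}$ from an undirected edge set over feature indices. The feature names and the edge list are hypothetical and chosen only for illustration; `adjacency_from_edges` is not a function from the paper.

```python
import numpy as np

def adjacency_from_edges(n, edges):
    """Build the symmetric 0/1 adjacency matrix A of an undirected feature graph."""
    A = np.zeros((n, n), dtype=int)
    for i, j in edges:
        # An undirected edge {v_i, v_j} sets both A[i, j] and A[j, i].
        A[i, j] = 1
        A[j, i] = 1
    return A

# Hypothetical feature names and edges, for illustration only.
feature_names = ["age", "education", "occupation", "hours-per-week"]
edges = [(1, 2), (2, 3)]
A = adjacency_from_edges(len(feature_names), edges)
```

Here `A[1, 2]` and `A[2, 1]` are both 1, matching the undirected edge between the second and third features.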

The semantic graph of tabular data facilitates the modeling of meaningful feature interactions, alleviating the instability and susceptibility to spurious correlations that arise when feature interactions are learned directly from sparse supervision in few-shot scenarios. However, in real-world tabular datasets, the semantic relationships between features are often not explicitly provided, which makes the semantic graph structure $\mathcal{E}$ and the corresponding adjacency matrix $\mathbf{A}$ unavailable.

## 4. Proposed Method: TAROT

In this section, we introduce the overall framework of TAROT, as illustrated in Fig. [2](https://arxiv.org/html/2606.11640#S1.F2). TAROT constructs a task-adaptive semantic graph to facilitate task-aware message passing across related features, thereby stabilizing representation learning and improving predictive performance in few-shot tabular scenarios. The framework consists of four main components: (i) the Unified Semantic Tabular Node Encoder (USTNE) (Sec. [4.1](https://arxiv.org/html/2606.11640#S4.SS1)), (ii) Semantic Graph Structure Construction (Sec. [4.2](https://arxiv.org/html/2606.11640#S4.SS2)), (iii) Task-Adaptive Semantic Graph Refinement (Sec. [4.3](https://arxiv.org/html/2606.11640#S4.SS3)), and (iv) the Message Passing Mechanism (Sec. [4.4](https://arxiv.org/html/2606.11640#S4.SS4)).

### 4.1. Unified Semantic Tabular Node Encoder

Tabular data typically comprises heterogeneous features, which poses a major challenge for existing embedding methods, especially in few-shot scenarios, where models are more prone to overfitting. To tackle this issue, we propose USTNE, a training-free unified semantic tabular data encoder that transforms the heterogeneous tabular sample $x=\{x_i\}_{i=1}^{n}$ into a unified semantic node representation matrix $\mathbf{H}=\{h_i\}_{i=1}^{n}$. We first decompose the tabular data into a text set $\text{Text}$ and a numerical set $\text{Num}$ as follows:

$$\text{Text},\ \text{Num}=\mathrm{Extract}(D,F), \tag{1}$$

where the function $\mathrm{Extract}(\cdot)$ collects all elements from the tabular data $D$ and the feature names $F$, and assigns them to the textual subset $\text{Text}$ or the numerical subset $\text{Num}$ based on their data types. For example, as shown in Fig. [2](https://arxiv.org/html/2606.11640#S1.F2)(a), textual elements (feature names and categorical values) such as Age, Education, and HS-grad are assigned to $\text{Text}$, whereas numerical values (e.g., 58 and 33) are assigned to $\text{Num}$. To ensure stable optimization, numerical features are normalized to mitigate scale discrepancies before being fed into the model. Since existing pre-trained encoders (e.g., BERT (Devlin et al., [2019](https://arxiv.org/html/2606.11640#bib.bib54))) have relatively weak encoding capabilities for numerical data, we use a pre-trained text encoder to encode only the textual subset $\text{Text}$:

$$\text{T}=\mathrm{Text\_Encoder}(\text{Text}), \tag{2}$$

where $\text{T}\in\mathbb{R}^{n_t\times d}$ is the set of text embeddings and $n_t$ is the number of text elements in the tabular data $D$. For categorical features, the representations of both the feature name and the feature value are retrieved from $\text{T}$. For numerical features, the representation of the feature name is retrieved from $\text{T}$, while the numerical value is retrieved from $\text{Num}$. The following procedure is then applied to obtain node representations that incorporate semantic information in the form of key-value pairs $(f_i, x_i)$:

$$h_i=\begin{cases}\text{T}_{f_i}\oplus\text{T}_{x_i}, & \text{for categorical features},\\[3pt] \text{T}_{f_i}\odot\text{Num}_{x_i}, & \text{for numerical features},\end{cases} \tag{3}$$

where $h_i\in\mathbb{R}^{d}$ denotes the node embedding of the $i$-th feature, $\oplus$ represents the concatenation operation, and $\odot$ represents element-wise multiplication between the embedding of the feature name and the corresponding value. Through this encoding process, the heterogeneous tabular sample $x=\{x_i\}_{i=1}^{n}$ is transformed into unified semantic node representations $\mathbf{H}=\{h_i\}_{i=1}^{n}$ in the semantic graph.
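The two cases of Eq. (3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_text_encoder` is a hypothetical stand-in for a pre-trained text encoder (it returns a deterministic pseudo-embedding), the embedding size `D_EMB` is arbitrary, and the numerical value is assumed to be normalized already. Read literally, concatenation yields a $2d$-vector while the scaled name embedding stays $d$-dimensional; the sketch returns a list of node vectors rather than guessing how the two are aligned into $\mathbf{H}$.

```python
import hashlib
import numpy as np

D_EMB = 8  # hypothetical embedding size d

def toy_text_encoder(text, dim=D_EMB):
    """Stand-in for a pre-trained text encoder: deterministic pseudo-embedding."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    return np.random.default_rng(seed).standard_normal(dim)

def encode_row(row, numerical_columns):
    """Return one node vector per feature, following the two cases of Eq. (3)."""
    nodes = []
    for name, value in row.items():
        name_emb = toy_text_encoder(name)
        if name in numerical_columns:
            # Numerical feature: scale the name embedding by the normalized value.
            nodes.append(name_emb * float(value))
        else:
            # Categorical feature: concatenate name and value embeddings.
            nodes.append(np.concatenate([name_emb, toy_text_encoder(str(value))]))
    return nodes

# "age" is assumed to be normalized beforehand; the values are illustrative.
row = {"age": 0.58, "education": "HS-grad"}
nodes = encode_row(row, numerical_columns={"age"})
```

Because the encoder is applied to names and categorical values only, the raw numerical magnitude enters a node solely as a scale factor on its feature-name embedding.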

### 4.2. Semantic Graph Structure Construction

![Refer to caption](https://arxiv.org/html/2606.11640v1/x3.png)

Figure 3. Comparison of the impact of graph structures generated by different methods on task performance in the 16-shot setting. For a fair comparison, the three methods differ only in their graph structures.

To address Challenge ❶ (i.e., the difficulty of constructing semantic graphs for tabular data in few-shot scenarios), we leverage LLMs as an external knowledge source. Benefiting from real-world knowledge acquired during pretraining and strong zero-shot reasoning ability, LLMs can capture semantic relationships between features and thus facilitate semantic graph construction. Our objective is to use LLMs to generate a semantic graph structure $\mathcal{E}$ based on the task description and feature names. Specifically, we construct a prompt $p$ that guides the LLM to understand the structure of the tabular data and to infer semantic relationships between features by leveraging its diverse, aggregated knowledge. To better exploit the LLM's reasoning ability and to standardize its output, the prompt $p$ (see Appendix [A](https://arxiv.org/html/2606.11640#A1), Tab. [5](https://arxiv.org/html/2606.11640#A1.T5)) consists of three components: (i) meta information $p_{meta}$, (ii) an instruction $p_I$, and (iii) a code template $p_{code}$. The meta information $p_{meta}$ describes the task objective and the feature names. The instruction $p_I$ constrains the model to infer semantic relationships between features step by step (i.e., in a chain-of-thought manner) and standardizes the LLM output by having it fill in the parts of $p_{code}$ that encode the semantic relationships between features. Instead of first obtaining the edge set $\mathcal{E}$ and then converting it into the adjacency matrix $\mathbf{A}$, we prompt the LLM to generate an *executable* code snippet that directly outputs $\mathbf{A}$ by filling in the code template in Fig. [2](https://arxiv.org/html/2606.11640#S1.F2)(b). This design avoids additional parsing and post-processing, leading to a more deterministic and implementation-friendly graph structure construction:

$$\begin{split}\mathbf{A}&=\text{Execute}(\text{LLM}(p))\\ &=\text{Execute}(\text{LLM}(p_{meta}\oplus p_{I}\oplus p_{code})),\end{split} \tag{4}$$

where $\oplus$ denotes the concatenation of the prompt components. This process is achieved by querying the LLM, without requiring any fine-tuning. To validate the effectiveness of the LLM-generated semantic graph structure, we compare it against (i) randomly generated graphs and (ii) the neural baseline T2G-FORMER (Yan et al., [2023](https://arxiv.org/html/2606.11640#bib.bib18)). As shown in Fig. [3](https://arxiv.org/html/2606.11640#S4.F3), the LLM-generated graph yields significant improvements on downstream tasks under the 16-shot setting. Notably, the LLM requires only basic meta information (e.g., feature names and task description) and does not access any data samples. Therefore, graph construction is performed in a zero-shot manner, which mitigates potential privacy risks associated with sending raw data to the LLM. Although the LLM-generated semantic graph structures yield significant gains on downstream tasks, we observe an issue (Challenge ❷), illustrated in Fig. [2](https://arxiv.org/html/2606.11640#S1.F2)(c): because of hallucination (Duan et al., [2024](https://arxiv.org/html/2606.11640#bib.bib70); Perković et al., [2024](https://arxiv.org/html/2606.11640#bib.bib71); Li et al., [2024](https://arxiv.org/html/2606.11640#bib.bib72)), LLMs may introduce spurious edges that are irrelevant to the downstream task and may also miss task-related ones. For example, for a task that predicts whether an individual's income exceeds $50,000, an LLM might incorrectly connect age and race, an edge that is not useful for the task, while omitting necessary connections among truly semantically informative features. In few-shot scenarios, such structure noise can misguide message passing and amplify irrelevant correlations, leading to unreliable predictions. To address this, we treat the LLM-generated semantic graph structure $\mathbf{A}$ as a prior graph structure $\mathbf{A}_{prior}$ and further refine it via Task-Adaptive Semantic Graph Refinement.
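The pipeline in Eq. (4) can be sketched as below. Everything concrete here is an assumption made for illustration: the prompt wording, the shape of the code template, and the helper names `code_template`, `build_prompt`, and `execute_graph_code` are hypothetical, and the actual prompt is given in the paper's appendix. No LLM is called; a hand-written string stands in for the model's reply.

```python
import numpy as np

def code_template(feature_names):
    """Hypothetical p_code: a snippet the LLM completes by editing `related_pairs`."""
    return (
        f"features = {feature_names!r}\n"
        "related_pairs = []  # to be filled with (name_a, name_b) tuples\n"
        "A = np.zeros((len(features), len(features)), dtype=int)\n"
        "for a, b in related_pairs:\n"
        "    i, j = features.index(a), features.index(b)\n"
        "    A[i, j] = A[j, i] = 1  # undirected edge\n"
    )

def build_prompt(task_description, feature_names):
    """Concatenate p_meta, p_I and p_code, mirroring Eq. (4)."""
    p_meta = f"Task: {task_description}\nFeatures: {', '.join(feature_names)}\n"
    p_instr = (
        "Think step by step about which feature pairs are semantically related, "
        "then return the code below with `related_pairs` filled in.\n"
    )
    return p_meta + p_instr + code_template(feature_names)

def execute_graph_code(code):
    """Run the completed snippet and return its adjacency matrix A.

    exec() runs arbitrary code, so a real system would sandbox or validate it.
    """
    namespace = {"np": np}
    exec(code, namespace)
    return namespace["A"]

features = ["age", "education", "occupation"]
prompt = build_prompt("predict whether income exceeds $50,000", features)

# Stand-in for the LLM reply: the template with one hypothetical edge filled in.
llm_reply = code_template(features).replace(
    "related_pairs = []", "related_pairs = [('education', 'occupation')]"
)
A_prior = execute_graph_code(llm_reply)
```

Note that only the task description and feature names appear in the prompt, consistent with the statement that no data samples are sent to the LLM.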

### 4.3. Task-Adaptive Semantic Graph Refinement

To address Challenge ❷ (i.e., structural noise introduced by the LLM), we propose a task-adaptive framework that refines the graph structure $\mathbf{A}_{prior}$ using only a small set of labeled samples via task-adaptive graph optimization. Specifically, we prune spurious and task-unrelated edges and add missing task-related ones to remove the LLM-induced structural noise and obtain a structure that is consistent with the downstream task. In few-shot scenarios, learning a graph solely from limited supervision is often unstable and yields suboptimal structures. By anchoring optimization around $\mathbf{A}_{prior}$, our method avoids an expensive search over the entire structure space and instead performs localized, semantics-preserving adjustments to achieve task-adaptive refinement. We first construct an edge representation for each node pair $(v,u)$ by combining their node representations, as follows:

$$e_{v,u}=\big[(h_v,h_u)\oplus(h_v\odot h_u)\oplus(|h_v-h_u|)\big], \tag{5}$$

where $\oplus$ represents the concatenation operation and $\odot$ represents element-wise multiplication. We construct a reliable semantic edge representation $e_{v,u}\in\mathbb{R}^{3d}$ by concatenating $(h_v,h_u)$, their element-wise product $(h_v\odot h_u)$, and their absolute difference $(|h_v-h_u|)$. Next, we use a linear model to score the edge representations $e_{v,u}$ as follows:

$$
s_{v,u} = \sigma(e_{v,u} w),
\tag{6}
$$

where $s_{v,u}$ is the semantic score of edge $(v,u)$, $\sigma$ is the sigmoid function, and $w \in \mathbb{R}^{3d \times 1}$ is the trainable weight of the linear model. To learn structural information from the prior semantic graph $\mathbf{A}_{prior}$, we impose a prior learning loss, as follows:

$$
\mathcal{L}_{prior} = \frac{1}{|\Omega|} \sum_{(v,u) \in \Omega} \log\!\left(1 + \exp(-s_{v,u})\right),
\tag{7}
$$

$$
\Omega = \{(v,u) \mid \mathbf{A}_{prior}(v,u) = 1\}.
\tag{8}
$$

Overall, this loss regularizes the model toward the prior graph structure $\mathbf{A}_{prior}$ and injects the structural knowledge encoded in $\mathbf{A}_{prior}$ into the learning process, thereby mitigating the overfitting caused by scarce labeled data in few-shot scenarios. Then, based on the semantic scores, we derive the set of task-unrelated edges to prune and the set of task-related edges to add as follows:

$$
\mathbf{A}_{prune}(v,u) =
\begin{cases}
0, & s_{v,u} \geq \tau, \\
1, & s_{v,u} < \tau,
\end{cases}
\tag{9}
$$

$$
\mathbf{A}_{enhance}(v,u) =
\begin{cases}
0, & s_{v,u} \notin \text{Topk}(\{s_{v,u}\}_{1 \leq v,u \leq n}), \\
1, & s_{v,u} \in \text{Topk}(\{s_{v,u}\}_{1 \leq v,u \leq n}),
\end{cases}
\tag{10}
$$

where $\mathbf{A}_{prune} \in \mathbb{R}^{n \times n}$ denotes a binary mask of task-unrelated edges to be removed, obtained by marking edges whose semantic scores fall below a threshold $\tau$. $\mathbf{A}_{enhance} \in \mathbb{R}^{n \times n}$ denotes a binary mask of task-related edges to be added, obtained by selecting the top-$k$ edges with the highest semantic scores. Based on these two masks, we refine the prior graph $\mathbf{A}_{prior}$ by first pruning noisy connections and then adding missing relations:

$$
\mathbf{A}_{refine} = \big(\mathbf{A}_{prior} \odot (1 - \mathbf{A}_{prune})\big) \lor \mathbf{A}_{enhance},
\tag{11}
$$

where $\mathbf{A}_{refine}$ represents the task-adaptive semantic graph structure, $\odot$ denotes element-wise multiplication, and $\lor$ denotes edge-wise union.
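The scoring and refinement steps of Eqs. 5, 6, and 9 to 11 can be sketched in NumPy as follows. This is an illustrative forward computation only, under stated assumptions: the function names `edge_scores` and `refine_graph` are hypothetical, Top-k is taken literally over all $n \times n$ ordered pairs, and the edge vector concatenates all four blocks ($h_v$, $h_u$, product, absolute difference), which gives $4d$ entries rather than the $3d$ stated in the text, since how $(h_v, h_u)$ is combined is not specified in this section.

```python
import numpy as np

def edge_scores(H, w):
    """Eqs. 5-6: score every ordered node pair with a linear model and a sigmoid."""
    n = H.shape[0]
    hv = np.repeat(H[:, None, :], n, axis=1)  # hv[v, u] = H[v]
    hu = np.repeat(H[None, :, :], n, axis=0)  # hu[v, u] = H[u]
    # Assumption: plain concatenation of four blocks, so the last axis has 4d entries.
    E = np.concatenate([hv, hu, hv * hu, np.abs(hv - hu)], axis=-1)
    return 1.0 / (1.0 + np.exp(-(E @ w)))     # shape (n, n)

def refine_graph(A_prior, S, tau=0.2, k=5):
    """Eqs. 9-11: prune low-scoring prior edges, then add the k best-scoring edges (k >= 1)."""
    A_prune = (S < tau).astype(int)            # Eq. 9: 1 marks an edge to remove
    flat = np.zeros(S.size, dtype=int)
    flat[np.argsort(S.ravel())[-k:]] = 1       # Eq. 10: indices of the k largest scores
    A_enhance = flat.reshape(S.shape)
    # Eq. 11: element-wise product, then union (maximum of two 0/1 matrices).
    return np.maximum(A_prior * (1 - A_prune), A_enhance)

# Hand-made scores for three features; the prior links node 0 to nodes 1 and 2.
S = np.array([[0.05, 0.10, 0.60],
              [0.15, 0.07, 0.90],
              [0.30, 0.95, 0.02]])
A_prior = np.array([[0, 1, 1],
                    [1, 0, 0],
                    [1, 0, 0]])
A_refine = refine_graph(A_prior, S, tau=0.2, k=2)
```

In this toy case the prior edges between nodes 0 and 1 score below $\tau = 0.2$ and are pruned, the edges between nodes 0 and 2 survive, and the two highest-scoring pairs, (1, 2) and (2, 1), are added even though the prior did not contain them.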

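Table 2. Evaluation results, including the AUC ($\uparrow$) scores across eight classification datasets. The best performances are highlighted in bold, and the second-best are underlined. Metric values are averaged over three random seeds. Out of In-context Window (OOW) indicates that the input length exceeds the maximum context window limit of the model and cannot be processed. SCARF, TabPFN, and STUNT are Traditional Few-shot Tabular Learning methods; In-context, TABLET, TabLLM, and FeatLLM are LLM-based Few-shot Tabular Learning methods.

| Data | Shot | SCARF | TabPFN | STUNT | In-context | TABLET | TabLLM | FeatLLM | TAROT (Ours) |
|---|---|---|---|---|---|---|---|---|---|
| Adult | 4 | $58.34_{15.42}$ | $60.89_{23.28}$ | $67.43_{29.61}$ | $77.51_{5.24}$ | $75.29_{12.24}$ | $83.57_{2.69}$ | $\underline{86.68}_{0.86}$ | $\mathbf{86.79}_{1.78}$ |
| Adult | 8 | $72.42_{8.95}$ | $70.42_{9.96}$ | $82.16_{6.93}$ | $79.30_{2.89}$ | $77.56_{7.56}$ | $83.52_{4.30}$ | $\underline{87.89}_{0.06}$ | $\mathbf{87.90}_{1.65}$ |
| Adult | 16 | $75.63_{9.56}$ | $70.34_{9.96}$ | $80.57_{10.93}$ | $79.50_{4.57}$ | $79.74_{5.64}$ | $83.23_{2.45}$ | $\underline{87.54}_{0.50}$ | $\mathbf{88.01}_{0.98}$ |
| Amazon | 4 | $47.71_{4.04}$ | $\underline{53.86}_{3.38}$ | $53.63_{7.99}$ | $48.63_{2.68}$ | $46.71_{1.54}$ | $50.76_{8.71}$ | $48.87_{6.93}$ | $\mathbf{54.38}_{5.17}$ |
| Amazon | 8 | $47.71_{4.40}$ | $\underline{54.28}_{1.76}$ | $54.09_{3.19}$ | $48.85_{3.35}$ | $45.09_{2.03}$ | $48.53_{1.02}$ | $50.28_{6.29}$ | $\mathbf{54.56}_{3.95}$ |
| Amazon | 16 | $47.79_{4.01}$ | $\underline{56.58}_{4.97}$ | $52.12_{0.87}$ | $48.24_{4.18}$ | $49.83_{6.94}$ | $51.62_{4.16}$ | $51.33_{6.01}$ | $\mathbf{57.22}_{5.84}$ |
| Blood | 4 | $56.22_{21.00}$ | $58.72_{19.16}$ | $48.57_{6.04}$ | $56.30_{12.43}$ | $56.45_{15.45}$ | $55.87_{13.49}$ | $\underline{68.34}_{7.48}$ | $\mathbf{73.99}_{1.79}$ |
| Blood | 8 | $65.77_{5.00}$ | $66.30_{10.01}$ | $60.00_{4.84}$ | $58.99_{10.12}$ | $56.37_{11.56}$ | $66.01_{9.25}$ | $\underline{70.37}_{3.23}$ | $\mathbf{74.06}_{4.45}$ |
| Blood | 16 | $66.27_{5.04}$ | $64.14_{6.80}$ | $54.76_{4.53}$ | $56.59_{5.21}$ | $60.62_{4.13}$ | $65.14_{7.55}$ | $\underline{70.07}_{5.19}$ | $\mathbf{75.26}_{0.76}$ |
| Credit-g | 4 | $48.92_{4.60}$ | $54.00_{7.34}$ | $48.80_{6.76}$ | $52.99_{4.08}$ | $54.33_{6.54}$ | $51.90_{9.40}$ | $\underline{55.94}_{1.10}$ | $\mathbf{60.77}_{5.01}$ |
| Credit-g | 8 | $55.26_{3.92}$ | $52.58_{11.27}$ | $54.50_{8.25}$ | $52.43_{4.36}$ | $52.90_{5.79}$ | $56.42_{12.89}$ | $\underline{57.42}_{3.10}$ | $\mathbf{62.94}_{0.74}$ |
| Credit-g | 16 | $59.22_{11.38}$ | $58.91_{8.04}$ | $57.63_{7.58}$ | $55.29_{4.80}$ | $51.65_{4.02}$ | $\underline{60.38}_{14.03}$ | $56.60_{2.22}$ | $\mathbf{69.37}_{1.85}$ |
| Diabetes | 4 | $62.35_{7.48}$ | $56.28_{13.01}$ | $64.22_{6.78}$ | $71.71_{5.31}$ | $63.96_{3.32}$ | $70.42_{3.69}$ | $\underline{80.28}_{0.75}$ | $\mathbf{81.28}_{4.05}$ |
| Diabetes | 8 | $64.69_{13.33}$ | $69.08_{9.68}$ | $67.39_{12.92}$ | $72.21_{2.07}$ | $65.47_{3.95}$ | $64.30_{5.88}$ | $\underline{79.38}_{1.66}$ | $\mathbf{81.53}_{1.51}$ |
| Diabetes | 16 | $71.86_{3.16}$ | $73.69_{3.21}$ | $73.79_{6.48}$ | $71.64_{5.05}$ | $66.71_{0.76}$ | $67.34_{2.79}$ | $\mathbf{80.15}_{1.35}$ | $\underline{79.63}_{0.36}$ |
| Heart | 4 | $59.38_{3.42}$ | $67.33_{15.29}$ | $\underline{88.27}_{3.32}$ | $60.76_{4.00}$ | $68.19_{11.17}$ | $59.74_{4.49}$ | $75.66_{4.59}$ | $\mathbf{89.12}_{0.07}$ |
| Heart | 8 | $74.35_{6.93}$ | $77.89_{2.34}$ | $\underline{88.78}_{2.38}$ | $65.46_{3.77}$ | $69.85_{10.82}$ | $70.14_{7.91}$ | $79.46_{2.16}$ | $\mathbf{90.65}_{1.79}$ |
| Heart | 16 | $83.66_{5.91}$ | $81.45_{5.05}$ | $\underline{89.13}_{2.10}$ | $67.00_{7.83}$ | $68.39_{11.73}$ | $81.72_{3.92}$ | $83.71_{1.88}$ | $\mathbf{92.29}_{0.74}$ |
| Communities | 4 | $66.18_{9.13}$ | OOW | $66.87_{14.10}$ | OOW | OOW | OOW | $\underline{75.39}_{5.05}$ | $\mathbf{75.90}_{3.62}$ |
| Communities | 8 | $72.69_{3.79}$ | OOW | $76.36_{4.55}$ | OOW | OOW | OOW | $\underline{76.59}_{1.25}$ | $\mathbf{77.12}_{1.24}$ |
| Communities | 16 | $73.09_{2.84}$ | OOW | $\underline{77.29}_{2.56}$ | OOW | OOW | OOW | $76.25_{0.64}$ | $\mathbf{78.51}_{2.57}$ |
| Myocardial | 4 | $47.70_{4.10}$ | OOW | $52.77_{2.01}$ | OOW | OOW | OOW | $\underline{52.87}_{3.44}$ | $\mathbf{57.46}_{1.88}$ |
| Myocardial | 8 | $49.37_{3.41}$ | OOW | $55.40_{4.41}$ | OOW | OOW | OOW | $\underline{56.22}_{1.64}$ | $\mathbf{63.16}_{0.87}$ |
| Myocardial | 16 | $54.31_{1.42}$ | OOW | $\underline{61.22}_{3.45}$ | OOW | OOW | OOW | $55.32_{9.15}$ | $\mathbf{63.22}_{3.13}$ |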

### 4.4. Message Passing Mechanism

Through USTNE, we obtain a unified semantic node representation matrix $\mathbf{H}$. Through Semantic Graph Structure Construction and Task-Adaptive Semantic Graph Refinement, we derive a task-adaptive semantic adjacency matrix $\mathbf{A}_{refine}$, forming a semantic graph $\mathcal{G} = (\mathbf{A}_{refine}, \mathbf{H})$. To model feature interactions, we perform message passing on $\mathcal{G}$, where the refined neighborhood of node $v$ is defined as $\mathcal{N}_{refine}(v) = \{u \mid \mathbf{A}_{refine}(v,u) = 1\}$. Accordingly, the GNN layer is formulated as follows:

$$
h_v^{l+1} = \sigma\Big(\mathbf{W} \cdot \operatorname{AGG}\big(\{h_u^{l} : u \in \mathcal{N}_{refine}(v) \cup \{v\}\}\big)\Big),
\tag{12}
$$

where $h_v^{l} \in \mathbb{R}^{d}$ denotes the representation of node $v$ at the $l$-th layer, $\sigma$ is the ReLU activation function, $\mathbf{W} \in \mathbb{R}^{d \times d}$ is the linear transformation matrix, and AGG is the neighborhood aggregation function, for which we use the mean. Finally, we apply a linear layer to the final-layer output $\mathbf{H}' = \{h_1', \cdots, h_n'\}$ to obtain the prediction:

$$
\hat{y} = \text{Linear}(\text{Mean}(\mathbf{H}')),
\tag{13}
$$

where $\text{Mean}(\mathbf{H}')$ aggregates all node representations to produce the representation of the task, which is then passed through a linear layer for prediction. The parameters and graph structure are then optimized using the following loss function:

$$
\mathcal{L} = \mathcal{L}_{task} + \lambda_1 \underbrace{\mathcal{L}_{prior}}_{\text{Eq. 7}} + \lambda_2 \underbrace{\mathcal{L}_{sparse}}_{\text{Eq. 15}},
\tag{14}
$$

where $\lambda_1$ and $\lambda_2$ are weighting hyperparameters that control the contributions of $\mathcal{L}_{prior}$ and $\mathcal{L}_{sparse}$ relative to $\mathcal{L}_{task}$, thereby balancing task fitting, prior learning, and sparsity regularization. $\mathcal{L}_{task}$ denotes the task loss, which depends on the downstream setting: for classification we use the cross-entropy loss $\mathcal{L}_{task} = -\sum_{c} y_c \log \hat{y}_c$, and for regression we use the squared error $\mathcal{L}_{task} = \|y - \hat{y}\|_2^2$. $\mathcal{L}_{sparse}$ is a sparsity regularizer that encourages a sparse refined structure, as follows:

$$
\mathcal{L}_{sparse} = \sum_{(v,u):\, \mathbf{A}_{refine}(v,u) = 1} s_{v,u},
\tag{15}
$$

where $s_{v,u}$ is the semantic score of edge $(v,u)$. Refer to Appendix [A](https://arxiv.org/html/2606.11640#A1) for the detailed algorithm.
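A forward pass through Eqs. 12 to 15, together with the prior loss of Eq. 7, can be written in a few lines of NumPy. This is a minimal sketch rather than the authors' implementation: the function names are hypothetical, a single GNN layer is used, the readout weights `w_out` and `b_out` are assumed to be a vector and a scalar, and gradients and the training loop are omitted.

```python
import math
import numpy as np

def gnn_layer(H, A_refine, W):
    """Eq. 12: mean over N_refine(v) ∪ {v}, linear map W, then ReLU."""
    n = H.shape[0]
    A_hat = np.maximum(A_refine, np.eye(n))            # add v itself to its neighborhood
    A_hat = A_hat / A_hat.sum(axis=1, keepdims=True)   # row-normalize to get the mean
    return np.maximum(A_hat @ H @ W.T, 0.0)            # rows are W applied to each aggregate

def predict(H_final, w_out, b_out):
    """Eq. 13: mean-pool the node representations, then apply a linear layer."""
    return H_final.mean(axis=0) @ w_out + b_out

def prior_loss(S, A_prior):
    """Eq. 7: average of log(1 + exp(-s)) over the edges of the prior graph."""
    s = S[A_prior == 1]
    return float(np.log1p(np.exp(-s)).mean()) if s.size else 0.0

def sparse_loss(S, A_refine):
    """Eq. 15: sum of the scores of all edges kept in the refined graph."""
    return float(S[A_refine == 1].sum())

def total_loss(task_loss, S, A_prior, A_refine, lam1=0.1, lam2=0.1):
    """Eq. 14 with the reported weights lambda_1 = lambda_2 = 0.1 as defaults."""
    return task_loss + lam1 * prior_loss(S, A_prior) + lam2 * sparse_loss(S, A_refine)

# Three nodes with 2-dimensional representations; nodes 0 and 1 are linked.
H = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
A = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
H1 = gnn_layer(H, A, np.eye(2))
y_hat = predict(H1, np.array([1.0, -1.0]), 0.5)
loss = total_loss(1.0, np.zeros((3, 3)), A, A)
```

With the identity as `W`, nodes 0 and 1 each become the average of their two representations while the isolated node 2 keeps its own, which shows how the refined adjacency decides which features exchange information.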

## 5. Experiments

Datasets. We evaluate TAROT on eleven real-world datasets. Eight are classification tasks: Adult (Asuncion et al., [2007](https://arxiv.org/html/2606.11640#bib.bib50)), Amazon (Rani et al., [2023](https://arxiv.org/html/2606.11640#bib.bib48)), Blood (Yeh et al., [2009](https://arxiv.org/html/2606.11640#bib.bib49)), Credit-g (Kadra et al., [2021](https://arxiv.org/html/2606.11640#bib.bib51)), Diabetes ([https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database](https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database)), Heart ([https://www.kaggle.com/datasets/fedesoriano/heart-failure-prediction](https://www.kaggle.com/datasets/fedesoriano/heart-failure-prediction)), Communities (Asuncion et al., [2007](https://arxiv.org/html/2606.11640#bib.bib50)), and Myocardial (Golovenkin et al., [2020](https://arxiv.org/html/2606.11640#bib.bib52)). Three are regression tasks from OpenML (Vanschoren et al., [2014](https://arxiv.org/html/2606.11640#bib.bib64)): Abalone, Boston, and Cholesterol. Dataset statistics are provided in Appendix [B.1](https://arxiv.org/html/2606.11640#A2.SS1).

Baselines. We compare TAROT against a broad set of baselines for few-shot tabular learning. These include Traditional few-shot tabular learning methods, namely SCARF (Bahri et al., [2021](https://arxiv.org/html/2606.11640#bib.bib22)), TabPFN (Hollmann et al., [2023](https://arxiv.org/html/2606.11640#bib.bib35)), and STUNT (Nam et al., [2023](https://arxiv.org/html/2606.11640#bib.bib23)), and LLM-based few-shot methods, namely In-context (Wei et al., [2022](https://arxiv.org/html/2606.11640#bib.bib36)), TABLET (Slack and Singh, [2023](https://arxiv.org/html/2606.11640#bib.bib24)), TabLLM (Hegselmann et al., [2023](https://arxiv.org/html/2606.11640#bib.bib25)), and FeatLLM (Han et al., [2024](https://arxiv.org/html/2606.11640#bib.bib26)). In addition, we evaluate two alternative graph construction strategies, TabGSL (Liao and Li, [2023](https://arxiv.org/html/2606.11640#bib.bib69)) and T2G-FORMER (Yan et al., [2023](https://arxiv.org/html/2606.11640#bib.bib18)), to assess the effectiveness of the graph structure constructed by TAROT.

Implementation Details. We use the embedding layer of the BERT model (Devlin et al., [2019](https://arxiv.org/html/2606.11640#bib.bib54)) as the text encoder and GPT-4o-mini (Achiam et al., [2023](https://arxiv.org/html/2606.11640#bib.bib55)) for generating graph structures. For task-adaptive semantic graph refinement, Topk is set to 5, $\tau$ is set to 0.2, and $\lambda_1$ and $\lambda_2$ are both set to 0.1. When calling the LLM API, we set the temperature to 0.0 and keep top-$p$ at its default value of 1. For training, we use the Adam optimizer with a learning rate of 1e-5 and train the model for 1000 iterations. For evaluation, we use the Area Under the Curve (AUC) metric for classification tasks and the Root Mean Squared Error (RMSE) for regression tasks.
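For reference, the two evaluation metrics can be computed as in the sketch below. The function names are hypothetical, and the AUC shown is the binary case computed from pairwise comparisons; how multi-class tasks are handled is not stated in this section, so that part is left out.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error for regression (lower is better)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def binary_auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as one half. Assumes labels are 0/1 and both classes are present.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]          # every positive-negative pair
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))

auc = binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
err = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

The pairwise form makes explicit that AUC depends only on how the scores rank the samples, which is why it remains usable when only a handful of labeled samples are available to calibrate a threshold.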

### 5.1. Main Results

![Refer to caption](https://arxiv.org/html/2606.11640v1/x4.png)

Figure 4. Evaluation results, including the RMSE ($\downarrow$) scores across three regression datasets. TabPFN, STUNT, TABLET, TabLLM, and FeatLLM are specifically designed for few-shot classification tasks and are not suitable for regression tasks.

Table 3. Runtime in seconds of TAROT and other few-shot tabular baseline methods for the training and inference phases, conducted on the Myocardial dataset.

Q: Does our method perform well in few-shot tabular prediction? Yes. As shown in Tab. [2](https://arxiv.org/html/2606.11640#S4.T2) and Fig. [4](https://arxiv.org/html/2606.11640#S5.F4), our method achieves SOTA performance on both classification and regression tasks.

⊳ Classification Task Comparison. The comparison results are summarized in Tab. [2](https://arxiv.org/html/2606.11640#S4.T2). TAROT outperforms the other few-shot methods on few-shot tabular classification: it obtains the highest AUC in 23 of the 24 dataset-shot settings (the exception is Diabetes at 16 shots, where FeatLLM is higher) and surpasses the strongest baseline, FeatLLM, reaching state-of-the-art performance on this task. This result demonstrates that introducing a semantic graph structure inferred by an LLM and combining it with task-adaptive structural refinement captures task-related semantic dependencies between features more effectively, thereby improving classification performance in few-shot scenarios. ❶ LLM-based methods significantly outperform Traditional models in few-shot scenarios. This is shown by the results in Tab. [2](https://arxiv.org/html/2606.11640#S4.T2) and is mainly because an LLM can draw on the real-world knowledge acquired during pre-training to perform stronger semantic understanding and reasoning on tabular data, whereas traditional methods usually rely only on limited labeled signals in few-shot scenarios, which makes it difficult to acquire sufficient semantic inductive ability. Rather than having an LLM reason over the entire table, TAROT prompts the LLM with only meta information (feature names and the task description) to infer the semantic graph, then refines it and aligns it with the task through a task adaptation mechanism, thereby injecting effective structural prior information in a more lightweight way.

⊳ Comparison in Regression Tasks. We further evaluate TAROT on three few-shot regression datasets. As shown in Fig. [4](https://arxiv.org/html/2606.11640#S5.F4), ❷ TAROT not only applies stably to regression tasks but also achieves state-of-the-art performance compared with the existing few-shot tabular learning baselines. This demonstrates that the semantic graph structure constructed by TAROT can also provide effective structural priors for regression. Overall, these results validate that graph priors inferred by an LLM still capture semantic dependencies between features in regression settings, thereby improving prediction accuracy.

Q: Is TAROT computationally efficient? Yes. As shown in Tab. [3](https://arxiv.org/html/2606.11640#S5.T3), TAROT is efficient in both training and inference. This efficiency arises from two key design choices: (i) Compared to Traditional few-shot tabular learning methods that rely on additional pre-training with unlabeled or synthetic data, TAROT requires no additional pre-training phase, which significantly reduces overall training overhead. (ii) Compared to LLM-based few-shot tabular methods, TAROT avoids querying the LLM for every instance at prediction time. Instead, the graph structure inferred by the LLM is obtained as a structural prior with a single API call and reused throughout training and inference, which uses LLM resources more efficiently and reduces inference costs.

![Refer to caption](https://arxiv.org/html/2606.11640v1/x5.png)

Figure 5. Quantitative analysis of the effectiveness of generating graph structures on Myocardial. w/o SGSC: remove Semantic Graph Structure Construction and refine an empty prior graph via Task-Adaptive Semantic Graph Refinement. For a fair comparison, the five methods differ only in their graph structures.

### 5.2. Effectiveness of Graph Structure

![Refer to caption](https://arxiv.org/html/2606.11640v1/x6.png)

Figure 6. Qualitative analysis of the effectiveness of generating graph structures on Adult. The task is "Predict whether the person earns more than 50000 dollars per year?"

Q: Does TAROT generate an effective graph structure? Yes. We analyze the effectiveness of the graph structure generated by TAROT from both quantitative and qualitative perspectives, as shown in Fig. [5](https://arxiv.org/html/2606.11640#S5.F5) and Fig. [6](https://arxiv.org/html/2606.11640#S5.F6), respectively.

⊳ Quantitative analysis. Fig. [5](https://arxiv.org/html/2606.11640#S5.F5) compares the downstream performance of various graph construction strategies, from which we make two observations. ❸ Previous graph construction strategies fail to capture semantic relationships between features in few-shot scenarios. As shown in Fig. [5](https://arxiv.org/html/2606.11640#S5.F5), in few-shot scenarios (e.g., 4 to 64 shots), the graph structures learned by previous methods (TabGSL and T2G-FORMER) perform similarly to randomly generated graphs on downstream tasks, indicating that these methods have difficulty learning effective graph structures in few-shot settings. In contrast, TAROT uses the knowledge of an LLM to construct a semantic graph structure and then performs task-adaptive semantic graph refinement, resulting in an effective graph structure that improves downstream performance. ❹ When Semantic Graph Structure Construction (SGSC) is removed (i.e., w/o SGSC), our method degenerates to a conventional graph structure learning method. As shown in Fig. [5](https://arxiv.org/html/2606.11640#S5.F5), removing SGSC causes performance in few-shot settings to drop to the level of randomly generated graph structures, suggesting that the semantic graph produced by SGSC provides useful priors for constructing effective graph structures.

⊳ Qualitative analysis. Fig. [6](https://arxiv.org/html/2606.11640#S5.F6) compares the graphs produced by the various graph construction strategies and shows that ❺ in few-shot scenarios, TAROT can effectively induce task-related semantic graph structures. Specifically, as illustrated in Fig. [6](https://arxiv.org/html/2606.11640#S5.F6), TAROT first elicits a meaningful structural prior via a zero-shot LLM prompt and then suppresses spurious connections through Task-Adaptive Semantic Graph Refinement. For example, it removes the task-unrelated edge between sex and race when predicting whether income exceeds \$50,000. In contrast, the graphs produced by TabGSL and T2G-FORMER under the 4-shot regime are close to random and contain substantial structural noise, which hampers their ability to capture reliable feature dependencies and degrades downstream performance.

### 5.3. Further Analysis

Table 4. Effect of model components on Myocardial. USTNE denotes Unified Semantic Tabular Node Encoder, CT denotes Code Template, TAP denotes Task-Adaptive Pruning, and TAE denotes Task-Adaptive Enhancement.

![Refer to caption](https://arxiv.org/html/2606.11640v1/x7.png)

Figure 7. Impact of varying the number of GNN layers on the performance of TAROT across different datasets.

Q: Does each component of TAROT contribute effectively to the overall performance? Yes, each component of TAROT contributes effectively to the overall performance.

⊳ Ablation Analysis. As shown in Tab. [4](https://arxiv.org/html/2606.11640#S5.T4), ❻ the prior learning loss $\mathcal{L}_{prior}$ constrains Task-Adaptive Semantic Graph Refinement. Removing $\mathcal{L}_{prior}$ leads to a significant performance drop, indicating that it anchors the refinement to the structural prior in the LLM-generated semantic graph rather than letting semantic edges be added or pruned arbitrarily.

⊳ Analysis of the Relationship between GNN Layers and Node Size. As shown in Fig. [7](https://arxiv.org/html/2606.11640#S5.F7), ❼ datasets with fewer nodes typically benefit from shallower GNNs, while larger graphs require deeper GNNs. For datasets with few nodes (e.g., Heart with 11 nodes), adding more GNN layers can hurt performance through over-smoothing and information loss. In contrast, for datasets with many nodes (e.g., Myocardial with 111 nodes), deeper GNNs better capture complex feature interactions and global structure, leading to improved performance.

## 6. Conclusion

In this paper, we introduced TAROT, a task\-adaptive GNN\-based framework for few\-shot tabular learning that explicitly models feature interactions through semantic graphs\. Motivated by the observation that existing few\-shot tabular learning paradigms largely overlook semantic relationships between features, TAROT leverages LLMs to induce a semantic graph based on task descriptions and feature names, providing a strong structural prior under limited supervision\. To address the inevitable structural noise introduced by LLM inference, we further proposed a task\-adaptive graph refinement mechanism that prunes spurious and task\-unrelated edges while recovering missing task\-related semantic ones\. Extensive experiments on 11 real\-world datasets demonstrate that TAROT consistently outperforms state\-of\-the\-art Traditional and LLM\-based baselines in few\-shot settings\.

## 7. Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 62372211) and the Science and Technology Development Program of Jilin Province (No. 20250102216JC).

## References

- J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat, et al. (2023). GPT-4 technical report. arXiv preprint arXiv:2303.08774.
- C. An, J. Zhang, M. Zhong, L. Li, S. Gong, Y. Luo, J. Xu, and L. Kong (2024). Why does the effective context length of LLMs fall short? arXiv preprint arXiv:2410.18745.
- A. Asuncion, D. Newman, et al. (2007). UCI machine learning repository. Irvine, CA, USA.
- D. Bahri, H. Jiang, Y. Tay, and D. Metzler (2021). SCARF: self-supervised contrastive learning using random feature corruption. arXiv preprint arXiv:2106.15147.
- S. Bubeck, V. Chandrasekaran, R. Eldan, J. Gehrke, E. Horvitz, E. Kamar, P. Lee, Y. T. Lee, Y. Li, S. Lundberg, et al. (2023). Sparks of artificial general intelligence: early experiments with GPT-4. arXiv preprint arXiv:2303.12712.
- L. Cao (2022). AI in finance: challenges, techniques, and opportunities. ACM Computing Surveys (CSUR) 55(3), pp. 1–38.
- N. Carlini, F. Tramer, E. Wallace, M. Jagielski, A. Herbert-Voss, K. Lee, A. Roberts, T. Brown, D. Song, U. Erlingsson, et al. (2021). Extracting training data from large language models. In 30th USENIX security symposium (USENIX Security 21), pp. 2633–2650.
- W. Chen, Y. Liu, Z. Kira, Y. F. Wang, and J. Huang (2019). A closer look at few-shot classification. arXiv preprint arXiv:1904.04232.
- J. M. Clements, D. Xu, N. Yousefi, and D. Efimov (2020). Sequential deep learning for credit risk monitoring with tabular financial data. arXiv preprint arXiv:2012.15330.
- E. Dai, W. Jin, H. Liu, and S. Wang (2022). Towards robust graph neural networks for noisy graphs with sparse labels. In Proceedings of the fifteenth ACM international conference on web search and data mining, pp. 181–191.
- J. Devlin, M. Chang, K. Lee, and K. Toutanova (2019). BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pp. 4171–4186.
- H. Duan, Y. Yang, and K. Y. Tam (2024). Do LLMs know about hallucination? An empirical investigation of LLM's hidden states. arXiv preprint arXiv:2402.09733.
- S.E. Golovenkin, V.A. Shulman, D.A. Rossiev, P.A. Shesternya, S.Yu. Nikulina, Yu.V. Orlova, and V.F. Voino-Yasenetsky (2020). Myocardial infarction complications. UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C53P5M.
- X. Guo, Y. Quan, H. Zhao, Q. Yao, Y. Li, and W. Tu (2021). TabGNN: multiplex graph neural network for tabular data prediction. arXiv preprint arXiv:2108.09127.
- S. Han, J. Yoon, S. O. Arik, and T. Pfister (2024). Large language models can automatically engineer features for few-shot tabular learning. arXiv preprint arXiv:2404.09491.
- S. Hegselmann, A. Buendia, H. Lang, M. Agrawal, X. Jiang, and D. Sontag (2023). TabLLM: few-shot classification of tabular data with large language models. In International Conference on Artificial Intelligence and Statistics, pp. 5549–5581.
- N. Hollmann, S. Müller, K. Eggensperger, and F. Hutter (2023). TabPFN: a transformer that solves small tabular classification problems in a second. In The Eleventh International Conference on Learning Representations.
- Y. Hou, A. Pascale, J. Carnerero-Cano, T. Tchrakian, R. Marinescu, E. Daly, I. Padhi, and P. Sattigeri (2024). WikiContradict: a benchmark for evaluating LLMs on real-world knowledge conflicts from Wikipedia. Advances in Neural Information Processing Systems 37, pp. 109701–109747.
- Y. Hu, T. Nguyen, S. Ghosh, and S. Razniewski (2025). Enabling LLM knowledge analysis via extensive materialization. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 16189–16202.
- A. Kadra, M. Lindauer, F. Hutter, and J. Grabocka (2021). Well-tuned simple nets excel on tabular datasets. Advances in neural information processing systems 34, pp. 23928–23941.
- S. Kim, S. Yun, H. Lee, M. Gubri, S. Yoon, and S. J. Oh (2023). ProPILE: probing privacy leakage in large language models. Advances in Neural Information Processing Systems 36, pp. 20750–20762.
- P. Laban, W. Kryściński, D. Agarwal, A. R. Fabbri, C. Xiong, S. Joty, and C. Wu (2023). SummEdits: measuring LLM ability at factual reasoning through the lens of summarization. In Proceedings of the 2023 conference on empirical methods in natural language processing, pp. 9662–9676.
- Y. Li, M. Du, R. Song, X. Wang, M. Sun, and Y. Wang (2024). Mitigating social biases of pre-trained language models via contrastive self-debiasing with double data augmentation. Artificial Intelligence 332, pp. 104143.
- J. C. Liao and C. Li (2023). TabGSL: graph structure learning for tabular data prediction. arXiv preprint arXiv:2305.15843.
- R. Liu, L. Fang, W. Wang, and B. Jing (2024). D2R2: diffusion-based representation with random distance matching for tabular few-shot learning. In The Thirty-eighth Annual Conference on Neural Information Processing Systems.
- A. A. Mitani and S. Haneuse (2020). Small data challenges of studying rare diseases. JAMA Network Open 3(3), pp. e201965–e201965.
- M. R. H. Mondal, S. Bharati, P. Podder, and P. Podder (2020). Data analytics for novel coronavirus disease. Informatics in Medicine Unlocked 20, pp. 100374.
- J. Nam, J. Tack, K. Lee, H. Lee, and J. Shin (2023). STUNT: few-shot tabular learning with self-generated tasks from unlabeled tables. In The Eleventh International Conference on Learning Representations.
- B. Oreshkin, P. Rodríguez López, and A. Lacoste (2018). TADAM: task dependent adaptive metric for improved few-shot learning. Advances in neural information processing systems 31.
- Z. Peng, Z. Li, J. Zhang, Y. Li, G. Qi, and J. Tang (2019). Few-shot image recognition with knowledge transfer. In Proceedings of the IEEE/CVF international conference on computer vision, pp. 441–449.
- G. Perković, A. Drobnjak, and I. Botički (2024). Hallucinations in LLMs: understanding and addressing challenges. In 2024 47th MIPRO ICT and Electronics Convention (MIPRO), pp. 2084–2088.
- G. E. Rani, M. Sakthimohan, M. Navaneethakrishnan, S. Mahendran, S. Dhivya, and M. Jayaprakash (2023). Amazon employee access system using machine learning algorithms. In 2023 International Conference on Intelligent Systems for Communication, IoT and Security (ICISCoIS), pp. 417–421.
- F. Seide, G. Li, X. Chen, and D. Yu (2011). Feature engineering in context-dependent deep neural networks for conversational speech transcription. In 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, pp. 24–29.
- K. Shailaja, B. Seetharamulu, and M. Jabbar (2018). Machine learning in healthcare: a review. In 2018 Second international conference on electronics, communication and aerospace technology (ICECA), pp. 910–914.
- D. Slack and S. Singh (2023). TABLET: learning from instructions for tabular data. arXiv preprint arXiv:2304.13188.
- J. Snell, K. Swersky, and R. Zemel (2017). Prototypical networks for few-shot learning. Advances in neural information processing systems 30.
- J. Vanschoren, J. N. Van Rijn, B. Bischl, and L. Torgo (2014). OpenML: networked science in machine learning. ACM SIGKDD Explorations Newsletter 15(2), pp. 49–60.
- X. Wan, R. Sun, H. Dai, S. Arik, and T. Pfister (2023). Better zero-shot reasoning with self-adaptive prompting. In Findings of the Association for Computational Linguistics: ACL 2023, pp. 3493–3514.
- B. Wang, T. Shen, G. Long, T. Zhou, Y. Wang, and Y. Chang (2021). Structure-augmented text representation learning for efficient knowledge graph completion. In Proceedings of the Web Conference 2021, pp. 1737–1748.
- X. Wang, M. Salmani, P. Omidi, X. Ren, M. Rezagholizadeh, and A. Eshaghi (2024). Beyond the limits: a survey of techniques to extend the context length in large language models. arXiv preprint arXiv:2402.02244.
- Y\. Wang, Q\. Yao, J\. T\. Kwok, and L\. M\. Ni \(2020\)Generalizing from a few examples: a survey on few\-shot learning\.ACM computing surveys \(csur\)53\(3\),pp\. 1–34\.Cited by:[§1](https://arxiv.org/html/2606.11640#S1.p1.1)\.
- J\. Wei, Y\. Tay, R\. Bommasani, C\. Raffel, B\. Zoph, S\. Borgeaud, D\. Yogatama, M\. Bosma, D\. Zhou, D\. Metzler,et al\.\(2022\)Emergent abilities of large language models\.arXiv preprint arXiv:2206\.07682\.Cited by:[1st item](https://arxiv.org/html/2606.11640#A2.I2.i1.p1.1),[Table 1](https://arxiv.org/html/2606.11640#S1.T1.1.1.2.2.4),[§1](https://arxiv.org/html/2606.11640#S1.p2.1),[§5](https://arxiv.org/html/2606.11640#S5.p1.4)\.
- J\. Yan, J\. Chen, Y\. Wu, D\. Z\. Chen, and J\. Wu \(2023\)T2g\-former: organizing tabular features into relation graphs promotes heterogeneous feature interaction\.InProceedings of the AAAI Conference on Artificial Intelligence,Vol\.37,pp\. 10720–10728\.Cited by:[2nd item](https://arxiv.org/html/2606.11640#A2.I3.i2.p1.1),[§1](https://arxiv.org/html/2606.11640#S1.p4.1),[§2\.2](https://arxiv.org/html/2606.11640#S2.SS2.p1.1),[§4\.2](https://arxiv.org/html/2606.11640#S4.SS2.p1.13),[§5](https://arxiv.org/html/2606.11640#S5.p1.4)\.
- Y\. Yao, J\. Duan, K\. Xu, Y\. Cai, Z\. Sun, and Y\. Zhang \(2024\)A survey on large language model \(llm\) security and privacy: the good, the bad, and the ugly\.High\-Confidence Computing4\(2\),pp\. 100211\.Cited by:[§1](https://arxiv.org/html/2606.11640#S1.p2.1)\.
- I\. Yeh, K\. Yang, and T\. Ting \(2009\)Knowledge discovery on rfm model using bernoulli sequence\.Expert Systems with applications36\(3\),pp\. 5866–5871\.Cited by:[§5](https://arxiv.org/html/2606.11640#S5.p1.4)\.

## Appendix A Algorithm of TAROT

Table 5. Prompt $p$ used by TAROT for the Adult dataset.

Meta Information $p_{meta}$:

Task objective: Does this person earn more than 50000 dollars per year?

features descriptions: age: the age of an individual, workclass: a general term to represent the employment status of an individual, fnlwgt: the number of units in the target population that the responding unit represents, education: the highest level of education achieved by an individual, educational-num: the highest level of education achieved in numerical form…

Instruction $p_{I}$:

Step 1. Analyze all the causal relationships or tendencies between features based on general knowledge and common sense within a short sentence to answer the task.

Step 2. Based on the above tabular description and Step 1’s results, analyze the semantic relationships between features

Code Template $p_{code}$:

    import numpy as np
    feature_number = n
    A = np.zeros((feature_number, feature_number), dtype=int)
    # semantic relationships between features
    edges = [Filled by LLM]
    for i, j in relationships:
        A[i, j] = A[j, i] = 1
    np.save(graph_path, A)

Algorithm 1 TAROT

1. Given a tabular dataset $D=\{\mathbb{X},\mathbb{Y}\}$
2. For $k$-shot learning, we construct the training set $\hat{D}$ by sampling $k$ labeled samples from the dataset $D$.
3. $h_{i}=\text{USTNE}(x_{i},f_{j}),\ \ h_{i}\in\textbf{H},\ x_{i}\in\hat{D}$
4. Use the formula $\textbf{A}=\text{Execute}(\text{LLM}(p))$ to obtain the semantic graph structure $\textbf{A}$; the prompt $p$ used is shown in Tab. [5](https://arxiv.org/html/2606.11640#A1.T5).
5. repeat
6. $e_{v,u}=\big[(h_{v},h_{u})\oplus(h_{v}\odot h_{u})\oplus(|h_{v}-h_{u}|)\big]$
7. $s_{v,u}=\sigma(e_{v,u}w)$
8. Obtain $\textbf{A}_{prune}$ and $\textbf{A}_{enhance}$ according to Eq. [9](https://arxiv.org/html/2606.11640#S4.E9) and Eq. [10](https://arxiv.org/html/2606.11640#S4.E10), respectively
9. $\textbf{A}_{refine}=(\textbf{A}_{prior}\odot(1-\textbf{A}_{prune}))\lor\textbf{A}_{enhance}$
10. for $l=1,\dots,L$ do
11. $h_{v}^{l+1}=\sigma\Big(W\cdot\operatorname{AGG}(\{h_{u}^{l}:u\in\mathcal{N}_{refine}(v)\cup\{v\}\})\Big)$
12. end for
13. $\hat{y}=\text{Linear}(\text{Mean}(H^{L}))$
14. Calculate the loss according to Eq. [14](https://arxiv.org/html/2606.11640#S4.E14)
15. Update parameters using Adam optimizer
16. until convergence

Given a labeled tabular dataset $D=\{\mathbb{X},\mathbb{Y}\}$, we first construct a $k$-shot training set $\hat{D}$ by sampling $k$ labeled instances per task (or per class). For each training instance, TAROT applies the *Unified Semantic Tabular Node Encoder* (USTNE) to transform heterogeneous features into unified semantic node representations, producing a node embedding matrix $\mathbf{H}$. Next, to obtain a semantics-aware prior structure without accessing any raw samples, we query LLMs with a prompt $p$ that contains task meta-information, step-by-step reasoning instructions, and an executable code template; executing the returned code directly yields the semantic adjacency matrix $\mathbf{A}$, which is treated as the prior graph $\mathbf{A}_{prior}$.
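The prior-graph step can be illustrated with a minimal sketch of what executing the filled-in code template might look like. The helper name `build_prior_graph`, the example edge list, and the range check that discards invalid index pairs are assumptions added for illustration; they are not part of the prompt in Table 5.

```python
import numpy as np

def build_prior_graph(edges, feature_number):
    """Build a symmetric 0/1 adjacency matrix from LLM-proposed feature pairs."""
    A = np.zeros((feature_number, feature_number), dtype=int)
    for i, j in edges:
        # Assumption: silently drop self-pairs and out-of-range indices,
        # which a hallucinating LLM could emit.
        if 0 <= i < feature_number and 0 <= j < feature_number and i != j:
            A[i, j] = A[j, i] = 1
    return A

# Hypothetical LLM output for five Adult features, indexed as
# 0 age, 1 workclass, 2 fnlwgt, 3 education, 4 educational-num.
edges = [(3, 4), (0, 1), (1, 3), (2, 7)]  # (2, 7) is out of range and is dropped
A_prior = build_prior_graph(edges, feature_number=5)
```

Only feature names and the task description enter the prompt, so the resulting matrix depends on no raw sample values.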

During training, TAROT performs *task-adaptive semantic graph refinement* to mitigate structural noise introduced by the LLM. Specifically, for each node pair $(v,u)$, we construct an edge representation $e_{v,u}$ by concatenating $(h_{v},h_{u})$, the element-wise product $(h_{v}\odot h_{u})$, and the absolute difference $|h_{v}-h_{u}|$, and obtain a semantic score $s_{v,u}=\sigma(e_{v,u}w)$ via a linear scorer. Based on these scores, we derive a pruning mask $\mathbf{A}_{prune}$ (removing edges with scores below a threshold $\tau$) and an enhancement mask $\mathbf{A}_{enhance}$ (adding the top-$k$ highest-scoring edges), and refine the structure by $\mathbf{A}_{refine}=(\mathbf{A}_{prior}\odot(1-\mathbf{A}_{prune}))\lor\mathbf{A}_{enhance}$. We then perform message passing on the refined graph for $L$ GNN layers using mean aggregation over the refined neighborhood $\mathcal{N}_{refine}(v)$, producing $\mathbf{H}^{L}$. The final prediction is obtained by mean-pooling node embeddings followed by a linear head, i.e., $\hat{y}=\mathrm{Linear}(\mathrm{Mean}(\mathbf{H}^{L}))$. All parameters (including the edge scorer and GNN) are optimized end-to-end with Adam by minimizing $\mathcal{L}=\mathcal{L}_{task}+\lambda_{1}\mathcal{L}_{prior}+\lambda_{2}\mathcal{L}_{sparse}$, where $\mathcal{L}_{prior}$ regularizes the learned structure toward $\mathbf{A}_{prior}$ and $\mathcal{L}_{sparse}$ encourages sparsity in $\mathbf{A}_{refine}$, improving stability and generalization in few-shot tabular learning.
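The refinement and prediction steps described above can be sketched in NumPy as a forward pass only. Several details are assumptions, since Eq. 9 and Eq. 10 are not reproduced in this appendix: scores are symmetrized by averaging, enhancement candidates are restricted to pairs absent from the prior graph, ReLU stands in for the layer nonlinearity $\sigma$, and the function names `refine_graph` and `gnn_forward` are hypothetical. Training (the loss and the Adam update) is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def refine_graph(H, A_prior, w, tau=0.5, k=1):
    """Prune low-scoring prior edges and add the k best-scoring absent edges."""
    n = H.shape[0]
    hv = np.repeat(H[:, None, :], n, axis=1)  # hv[v, u] = H[v]
    hu = np.repeat(H[None, :, :], n, axis=0)  # hu[v, u] = H[u]
    # Edge representation: [h_v, h_u] + element-wise product + absolute difference.
    E = np.concatenate([hv, hu, hv * hu, np.abs(hv - hu)], axis=-1)  # (n, n, 4d)
    S = sigmoid(E @ w)                         # linear scorer, shape (n, n)
    S = 0.5 * (S + S.T)                        # assumption: symmetrize the scores
    A_prune = ((S < tau) & (A_prior == 1)).astype(int)
    # Assumption: candidates for enhancement are absent pairs (upper triangle).
    cand = np.triu(A_prior == 0, k=1)
    A_enhance = np.zeros_like(A_prior)
    idx = np.argwhere(cand)
    if k > 0 and len(idx) > 0:
        for v, u in idx[np.argsort(-S[cand])[:k]]:
            A_enhance[v, u] = A_enhance[u, v] = 1
    A_refine = (A_prior * (1 - A_prune)) | A_enhance
    return A_refine, S

def gnn_forward(H, A_refine, weights, w_out, b_out):
    """Mean aggregation over N_refine(v) and v itself, then mean-pool and linear head."""
    n = H.shape[0]
    A_hat = A_refine + np.eye(n)                       # include the node itself
    A_hat = A_hat / A_hat.sum(axis=1, keepdims=True)   # row-normalize: mean aggregation
    for W in weights:
        H = np.maximum(A_hat @ H @ W, 0.0)             # ReLU assumed for sigma
    return H.mean(axis=0) @ w_out + b_out

rng = np.random.default_rng(0)
n, d = 5, 4
H = rng.normal(size=(n, d))          # stand-in for USTNE node embeddings
A_prior = np.zeros((n, n), dtype=int)
A_prior[0, 1] = A_prior[1, 0] = 1
A_prior[3, 4] = A_prior[4, 3] = 1
w = rng.normal(size=4 * d)           # untrained scorer weights
A_refine, S = refine_graph(H, A_prior, w, tau=0.5, k=1)
layers = [rng.normal(size=(d, d)) for _ in range(2)]
y_hat = gnn_forward(H, A_refine, layers, rng.normal(size=d), 0.0)
```

The hard thresholding and top-$k$ selection used here are not differentiable; how gradients reach the scorer during end-to-end training is not specified in this section and is left out of the sketch.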

## Appendix B Detailed Experiment Setups

### B.1. Datasets

Table 6. The basic information of each dataset used in our experiments.

We evaluate TAROT on 11 tabular benchmarks spanning both classification and regression, with diverse feature heterogeneity (mixed categorical and numerical attributes). Specifically, we use 8 classification datasets: Adult (48,842 samples, 14 features), Amazon (2,000, 9), Blood (748, 4), Credit-g (1,000, 20), Diabetes (768, 8), Heart (918, 11), Communities (1,994, 103), and Myocardial (1,700, 111); and 3 regression datasets: Abalone (4,177, 8), Boston (506, 13), and Cholesterol (303, 9). Overall, dataset sizes range from 303 to 48,842 and feature dimensionalities from 4 to 111, with substantial variation in the categorical/numerical composition. For example, writing feature counts as categorical/numerical, Amazon contains only categorical features (9/0), Blood and Diabetes contain only numerical features (0/4 and 0/8), Communities is high-dimensional and predominantly numerical (1/102), whereas Myocardial is categorical-dominant (94/17). This diversity in task type, scale, and feature modality enables a comprehensive assessment of few-shot tabular learning performance.

### B.2. Baselines

We compare TAROT with a broad set of representative baselines for few-shot tabular learning, covering (i) traditional few-shot tabular learners, (ii) LLM-based few-shot approaches, and (iii) graph-based tabular learning methods for semantic structure construction. For a fair comparison, all methods are evaluated under the same $k$-shot protocol (i.e., sampling $k$ labeled instances to form the support/training set). Whenever official implementations are available, we use them and keep the default hyperparameters; otherwise, we follow the settings recommended in the original papers.
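As an illustration of the shared $k$-shot protocol, the following sketch draws a support set from a label vector. It assumes sampling $k$ instances per class, which applies only to classification; for the regression datasets, $k$ instances would have to be drawn from the whole dataset instead. The function name `sample_k_shot` and the toy labels are hypothetical.

```python
import numpy as np

def sample_k_shot(y, k, seed=0):
    """Return sorted indices of k randomly chosen labeled rows per class."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    picked = []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        # Guard against classes with fewer than k labeled rows.
        picked.extend(rng.choice(idx, size=min(k, len(idx)), replace=False))
    return np.sort(np.array(picked))

y = np.array([0, 0, 0, 1, 1, 0, 1, 1, 0, 1])  # toy binary labels
support_idx = sample_k_shot(y, k=2, seed=0)
```

Fixing the seed and reusing the same `support_idx` for every method is one simple way to keep the comparison on identical support sets.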

$\triangleright$ Traditional few-shot tabular learning methods. These methods improve few-shot generalization by leveraging transferable inductive biases learned from additional pretraining or self-supervision on tabular data.

- SCARF (Bahri et al., [2021](https://arxiv.org/html/2606.11640#bib.bib22)) is a self-supervised contrastive pretraining method that generates two views by randomly corrupting features and learns invariant representations by aligning original and corrupted samples.
- TabPFN (Hollmann et al., [2023](https://arxiv.org/html/2606.11640#bib.bib35)) pretrains a transformer on a large collection of synthetic tabular tasks, enabling strong few-shot predictions via a single forward pass without dataset-specific hyperparameter tuning.
- STUNT (Nam et al., [2023](https://arxiv.org/html/2606.11640#bib.bib23)) is a few-shot semi-supervised framework that constructs pseudo-tasks and leverages self-training style transfer to improve performance with limited labels.

$\triangleright$ LLM-based few-shot tabular learning methods. These methods serialize tabular samples into natural language (or structured text) and prompt LLMs to perform prediction, leveraging the world knowledge and reasoning abilities encoded in LLMs.

- In-context (Wei et al., [2022](https://arxiv.org/html/2606.11640#bib.bib36)) is a standard prompting baseline that provides a few labeled examples together with the test instance, allowing the LLM to infer the mapping via in-context learning without parameter updates.
- TABLET (Slack and Singh, [2023](https://arxiv.org/html/2606.11640#bib.bib24)) improves LLM-based tabular reasoning by incorporating task-specific instructions and structured prompts on top of vanilla in-context learning.
- TabLLM (Hegselmann et al., [2023](https://arxiv.org/html/2606.11640#bib.bib25)) adapts LLMs to tabular prediction by fine-tuning on tabular-formatted data, enhancing their awareness of feature-value semantics.
- FeatLLM (Han et al., [2024](https://arxiv.org/html/2606.11640#bib.bib26)) uses LLMs as feature engineers to identify and filter informative features (or generate feature rationales), and then trains downstream predictors on the resulting feature subset for few-shot learning.
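To make the serialization step shared by these LLM-based baselines concrete, here is a minimal sketch that turns one row into a text prompt. The "The <name> is <value>." template is only one plausible choice and is not claimed to match any specific baseline's prompt; the function name `serialize_row` and the feature values are hypothetical.

```python
def serialize_row(row, task):
    """Turn one tabular row (feature name -> value) into a natural-language prompt."""
    parts = [f"The {name} is {value}." for name, value in row.items()]
    return " ".join(parts) + f" {task}"

# Hypothetical Adult-style row; the values are made up for illustration.
row = {"age": 39, "workclass": "Private", "education": "Bachelors"}
prompt = serialize_row(row, "Does this person earn more than 50000 dollars per year?")
```

Because raw feature values appear verbatim in such prompts, this style of serialization is what gives rise to the privacy concern that TAROT avoids by sending only metadata to the LLM.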

$\triangleright$ Graph-based tabular learning / graph construction strategies. To assess the benefit of explicitly modeling semantic feature dependencies, we additionally compare methods that construct feature graphs and perform relational reasoning.

- TabGSL (Liao and Li, [2023](https://arxiv.org/html/2606.11640#bib.bib69)) learns a graph structure by initializing candidate connections (e.g., via similarity/kNN) and retaining top-$k$ relations to form a learner-view graph for message passing.
- T2G-FORMER (Yan et al., [2023](https://arxiv.org/html/2606.11640#bib.bib18)) computes feature relationship scores to induce a graph structure and models feature dependencies using transformer-style graph reasoning.

## Appendix C Additional Experiments

Table 7. Missing column name experiments on Diabetes, averaged over 4, 8, and 16 shots. The missing rate denotes the percentage of feature names/descriptions removed from the metadata.

Analysis of Missing Metadata Robustness. As shown in Table [7](https://arxiv.org/html/2606.11640#A3.T7), TAROT consistently outperforms FeatLLM under different missing-metadata rates, indicating that the semantic graph provides useful prior knowledge in few-shot settings. Although the performance of metadata-based methods decreases as more feature names/descriptions are removed, TAROT remains more robust than FeatLLM across all settings. When all metadata is missing, TAROT_S, which uses statistical dependency signals such as Pearson’s $R$, the correlation ratio, and Cramér’s $V$, achieves 70.21 AUC, improving over TAROT by 2.47 points. These results show that dependency-based auxiliary signals can effectively mitigate the degradation caused by missing semantic information.
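The three dependency measures named above can be computed as in the following sketch: Pearson's $R$ for numerical pairs, the correlation ratio for categorical-numerical pairs, and Cramér's $V$ for categorical pairs. How TAROT_S combines or thresholds these signals into a graph is not specified here, so the code covers only the standard textbook definitions of the measures; the function names are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two numerical features."""
    return float(np.corrcoef(x, y)[0, 1])

def correlation_ratio(cat, num):
    """Correlation ratio (eta): sqrt of between-group over total sum of squares."""
    cat, num = np.asarray(cat), np.asarray(num, dtype=float)
    total = ((num - num.mean()) ** 2).sum()
    if total == 0:
        return 0.0
    between = sum(
        (cat == c).sum() * (num[cat == c].mean() - num.mean()) ** 2
        for c in np.unique(cat)
    )
    return float(np.sqrt(between / total))

def cramers_v(a, b):
    """Cramér's V between two categorical features (no bias correction)."""
    _, ai = np.unique(np.asarray(a), return_inverse=True)
    _, bi = np.unique(np.asarray(b), return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(table, (ai, bi), 1)  # contingency table of co-occurrence counts
    n = table.sum()
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    m = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * m))) if m > 0 else 0.0

r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])                       # perfectly linear
eta = correlation_ratio(["a", "a", "b", "b"], [1, 1, 3, 3])     # groups fully separate
v_dep = cramers_v(["x", "x", "y", "y"], ["p", "p", "q", "q"])   # fully dependent
v_ind = cramers_v(["x", "x", "y", "y"], ["p", "q", "p", "q"])   # independent
```

All three measures lie in a bounded range (the correlation ratio and Cramér's $V$ in $[0,1]$, Pearson's $R$ in $[-1,1]$), which makes them comparable across feature types when no feature names are available.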
