R^3-SQL: Ranking Reward and Resampling for Text-to-SQL
Source: https://huggingface.co/papers/2604.25325
Abstract
R^3-SQL addresses inconsistencies in scoring functionally equivalent SQL queries and improves candidate recall through unified reward ranking and agentic resampling techniques.
Modern Text-to-SQL systems generate multiple candidate SQL queries and rank them to select a final prediction. However, existing methods face two limitations. First, they often score functionally equivalent SQL queries inconsistently despite identical execution results. Second, ranking cannot recover when the correct SQL is absent from the candidate pool. We propose R^3-SQL, a Text-to-SQL framework that addresses both issues through unified reward for ranking and resampling. R^3-SQL first groups candidates by execution result and ranks groups for consistency. To score each group, it combines a pairwise preference across groups with a pointwise utility from the best group rank and size, capturing relative preference, consistency, and candidate quality. To improve candidate recall, R^3-SQL introduces agentic resampling, which judges the generated candidate pool and selectively resamples when the correct SQL is likely absent. R^3-SQL achieves 75.03% execution accuracy on BIRD-dev, a new state of the art among methods using models with disclosed sizes, with consistent gains across five benchmarks.
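The grouping-and-scoring idea from the abstract can be sketched in a few lines of Python. Everything here is an illustrative assumption, not the paper's implementation: the toy results table stands in for real database execution, and the `alpha` mixing weight and the exact pointwise formula are placeholders for the paper's unified reward.

```python
from collections import defaultdict

# Toy stand-in for database execution: maps a SQL string to its result set.
# The first two queries are functionally equivalent (same result).
TOY_RESULTS = {
    "SELECT name FROM users WHERE age > 30": ("alice", "bob"),
    "SELECT name FROM users WHERE age >= 31": ("alice", "bob"),
    "SELECT name FROM users": ("alice", "bob", "carol"),
}

def group_by_execution(candidates, execute):
    """Group candidate SQL queries by execution result, so functionally
    equivalent queries end up in one group and receive one score."""
    groups = defaultdict(list)
    for sql in candidates:
        groups[execute(sql)].append(sql)
    return groups

def score_group(rank, size, pairwise_wins, n_groups, total, alpha=0.5):
    """Blend a pairwise preference (win rate against the other groups) with
    a pointwise utility from the group's rank and relative size (a proxy
    for consistency). `alpha` is an assumed mixing weight."""
    pairwise = pairwise_wins / max(n_groups - 1, 1)
    pointwise = (size / total) / (rank + 1)
    return alpha * pairwise + (1 - alpha) * pointwise

candidates = list(TOY_RESULTS)
groups = group_by_execution(candidates, TOY_RESULTS.__getitem__)
print(len(groups))  # 3 candidates collapse into 2 execution-distinct groups
```

In this sketch, a larger, higher-ranked group that also wins more pairwise comparisons scores highest, matching the abstract's goals of relative preference, consistency, and candidate quality.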
Similar papers
@omarsar0: This paper nicely combines the strengths of Skills and RAG. Most RAG systems retrieve on every query, whether or not the model needs it……
The study proposes Skill-RAG, a new method that combines Skills with retrieval-augmented generation (RAG) to address the inefficiency of traditional RAG systems, which retrieve on every query regardless of whether the model actually needs the information.
SCURank: Ranking multiple candidate summaries with Summary Content Units to improve summary quality
SCURank introduces Summary Content Units to score candidate summaries, enabling a small model distilled from multiple LLMs to outperform traditional metrics and single-model distillation.
Caraman at SemEval-2026 Task 8: a three-stage approach to multi-turn retrieval with query rewriting, hybrid retrieval, and cross-encoder reranking
This paper presents a system for SemEval-2026 Task 8 that uses a three-stage pipeline of query rewriting with a fine-tuned Qwen model, hybrid retrieval, and cross-encoder reranking to improve multi-turn retrieval performance.
Revisiting the necessity of adaptive retrieval-augmented generation from an adaptive listwise-ranking perspective
This paper proposes AdaRankLLM, an adaptive retrieval framework that dynamically filters retrieved passages through listwise ranking, questioning the necessity of adaptive RAG. Extensive experiments across multiple datasets and LLMs show that adaptive retrieval acts as a noise filter for weaker models and a cost-efficiency optimizer for stronger ones.
Scaling laws of evaluation failure: why simple averaging breaks down under data sparsity and item-difficulty gaps, and how Item Response Theory recovers the true picture across domains
This paper shows that simple averaging in AI benchmarks fails under data sparsity and heterogeneous item difficulty, and proposes Item Response Theory (IRT) as a robust alternative for recovering true rankings.