Tag
This paper introduces Program-based Posterior Training (PPT), a method that uses LLM-generated probabilistic programs to create distributional targets for fine-tuning inductive reasoning, improving estimation accuracy and calibration on held-out tasks and human-alignment benchmarks.
FalsifyBench is a new evaluation framework for assessing inductive reasoning in LLMs, inspired by the Wason 2-4-6 task, in which agents discover hidden semantic rules by proposing examples and receiving feedback. An evaluation of 12 LLMs shows that reasoning models outperform instruction-tuned models and that negative testing (hypothesis falsification) is the key driver of success.
Proposes KMAS, an adaptive negative sampling method to improve training of knowledge graph foundation models, achieving state-of-the-art results across 44 datasets.
MIND-Skill is a new framework that automates the generation of high-quality, reusable agent skills using multi-agent induction and deduction, with quality guarantees via TextGrad optimization.