Tag
This paper introduces Semantic Reification, a new paradigm for random program generation.
This paper reformulates hospital mechanism design as program synthesis for language models, using a multi-agent simulator (Medi-Sim) to evaluate policy rules under strategic provider responses. It demonstrates pressure migration across provider channels and synthesizes an inspectable mixed-objective program that reduces up-coding and rejection while retaining funds.
RPS is a two-stage LLM post-training method inspired by neuroscience that combines curriculum learning with learning rate decay. Preliminary results show more reliable program synthesis on Qwen3-8B than training with an equal learning rate.
Introduces Alice, a closed-loop system that learns executable world models online under prior misalignment by treating failed candidate updates as structural signal, achieving improved performance on a variant of Baba Is You with semantically remapped labels.
This paper introduces DIO-Agent, a discovery agent that synthesizes programs from input-output behavior using LLM-guided evolutionary search with a transformation priority premise to avoid dead ends. Experiments show it outperforms traditional methods and baselines on a new IO2CodeBench benchmark.
This paper proposes property-guided LLM program synthesis, using counterexample-guided inductive synthesis (CEGIS) to provide concrete feedback when a candidate program fails a formal property, reducing the number of generations and evaluation costs. Applied to PDDL planning domains for synthesizing direct heuristic functions, the method outperforms prior approaches, generating seven times fewer programs and solving more tasks without search.
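The counterexample-guided loop named in this summary can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's method: a fixed list of hand-written candidates stands in for LLM generations, the property is a toy one (the output must equal the absolute value of the input), and all names (`cegis`, `is_abs`, the candidate functions) are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Program = Callable[[int], int]


def cegis(
    candidates: Sequence[Program],
    holds: Callable[[int, int], bool],
    domain: Sequence[int],
) -> Tuple[Optional[Program], List[int]]:
    """Return the first candidate that satisfies the property on the whole
    domain, together with the counterexamples collected along the way."""
    counterexamples: List[int] = []
    for program in candidates:
        # Cheap filter: reject candidates that fail an already known counterexample.
        if any(not holds(x, program(x)) for x in counterexamples):
            continue
        # Verification: look for a concrete input that violates the property.
        cex = next((x for x in domain if not holds(x, program(x))), None)
        if cex is None:
            return program, counterexamples
        # In an LLM setting this input would be fed back into the next prompt.
        counterexamples.append(cex)
    return None, counterexamples


# Toy property (an assumption for illustration): y must be |x|.
def is_abs(x: int, y: int) -> bool:
    return y >= 0 and y * y == x * x


def identity(x: int) -> int:
    return x


def negate(x: int) -> int:
    return -x


def square(x: int) -> int:
    return x * x


def absolute(x: int) -> int:
    return x if x >= 0 else -x


found, cexs = cegis([identity, negate, square, absolute], is_abs, range(-3, 4))
```

Here `square` is rejected by a stored counterexample without a full verification pass, which is the kind of saving in evaluation cost the summary describes.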
This paper introduces PyRAG, a framework that reformulates multi-hop retrieval-augmented generation as program synthesis and execution, using executable Python code to represent reasoning steps and enable deterministic feedback and adaptive retrieval.
ReaComp compiles LLM reasoning traces into reusable symbolic program synthesizers that achieve strong accuracy on program synthesis benchmarks while eliminating LLM calls at test time, significantly reducing computational cost.
This article argues that a comprehensive specification is not equivalent to code, because a spec defines a set of possible implementations while code is one concrete instance. It discusses the role of abstraction and why programmers are still needed to write specs even with automated code generation.
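A small sketch of that distinction, assuming sorting as the example (the article's own examples may differ): the specification is a predicate over input and output, and two different implementations both satisfy it. All function names here are hypothetical.

```python
from collections import Counter
from typing import List


def satisfies_sort_spec(inp: List[int], out: List[int]) -> bool:
    """Spec: the output is in non-decreasing order and is a permutation of the input."""
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    return ordered and Counter(inp) == Counter(out)


def insertion_sort(xs: List[int]) -> List[int]:
    """One concrete implementation: insert each element into a sorted prefix."""
    result: List[int] = []
    for x in xs:
        i = len(result)
        while i > 0 and result[i - 1] > x:
            i -= 1
        result.insert(i, x)
    return result


def builtin_sort(xs: List[int]) -> List[int]:
    """A different implementation that meets the same spec."""
    return sorted(xs)


data = [3, 1, 2, 1]
results = [impl(data) for impl in (insertion_sort, builtin_sort)]
all_ok = all(satisfies_sort_spec(data, out) for out in results)
```

The spec says nothing about running time, stability, or memory use, so the two implementations are free to differ on all of them.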
OpenAI introduces Codex, a GPT model fine-tuned on GitHub code, which achieves 28.8% functional correctness on HumanEval (a new benchmark for code synthesis from docstrings), significantly outperforming GPT-3 (0%) and GPT-J (11.4%). The paper shows that repeated sampling raises the share of solved problems to 70.2% when 100 samples are drawn per problem, and it discusses the limitations and broader impacts of code generation systems.
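Repeated-sampling results of this kind are usually reported as pass@k: the probability that at least one of k samples passes the unit tests. A minimal sketch of the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), assuming n samples were generated for a problem and c of them passed; the function name and argument checks are illustrative choices.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which are correct.

    Computes 1 - C(n - c, k) / C(n, k): one minus the probability that a
    random subset of k samples contains no correct sample.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

A benchmark score is then the mean of this estimate over all problems.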