program-synthesis

#program-synthesis

Semantic Reification: A new paradigm for random program generation

Lobsters Hottest ↗ · 2d ago Cached

A research paper introducing Semantic Reification, a new paradigm for random program generation.

Healthcare Mechanisms from Policy-as-Code Search under Strategic Provider Response

arXiv cs.AI ↗ · 6d ago Cached

This paper reformulates hospital mechanism design as program synthesis for language models, using a multi-agent simulator (Medi-Sim) to evaluate policy rules under strategic provider responses. It demonstrates pressure migration across provider channels and synthesizes an inspectable mixed-objective program that reduces up-coding and rejection while retaining funds.

I created an LLM post-training method called RPS. Preliminary results show that it improved Qwen3-8b's program synthesis reliability. [R]

Reddit r/MachineLearning ↗ · 2026-05-21

RPS is a two-stage LLM post-training method inspired by neuroscience, combining curriculum learning with learning rate decay. Preliminary results show improved program synthesis reliability on Qwen3-8b compared to equal learning rate training.

Baba in Wonderland: Online Self-Supervised Dynamics Discovery for Executable World Models

arXiv cs.AI ↗ · 2026-05-19 Cached

Introduces Alice, a closed-loop system that learns executable world models online under prior misalignment by treating failed candidate updates as structural signal, achieving improved performance on a variant of Baba Is You with semantically remapped labels.

From I/O to Code with Discovery Agent

arXiv cs.LG ↗ · 2026-05-18 Cached

This paper introduces DIO-Agent, a discovery agent that synthesizes programs from input-output behavior using LLM-guided evolutionary search with a transformation priority premise to avoid dead ends. Experiments show it outperforms traditional methods and baselines on a new IO2CodeBench benchmark.

Property-Guided LLM Program Synthesis for Planning

arXiv cs.AI ↗ · 2026-05-18 Cached

This paper proposes property-guided LLM program synthesis, using counterexample-guided inductive synthesis (CEGIS) to provide concrete feedback when a candidate program fails a formal property, reducing the number of generations and evaluation costs. Applied to PDDL planning domains for synthesizing direct heuristic functions, the method outperforms prior approaches, generating seven times fewer programs and solving more tasks without search.
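The counterexample-guided loop mentioned in this summary can be sketched on a toy problem. This is not the paper's method: the candidate list, the `spec` property, and the `cegis` helper are all hypothetical stand-ins, and a fixed list of lambdas plays the role that an LLM generator plays in the paper. The sketch only shows how concrete counterexamples let later candidates be rejected cheaply.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Candidate = Callable[[int], int]


def spec(x: int, y: int) -> bool:
    """Hypothetical property: the output must be the absolute value of x."""
    return y == abs(x)


def find_counterexample(f: Candidate, domain: Iterable[int]) -> Optional[int]:
    """Bounded verifier: return the first input that violates the property."""
    for x in domain:
        if not spec(x, f(x)):
            return x
    return None


def cegis(
    candidates: List[Tuple[str, Candidate]], domain: Iterable[int]
) -> Tuple[Optional[str], List[int], int]:
    """Return (accepted candidate name, counterexamples found, full verifications run)."""
    domain = list(domain)
    counterexamples: List[int] = []
    verified = 0
    for name, f in candidates:
        # Cheap check first: reject anything that fails a known counterexample.
        if any(not spec(x, f(x)) for x in counterexamples):
            continue
        verified += 1
        cex = find_counterexample(f, domain)
        if cex is None:
            return name, counterexamples, verified
        counterexamples.append(cex)
    return None, counterexamples, verified


# Hypothetical candidate programs, standing in for generator output.
CANDIDATES = [
    ("x", lambda x: x),
    ("-x", lambda x: -x),
    ("x + 1", lambda x: x + 1),
    ("x * x", lambda x: x * x),
    ("x if x >= 0 else -x", lambda x: x if x >= 0 else -x),
]

result = cegis(CANDIDATES, range(-5, 6))
print(result)
```

In this toy run, two of the five candidates are discarded by the stored counterexamples without a full verification pass, which is the kind of saving the summary attributes to concrete feedback.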

Retrieval is Cheap, Show Me the Code: Executable Multi-Hop Reasoning for Retrieval-Augmented Generation

arXiv cs.AI ↗ · 2026-05-14 Cached

This paper introduces PyRAG, a framework that reformulates multi-hop retrieval-augmented generation as program synthesis and execution, using executable Python code to represent reasoning steps and enable deterministic feedback and adaptive retrieval.

ReaComp: Compiling LLM Reasoning into Symbolic Solvers for Efficient Program Synthesis

arXiv cs.CL ↗ · 2026-05-08 Cached

ReaComp compiles LLM reasoning traces into reusable symbolic program synthesizers that achieve strong accuracy on program synthesis benchmarks while eliminating LLM calls at test time, significantly reducing computational cost.

A sufficiently comprehensive spec is not (necessarily) code

Hillel Wayne — Computer Things ↗ · 2026-04-15 Cached

This article argues that a comprehensive specification is not equivalent to code, because a spec defines a set of possible implementations while code is one concrete instance. It discusses the role of abstraction and why programmers are still needed to write specs even with automated code generation.
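The spec-versus-code distinction can be illustrated with a small sketch. The example is an assumption of this listing, not taken from the article: `satisfies_sort_spec` is a hypothetical specification of sorting written as a predicate, and two different implementations both satisfy it.

```python
from collections import Counter
from typing import List


def satisfies_sort_spec(inp: List[int], out: List[int]) -> bool:
    """Spec: output is a permutation of the input and is in non-decreasing order."""
    same_elements = Counter(inp) == Counter(out)
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    return same_elements and ordered


def insertion_sort(xs: List[int]) -> List[int]:
    """One concrete implementation that meets the spec."""
    out: List[int] = []
    for x in xs:
        i = len(out)
        # Walk left past every element greater than x.
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


def builtin_sort(xs: List[int]) -> List[int]:
    """A different implementation that meets the same spec."""
    return sorted(xs)


data = [3, 1, 2, 3, 0]
print(satisfies_sort_spec(data, insertion_sort(data)))
print(satisfies_sort_spec(data, builtin_sort(data)))
```

The predicate accepts both functions and says nothing about running time or memory use, so it describes a set of programs rather than any single one.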

Evaluating large language models trained on code

OpenAI Blog ↗ · 2021-07-07 Cached

OpenAI introduces Codex, a GPT model fine-tuned on GitHub code, achieving 28.8% functional correctness on HumanEval (a new benchmark for code synthesis from docstrings), significantly outperforming GPT-3 (0%) and GPT-J (11.4%). The paper demonstrates that repeated sampling improves performance to 70.2% with 100 samples, and discusses limitations and broader impacts of code generation systems.
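The single-sample and 100-sample figures above are pass@k scores. A minimal sketch of the unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k) for n samples of which c pass the unit tests, follows. The function name and the example counts are illustrative assumptions, and the direct binomial form is used here for readability rather than numerical efficiency.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples is correct.

    n: samples generated for the problem
    c: how many of those samples passed the tests
    k: sample budget being scored
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Fewer than k failing samples: every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative counts only: 10 samples, 3 of them correct.
print(pass_at_k(10, 3, 1))
print(pass_at_k(10, 3, 5))
```

A benchmark score is the mean of this value over all problems. With k = 1 it reduces to the fraction of correct samples, c / n.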
