Project Ariadne: Prompt-Conditioned Route Generation for Synthesis Planning
Summary
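Ariadne is a decoder-only route generator for retrosynthetic planning that frames the target, optional constraints, and route as a single prompt-completion sequence. One 24-layer checkpoint follows route-depth and required-starting-material prompts and improves over the DESP search planner on required-leaf Top-10 and Solv-0 in 24 GPU-minutes versus 6.8 GPU-hours, while AiZynthFinder MCTS remains stronger on several Solv-0 comparisons.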
# Project Ariadne: Prompt-Conditioned Route Generation for Synthesis Planning

Source: [https://arxiv.org/html/2606.24184](https://arxiv.org/html/2606.24184)

Victor Batista Yale University victor\.batista@yale\.edu

###### Abstract

Retrosynthetic planning seeks to connect a target molecule to commercially available starting materials through a multistep route\. Classical planners construct such routes by iteratively applying single\-step reaction models within a search procedure; constrained variants often require specialized algorithms or architectural changes\. Direct route generation reframes retrosynthesis as sequence generation, but existing direct\-generation methods still train separate models for different planning specifications\. We introduce Ariadne, a decoder\-only route generator that represents the target, optional constraints, and route in one prompt\-completion sequence\. On the RetroCast/PaRoutes mkt\-cnv\-160 benchmark family, one 24\-layer checkpoint follows route\-depth and required\-starting\-material prompts: adding the corresponding prompt fields raises Solv\-0 by 13\.7 points for depth constraints and 31\.2 points for required\-leaf constraints\. Ariadne also improves over DESP, a bidirectional search planner, on required\-leaf Top\-10 and Solv\-0 in 24 GPU\-minutes versus 6\.8 GPU\-hours\. On standard reconstruction, Ariadne is comparable to DMS Explorer XL at about half the reported inference time\. Across additional target\-only benchmarks, Ariadne’s clearest gains are on route\-holdout reconstruction, whereas AiZynthFinder MCTS remains stronger on several Solv\-0 comparisons\. These results extend sequence generation from specialist retrosynthesis models to prompt\-conditioned structural route generation\.
We release the [codebase and training scripts](https://github.com/ischemist/project-ariadne) to support further work, but do not introduce Tier\-1–3 route checkers; those remain the main bottleneck before models of this kind can become useful to experimental chemists\.

## 1 Introduction

Machine learning is expected to significantly accelerate, if not revolutionize, the often decades\-long and billion\-dollar process of drug discovery\. A persistent bottleneck during hit\-to\-lead and lead optimization stages is a simple question: can this molecule be easily made?\[[7](https://arxiv.org/html/2606.24184#bib.bib69)\] While significant effort has been put into attempts to answer that question directly by training synthetic accessibility predictors\[[11](https://arxiv.org/html/2606.24184#bib.bib19),[8](https://arxiv.org/html/2606.24184#bib.bib20),[48](https://arxiv.org/html/2606.24184#bib.bib21),[53](https://arxiv.org/html/2606.24184#bib.bib22)\], an emerging consensus is that the only truly reliable measure of synthesizability is the explicit construction of a synthesis plan connecting a desired target to a set of commercially available building blocks\[[34](https://arxiv.org/html/2606.24184#bib.bib65),[33](https://arxiv.org/html/2606.24184#bib.bib18)\]\. This synthesis plan can be constructed in either direction: by starting from building blocks through synthesis\-aware forward design\[[23](https://arxiv.org/html/2606.24184#bib.bib68),[20](https://arxiv.org/html/2606.24184#bib.bib66),[41](https://arxiv.org/html/2606.24184#bib.bib67),[30](https://arxiv.org/html/2606.24184#bib.bib70)\], or by applying retrosynthetic analysis to the target molecule\[[9](https://arxiv.org/html/2606.24184#bib.bib23)\]\.

The prevalent approach to multistep retrosynthetic planning is built from two components: a single\-step reaction predictor that is applied iteratively to the target molecule and resulting precursor candidates, and a search algorithm prioritizing the most promising branches of the resulting search space\[[40](https://arxiv.org/html/2606.24184#bib.bib24),[22](https://arxiv.org/html/2606.24184#bib.bib28),[38](https://arxiv.org/html/2606.24184#bib.bib53),[56](https://arxiv.org/html/2606.24184#bib.bib52),[6](https://arxiv.org/html/2606.24184#bib.bib25),[14](https://arxiv.org/html/2606.24184#bib.bib26),[62](https://arxiv.org/html/2606.24184#bib.bib29),[57](https://arxiv.org/html/2606.24184#bib.bib30),[18](https://arxiv.org/html/2606.24184#bib.bib31),[27](https://arxiv.org/html/2606.24184#bib.bib48),[65](https://arxiv.org/html/2606.24184#bib.bib57),[66](https://arxiv.org/html/2606.24184#bib.bib32),[50](https://arxiv.org/html/2606.24184#bib.bib44),[16](https://arxiv.org/html/2606.24184#bib.bib50),[5](https://arxiv.org/html/2606.24184#bib.bib49),[64](https://arxiv.org/html/2606.24184#bib.bib59),[36](https://arxiv.org/html/2606.24184#bib.bib33),[1](https://arxiv.org/html/2606.24184#bib.bib34),[55](https://arxiv.org/html/2606.24184#bib.bib35)\]\. Hybrid systems keep explicit search but add learned, retrieval\-based, or language\-model guidance to steer expansion and pruning\[[19](https://arxiv.org/html/2606.24184#bib.bib54),[37](https://arxiv.org/html/2606.24184#bib.bib55),[26](https://arxiv.org/html/2606.24184#bib.bib39),[61](https://arxiv.org/html/2606.24184#bib.bib40),[31](https://arxiv.org/html/2606.24184#bib.bib41),[4](https://arxiv.org/html/2606.24184#bib.bib42),[60](https://arxiv.org/html/2606.24184#bib.bib43),[44](https://arxiv.org/html/2606.24184#bib.bib45),[12](https://arxiv.org/html/2606.24184#bib.bib61)\]\.

An emerging alternative is direct generation of the synthesis plan represented as a single string\[[39](https://arxiv.org/html/2606.24184#bib.bib27),[25](https://arxiv.org/html/2606.24184#bib.bib36),[24](https://arxiv.org/html/2606.24184#bib.bib56),[43](https://arxiv.org/html/2606.24184#bib.bib1),[2](https://arxiv.org/html/2606.24184#bib.bib62),[46](https://arxiv.org/html/2606.24184#bib.bib37),[59](https://arxiv.org/html/2606.24184#bib.bib46),[54](https://arxiv.org/html/2606.24184#bib.bib47),[15](https://arxiv.org/html/2606.24184#bib.bib38)\]\. For example, [Shee et al\.](https://arxiv.org/html/2606.24184#bib.bib1) trained a series of encoder\-decoder transformers to "translate" a SMILES specification of the target compound into a stringified \(via depth\-first search\) representation of the multistep route\. These DirectMultiStep models were also extended to constrained versions of retrosynthetic planning, such as finding a route with a specified starting\-material structure or desired route depth, but each such problem required training a specialist model\.

In this work, we extend the DirectMultiStep sequence formulation from separately trained encoder\-decoder models to a single decoder\-only task language for route generation\. Ariadne represents the target, optional planning constraints, and route in one sequence, so the same checkpoint can be queried with different task specifications at inference time\. As a proof of concept, we study target\-only reconstruction together with route\-depth and required starting\-material prompts\. We evaluate these outputs within the existing Solv\-N and RetroCast framework, using route reconstruction and constraint\-aware Solv\-0 to test whether generated routes satisfy benchmark specifications\[[33](https://arxiv.org/html/2606.24184#bib.bib18),[13](https://arxiv.org/html/2606.24184#bib.bib64),[32](https://arxiv.org/html/2606.24184#bib.bib2)\]\.
These metrics evaluate the structural route plan: the reaction topology, stock termination, and prompt\-specified constraints\. Direct experimental use would require additional quantitative planning layers, such as reaction plausibility assessment, condition prediction, procedure generation, and higher\-tier executability checks discussed in the Solv\-N framework\[[33](https://arxiv.org/html/2606.24184#bib.bib18)\]\.

## 2 Preliminaries

### 2\.1 Definitions

A retrosynthetic route is a sequence of reactions working backward from a target molecule to a set of proposed starting materials \(or leaves\)\. All models discussed herein inherit ambiguities from their patent\-derived training data, which may not distinguish core reactants from auxiliary reagents and may omit reaction conditions \(e\.g\., solvent, temperature\)\. A predicted route is therefore not a complete experimental protocol but a high\-level topological plan, the validity of which rests on an unevaluated assumption that viable conditions exist for each transformation\.

### 2\.2 Evaluation

We distinguish the generation prompt, which is supplied to Ariadne before decoding, from the scoring task, which defines the constraints used by RetroCast during evaluation\. This distinction lets us ask, for example, how target\-only generations score under the stricter required\-leaf task, or how required\-leaf prompts behave when scored under the standard target\-only task\.

We report two complementary metric sets\. First, we report Tier\-0 validity, which is the share of targets that have at least one route where all reactions are Tier\-0 valid, and Solv\-0, which is the share of targets that have at least one Tier\-0\-valid route that satisfies the scoring task constraints\[[33](https://arxiv.org/html/2606.24184#bib.bib18)\]\.
For mkt\-cnv\-160, the scoring task constraint is simply termination in the ASKCOS Buyables stock of commercially available compounds\[[51](https://arxiv.org/html/2606.24184#bib.bib76),[36](https://arxiv.org/html/2606.24184#bib.bib33)\]\. For mkt\-cnv\-160\-leaf, the scoring task constraint is stock termination together with the presence of a specified starting material among the leaves\. For mkt\-cnv\-160\-depth, it is stock termination together with the requested route depth\. Additional target\-only benchmarks use the same RetroCast convention: mkt\- benchmarks are scored with ASKCOS Buyables, whereas ref\- benchmarks are scored with the patent\-derived PaRoutes stocks distributed with the benchmark definitions\.

Second, in the absence of established Tier\-1–3 validity checking protocols, and following the proposed separation of method development from introduction of new evaluation metrics\[[32](https://arxiv.org/html/2606.24184#bib.bib2),[33](https://arxiv.org/html/2606.24184#bib.bib18)\], we report benchmark route reconstruction as a proxy metric of route quality\. We use the standard RetroCast implementation of scoring and report Top\-K accuracy, that is, whether a reference route was produced within the first K candidates\.

### 2\.3 Data Representation

Ariadne is a decoder\-only transformer trained on stringified representations of synthesis planning tasks\. Each training example is a rooted S\-expression \(see Fig\. [1](https://arxiv.org/html/2606.24184#S2.F1)\) with two parts: a problem specification and the route that solves it:

> \(task \(spec \.\.\.\) \(route \.\.\.\)\)

where spec contains the prompt\-side information and route contains the target route tree\. The spec contains a target molecule represented as \(query \(mol \.\.\.\)\) and any optional constraints\. A route is represented recursively\. A leaf node is written as \(leaf \(mol \.\.\.\)\)\. A reaction node is written as \(reaction \(mol \.\.\.\) \(children \.\.\.\)\), where the children are precursor routes\.
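
As a minimal sketch of this representation, the Python helpers below serialize a route tree and a specification into one task string. The helper names (`leaf`, `reaction`, `task`) are hypothetical, and the exact layout of the optional `route_depth` and `required_leaves` fields is an assumption made for illustration; the released codebase may serialize them differently.

```python
def leaf(smiles):
    """Serialize a starting material (leaf node)."""
    return f"(leaf (mol {smiles}))"


def reaction(product, children):
    """Serialize a reaction node; children are already-serialized precursor routes."""
    return f"(reaction (mol {product}) (children {' '.join(children)}))"


def task(target, route, route_depth=None, required_leaves=()):
    """Wrap a specification and a route into one task S-expression.

    The layout of the two optional fields is a guess made for this sketch.
    """
    fields = [f"(query (mol {target}))"]
    if route_depth is not None:
        fields.append(f"(route_depth {route_depth})")
    if required_leaves:
        mols = " ".join(f"(mol {s})" for s in required_leaves)
        fields.append(f"(required_leaves {mols})")
    return f"(task (spec {' '.join(fields)}) (route {route}))"


# Hypothetical one-step route: ethyl acetate from acetic acid and ethanol.
route = reaction("CC(=O)OCC", [leaf("CC(=O)O"), leaf("CCO")])
target_only = task("CC(=O)OCC", route)
with_constraints = task("CC(=O)OCC", route, route_depth=1, required_leaves=["CCO"])
print(target_only)
print(with_constraints)
```

Only the arguments that populate the spec block differ between the two calls, so the same route yields several sequences that carry different amounts of prompt-side information.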

Figure 1: Data representation shift from DirectMultiStep to Ariadne\. \(a\) DirectMultiStep treats multistep retrosynthesis as sequence translation from target molecule \(and optionally appended constraints\) to a synthesis plan\. \(b\) The skeletal structure of the route encoded in \(a\) and \(c\)\. \(c\) Ariadne represents the same problem as one decoder\-only task sequence containing both the prompt\-side specification and the route\-side answer\.

The same route can be converted into different training sequences by changing only the spec block\. In the simplest target\-only mode, denoted T, the specification contains only the target molecule\. In TL, it also contains route\_depth\. In TSd, it contains one required starting material, chosen as the deepest leaf of the route\. In TLSd, it contains both route depth and that required starting material\. During training we also generate TSe and TLSe sequences in which the required\-leaf field is instantiated once for each route leaf\. Here d denotes the deepest leaf and e denotes enumeration over leaves; the TSd and TLSd evaluation prompts select the deepest\-leaf instance from the corresponding enumerated training variants\. The field is named required\_leaves because the representation supports multiple required starting materials, but all constrained experiments in this work use one required starting material\.

These variants let one model see the same route distribution under different amounts of information and naturally offset underrepresentation of longer routes: a longer route typically has more leaves and therefore contributes more leaf\-conditioned sequences\. We also augment the training data by permuting sibling order in the route section\. We generate three deterministic permutations: one that permutes children recursively, one that permutes only the root children, and one that permutes the deepest branching node\.

We tokenize this representation with a small S\-expression\-aware tokenizer\.
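
A tokenizer of this kind might look as follows. This is a sketch under stated assumptions and not the released implementation: the function name `tokenize` is hypothetical, `mol` is added to the label set as an assumption, and SMILES branch parentheses are assumed to share the same tokens as the structural parentheses.

```python
import re

# Structural labels named in this section; "mol" is added here as an assumption.
STRUCTURAL_LABELS = {
    "task", "spec", "query", "mol", "route", "reaction",
    "leaf", "children", "route_depth", "required_leaves",
}

# A piece is a single parenthesis or a run of non-space, non-parenthesis characters.
_PIECE = re.compile(r"[()]|[^\s()]+")


def tokenize(sexpr):
    """Split a task string into atomic structural tokens and single characters."""
    tokens = []
    for piece in _PIECE.findall(sexpr):
        if piece in ("(", ")") or piece in STRUCTURAL_LABELS:
            tokens.append(piece)   # parenthesis or label: one atomic token
        else:
            tokens.extend(piece)   # SMILES fragment or numeric value: one token per character
    return tokens


print(tokenize("(leaf (mol CC(=O)O))"))
```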
Parentheses and structural labels such as task, spec, query, route, reaction, leaf, children, route\_depth, and required\_leaves are atomic tokens\. SMILES strings are split character\-wise\. During training, the prompt portion is masked out of the loss; the model is trained to generate the route side of the sequence conditioned on the specification\.

### 2\.4 Training Data

All Ariadne models were trained on the v2026\-05\-12 canonical split of the PaRoutes dataset preprocessed with RetroCast\. RetroCast provides two versions: a route holdout, which guarantees that no route from the test set is represented as\-is in the training set, and a reaction holdout, which guarantees that no test\-route reaction appears in any training route after RetroCast canonicalization\. v2026\-05\-12\-route is equivalent to the training set used in the original DirectMultiStep work\. v2026\-05\-12\-reaction follows the stricter filtering proposed by [Xuan\-Vu et al\.](https://arxiv.org/html/2606.24184#bib.bib46), who argued that route\-based filtering leads to unfair data leakage\.

## 3 Results and Discussion

DirectMultiStep showed that target\-only and constrained retrosynthesis can be written as sequence\-to\-sequence problems, but each task variant \(e\.g\., unidirectional and bidirectional search\) required a separate model\. Ariadne pushes that idea one step further: the target and optional constraints are prompt fields, and a single route generator completes the prompt with a synthesis tree\. The mkt\-cnv\-160 benchmark family gives a controlled test of this interface because it keeps the same 160 targets while changing only the route constraints\. The base task requires stock termination in ASKCOS Buyables; the depth variant additionally fixes route depth, and the leaf variant additionally fixes one starting material that must appear among the route leaves\.

### 3\.1 One checkpoint handles multiple planning specifications

Table [1](https://arxiv.org/html/2606.24184#S3.T1) isolates constraint following by holding the Ariadne checkpoint and generation procedure fixed while changing the generation prompt\. The target\-only T rows generate unconstrained candidates and score them under the stricter depth or leaf scoring tasks, giving the baseline rate at which generation already satisfies the added requirement\. The constrained rows include the corresponding field in the generation prompt before decoding\. On the depth benchmark, adding the requested depth with the TL prompt raises Solv\-0 from 76\.9% to 90\.6%\. On the required\-leaf benchmark, adding the benchmark\-specified required starting material with the TSd prompt raises Solv\-0 from 50\.0% to 81\.2% and Top\-10 reconstruction from 26\.2% to 37\.5%\.

The required\-leaf benchmark also gives a direct comparison to DESP\[[61](https://arxiv.org/html/2606.24184#bib.bib40)\], a bidirectional search planner built for target\-plus\-starting\-material constraints\. With the TSd prompt, Ariadne reaches 37\.5% Top\-10 and 81\.2% Solv\-0 in 24 GPU\-minutes\. The best DESP setting reaches 17\.5% Top\-10 and 71\.2% Solv\-0 in 6\.8 GPU\-hours\. A paired bootstrap comparison on Top\-10 gives Ariadne a \+20\.0\-point advantage, with 95% CI \[10\.6, 28\.8\]\. Ariadne therefore improves both Top\-10 and Solv\-0 while using about 17× less GPU time\.

Table 1: Constraint\-aware evaluation on the mkt\-cnv\-160\-depth and mkt\-cnv\-160\-leaf scoring tasks\. Ariadne rows use the 24\-layer model trained on the v2026\-05\-12\-reaction split at the 14B\-token checkpoint with beam size 50\. T is target\-only prompting, TL adds the requested route depth, and TSd adds the benchmark\-specified required starting material to the generation prompt\. Top\-10 CI gives the 95% bootstrap interval\.
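
The GPU-time ratio and the paired bootstrap comparison quoted above can be illustrated with a short sketch. The per-target outcomes below are synthetic: they are constructed only to match the reported 37.5% and 17.5% Top-10 rates on 160 targets, and the overlap between the two methods is an assumption, so the resulting interval need not match the reported one. The function name `paired_bootstrap_ci` is hypothetical, and the bootstrap procedure used by RetroCast may differ in its details.

```python
import random


def paired_bootstrap_ci(hits_a, hits_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the difference in hit rate (in points), resampling targets."""
    assert len(hits_a) == len(hits_b)
    rng = random.Random(seed)
    n = len(hits_a)
    diffs = []
    for _ in range(n_boot):
        # Resample target indices with replacement; both methods share the same resample.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(100.0 * sum(hits_a[i] - hits_b[i] for i in idx) / n)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    observed = 100.0 * (sum(hits_a) - sum(hits_b)) / n
    return observed, lo, hi


# Synthetic outcomes: 60/160 = 37.5% and 28/160 = 17.5% (overlap pattern is assumed).
ariadne_hits = [1] * 60 + [0] * 100
desp_hits = [1] * 28 + [0] * 132
observed, lo, hi = paired_bootstrap_ci(ariadne_hits, desp_hits)

# GPU-time ratio from the reported budgets: 6.8 GPU-hours versus 24 GPU-minutes.
speedup = (6.8 * 60) / 24
print(observed, (lo, hi), speedup)
```

Resampling the same target indices for both planners is what makes the comparison paired: target-level difficulty shared by both methods cancels in the difference.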

### 3\.2 The unified model preserves standard route reconstruction

We also evaluate whether the unified planner remains competitive on the standard target\-only task\. Table [2](https://arxiv.org/html/2606.24184#S3.T2) reports public SynthArena baselines together with local runs on the matched v2026\-05\-12 training splits\. On the stricter reaction holdout, Ariadne 24L is comparable to retrained MCTS on route reconstruction: 22\.5% versus 19\.4% Top\-1 and 42\.5% versus 35\.0% Top\-10 \(paired bootstrap 95% CI for the Top\-10 difference: \[\-0\.6, 15\.6\]\)\. MCTS remains stronger on Solv\-0 \(92\.5% versus 81\.2%\) and faster at inference \(10\.9 versus 19\.4 minutes\)\. On the route holdout, which is the closest comparison to the original DirectMultiStep split, Ariadne 24L \(15\.8 M parameters\) reaches 38\.1% Top\-1 and 59\.4% Top\-10, which is comparable to the DMS Explorer XL \(50 M parameters\) Top\-10 result of 57\.5% while reducing generation time from 47\.6 to 23\.9 minutes\.

Table 2: mkt\-cnv\-160 Top\-K reconstruction and Solv\-0 \(stock termination\) results\. Public AiZynthFinder MCTS and DMS Explorer XL values are taken from the [SynthArena leaderboard](https://syntharena.ischemist.com/leaderboard?benchmarkId=cmisc0flu0000boddjstwifeo)\. Ariadne 12L rows use the 20B\-token checkpoint, and Ariadne 24L rows use the 14B\-token checkpoint\. AiZynthFinder runs use 100 MCTS iterations and maximum search depth 6\. Top\-10 CI gives the 95% bootstrap interval for newly evaluated rows\.

### 3\.3 Route and reaction holdouts separate chemistry generalization from planning ability

Table [2](https://arxiv.org/html/2606.24184#S3.T2) shows how the same planners behave on the reaction and route holdouts, giving two complementary views of reaction coverage and route assembly\.
The reaction holdout removes from training every reaction that appears in the benchmark routes, so success is a stricter proxy for both route planning and generalization beyond the exact single\-step reactions present in training\. The route holdout removes exact benchmark routes but can leave their component reactions in training \(if they are present in other routes\), giving a cleaner read on whether a planner can assemble covered transformations into the reference route\.

Moving from the reaction holdout to the route holdout, MCTS Top\-10 rises modestly from 35\.0% to 40\.6%, whereas Ariadne 24L rises from 42\.5% to 59\.4%\. On the route holdout itself, the paired Top\-10 comparison gives Ariadne a \+18\.8\-point advantage over MCTS, with 95% CI \[10\.6, 26\.9\]\. On mkt\-cnv\-160, these comparisons indicate that additional single\-step reaction coverage translates into larger route\-level gains for Ariadne than for MCTS\. The same trend appears in Solv\-0, which rises from 92\.5% to 93\.1% for MCTS and from 81\.2% to 95\.6% for Ariadne 24L\. The additional commercial\-stock benchmark gives the same picture: on mkt\-lin\-500, Ariadne improves Top\-10 on both holdouts and all route\-holdout reconstruction metrics, while MCTS remains a strong Solv\-0 baseline \(Table [S1](https://arxiv.org/html/2606.24184#Sx2.T1)\)\.

### 3\.4 Reference\-stock benchmarks expose stock\-conditioning limits

A current limitation of Ariadne is that generation cannot yet be conditioned on an arbitrary user\-specified stock set\. Such stock control is central to practical planning, where the useful terminal set may include in\-house building blocks in addition to commercially available compounds\. Search\-based planners can enforce this constraint during expansion; the present Ariadne model can only be filtered against the desired stock after generation\.
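
Post-generation filtering of this kind can be sketched as follows. The nested-tuple route format and the names `route_leaves` and `filter_by_stock` are hypothetical and chosen for illustration; in the reported experiments, constraint filtering is performed by the RetroCast scoring pipeline.

```python
def route_leaves(node):
    """Collect leaf SMILES from a route given as nested (kind, smiles, children) tuples."""
    kind, smiles, children = node
    if kind == "leaf":
        return [smiles]
    found = []
    for child in children:
        found.extend(route_leaves(child))
    return found


def filter_by_stock(candidates, stock):
    """Keep candidate routes whose leaves are all in the stock, preserving rank order."""
    return [r for r in candidates if all(s in stock for s in route_leaves(r))]


# Hypothetical candidates for one target, in beam order.
cand_a = ("reaction", "CC(=O)OCC", [("leaf", "CC(=O)O", []), ("leaf", "CCO", [])])
cand_b = ("reaction", "CC(=O)OCC", [("leaf", "CC(=O)Cl", []), ("leaf", "CCO", [])])
in_house_stock = {"CC(=O)Cl", "CCO"}
kept = filter_by_stock([cand_a, cand_b], in_house_stock)
print(len(kept))
```

A filter can only discard candidates: if no generated route terminates in the requested stock, the target counts as unsolved, whereas a search-based planner can steer expansion toward that stock.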

The ref\-\* benchmarks expose this limitation by replacing ASKCOS Buyables with PaRoutes reference stocks: ref\-cnv\-400 and ref\-lin\-600 use n5, and ref\-lng\-84 uses combined n1/n5 stocks \(Tables [S2](https://arxiv.org/html/2606.24184#Sx2.T2)–[S4](https://arxiv.org/html/2606.24184#Sx2.T4)\)\. On route holdouts, Ariadne reaches 68\.2% Solv\-0 on ref\-cnv\-400 and 74\.8% on ref\-lin\-600, below the MCTS values of 85\.2% and 83\.2%\. The reaction holdouts widen the gap: 33\.0% versus 76\.5% on ref\-cnv\-400, and 55\.8% versus 78\.8% on ref\-lin\-600\. Although Ariadne remains competitive on route\-holdout Top\-10 reconstruction, these results indicate that arbitrary\-stock conditioning is an important area for future research\.

### 3\.5 Larger models recover deeper routes more often

Table [3](https://arxiv.org/html/2606.24184#S3.T3) shows that scaling Ariadne improves reconstruction of deeper routes, especially on the route holdout\. On the reaction holdout, deeper routes remain difficult for every method: MCTS falls from 57\.5% at depth 2 to 17\.5% at depth 5, and Ariadne 24L falls from 67\.5% to 20\.0%\. On the route holdout, the profile changes more substantially for Ariadne than for MCTS\. MCTS still declines with depth, while Ariadne 24L reaches 45\.0% Top\-10 at depth 4 and 52\.5% at depth 5\. Within the limits of the 40\-target strata, this is consistent with additional scale helping Ariadne turn better reaction coverage into longer\-range route reconstruction\.

Table 3: mkt\-cnv\-160 Top\-10 route reconstruction accuracy by reference route depth for local runs\. Ariadne 12L rows use the 20B\-token checkpoint, and Ariadne 24L rows use the 14B\-token checkpoint\. Each depth stratum contains 40 targets\. Values are reported as mean with 95% bootstrap interval\.

### 3\.6 The first disconnection is the main bottleneck

Table [4](https://arxiv.org/html/2606.24184#S3.T4) separates recovery of the root reaction from recovery of the full route conditional on getting the root right\.
For MCTS, these quantities are similar; on the route split, both root Top\-10 and route∣root Top\-10 are 63\.7%\. In contrast, for Ariadne route∣root Top\-10 is consistently higher than root Top\-10, with the largest gaps on the reaction split and for smaller models\. For example, Ariadne 12L on the reaction split reaches 46\.9% root Top\-10 but 68\.0% route∣root Top\-10, indicating that the main limitation is choosing the first disconnection rather than completing the route once that choice is correct\.

Table 4: mkt\-cnv\-160 reconstruction diagnostics\. Ariadne 12L rows use the 20B\-token checkpoint, and Ariadne 24L rows use the 14B\-token checkpoint\. Root columns measure recovery of the reference root reaction at the indicated K\. Route∣root columns measure full\-route recovery among targets whose root reaction was recovered at the same K\. Prefix columns report Top\-10 reconstruction of reference route prefixes of depth 1, 2, and 3\. Mean distinct roots is computed over the Top\-10 candidates\.

### 3\.7 Beam search mainly improves root selection

To interpret the beam\-50 results used above, we sweep beam width on mkt\-lin\-500 and ask whether larger beams improve route completion itself or mainly increase the chance of recovering the correct first disconnection \(Table [S5](https://arxiv.org/html/2606.24184#Sx2.T5)\)\. Increasing the beam from 1 to 50 raises root Top\-10 from 17\.4% to 63\.0% on the reaction holdout and from 22\.0% to 72\.4% on the route holdout\. Conditional route recovery changes much less: route∣root Top\-10 remains between 57\.5% and 64\.4% on the reaction holdout and between 70\.1% and 72\.7% on the route holdout\. The gains in Solv\-0 and Top\-10 reconstruction therefore come mainly from improved root selection; on the reaction holdout, Solv\-0 increases from 33\.8% to 90\.4% and Top\-10 increases from 10\.0% to 37\.2%\.
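
The root and route∣root diagnostics used in Table 4 and in the beam sweep can be sketched as follows. The data layout (each route reduced to a pair of a root-reaction key and a full-route key) and the function name `root_and_conditional` are assumptions for illustration; RetroCast's canonicalization and matching are not reproduced here.

```python
def root_and_conditional(targets, k=10):
    """Return (root Top-K, route|root Top-K, full Top-K).

    targets: list of (reference, candidates); every route is a (root_key, route_key) pair.
    """
    root_hits = 0
    route_hits = 0
    for (ref_root, ref_route), candidates in targets:
        top = candidates[:k]
        root_ok = any(root == ref_root for root, _ in top)
        route_ok = any(route == ref_route for _, route in top)
        root_hits += root_ok
        route_hits += root_ok and route_ok
    n = len(targets)
    root_topk = root_hits / n
    route_given_root = route_hits / root_hits if root_hits else 0.0
    return root_topk, route_given_root, route_hits / n


# Hypothetical toy data: four targets with short candidate lists.
toy = [
    (("r1", "R1"), [("r1", "R1")]),              # root and full route recovered
    (("r2", "R2"), [("r2", "X")]),               # root recovered, route not
    (("r3", "R3"), [("z", "Y")]),                # neither recovered
    (("r4", "R4"), [("q", "Q"), ("r4", "R4")]),  # recovered at rank 2
]
root_topk, route_given_root, full_topk = root_and_conditional(toy)
print(root_topk, route_given_root, full_topk)
```

Because a recovered route implies a recovered root, full Top-K equals root Top-K multiplied by route∣root Top-K when all three are computed on the same targets. As a consistency check, the MCTS route-split values give 0.637 × 0.637 ≈ 0.406, in line with the 40.6% Top-10 reported for MCTS on the route holdout.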
Prompt fields that improve root selection, or dynamic beam allocation based on target complexity or search progress, may reduce decoding time while preserving high Solv\-0\. Developing such decoding strategies is an area for future work\.

### 3\.8 Prefix\-LM attention does not improve prompted generation

DirectMultiStep used an encoder\-decoder architecture, so the route decoder attended to a bidirectional representation of the complete task specification\. This makes it natural to ask whether Ariadne should recover that property with Prefix\-LM attention\[[10](https://arxiv.org/html/2606.24184#bib.bib78)\], which allows bidirectional attention within the prompt while keeping route generation causal\. Table [5](https://arxiv.org/html/2606.24184#S3.T5) shows no overall benefit from restoring bidirectional attention within the prompt\. In the 24\-layer T\-TL\-TLSe\-TSe comparison, fully causal attention gives higher Top\-1, Top\-10, and root\-recovery values for every evaluated prompt\. It also gives higher route∣root Top\-10 for all three evaluated prompts\. We therefore use the simpler fully causal attention mask for Ariadne\.

Table 5: mkt\-cnv\-160 attention\-mask ablation using standard stock\-termination scoring\. Causal models use ordinary left\-to\-right attention for all tokens\. Prefix models allow bidirectional attention within the generation prompt while keeping route generation causal\. Here TL and TSd add prompt fields before decoding, but RetroCast scoring still uses the standard stock\-termination task rather than the constrained depth or leaf scoring tasks in Table [1](https://arxiv.org/html/2606.24184#S3.T1)\. Top\-10 CI gives the 95% bootstrap interval\. For the causal TL and TSd rows, Solv\-0 coincides with the corresponding constrained rows in Table [1](https://arxiv.org/html/2606.24184#S3.T1) because the stock\-terminated successes also satisfy the prompted constraint\.
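
The two attention patterns compared in this ablation differ only inside the prompt block. A minimal sketch in plain Python, where `mask[i][j]` being true means that position `i` may attend to position `j`; the function names are hypothetical and the code is not taken from the released implementation:

```python
def causal_mask(n):
    """Ordinary left-to-right mask: each position sees itself and earlier positions."""
    return [[j <= i for j in range(n)] for i in range(n)]


def prefix_lm_mask(n, prompt_len):
    """Prefix-LM mask: full attention within the first prompt_len positions, causal afterwards."""
    return [
        [(j <= i) or (i < prompt_len and j < prompt_len) for j in range(n)]
        for i in range(n)
    ]


# Sequence of 4 tokens whose first 2 tokens form the prompt.
for row in prefix_lm_mask(4, 2):
    print(["x" if allowed else "." for allowed in row])
```

With `prompt_len` set to 0 or 1 the prefix mask reduces to the causal mask, which is why the causal variant is the simpler special case.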

## 4 Conclusion

Ariadne shows that direct multistep route generation can be formulated as prompt\-conditioned decoding rather than as a collection of separately trained specialist models\. In one 24\-layer checkpoint, the same target/constraint/route language supports target\-only reconstruction, route\-depth prompts, and required\-starting\-material prompts\. On the mkt\-cnv\-160 benchmark family, this checkpoint follows the added constraint fields, remains competitive with DirectMultiStep Explorer XL on standard reconstruction, and exceeds the evaluated DESP settings on required\-leaf Top\-10 and Solv\-0 while using about 17× less GPU time\.

Whenever route termination depends on in\-house compounds rather than off\-the\-shelf buyables, a planner must be able to condition on an arbitrary stock set\. The ref\-\* benchmarks instantiate this setting by replacing ASKCOS Buyables with custom stocks derived from reference\-route leaves\. MCTS receives those stocks during search and correspondingly reaches 85\.2% Solv\-0 on the ref\-cnv\-400 route holdout\. Because the present Ariadne model can only be filtered against those stocks after decoding, its Solv\-0 on the same task is only 68\.2%\.

To understand why Ariadne is competitive on reference\-route reconstruction, we report diagnostics that separate recovery of the first disconnection from recovery of the rest of the route\. These diagnostics show that Ariadne often reconstructs the rest of the route correctly once the first disconnection is recovered\. The beam\-width sweep on mkt\-lin\-500 points to the same mechanism: increasing beam width mainly increases the probability of recovering that first disconnection, which then raises root\-reaction Top\-10, Solv\-0, and Top\-10 reconstruction\. Together with the stock\-conditioning results above, this identifies stock\-aware prompting and more reliable first\-disconnection selection as important areas for further research\.

## 5 Outlook

This work should not be interpreted as an attempt to replace MCTS or explicit search more generally\. Rather, in the spirit of the bitter lesson\[[47](https://arxiv.org/html/2606.24184#bib.bib63),[33](https://arxiv.org/html/2606.24184#bib.bib18)\], explicit search and direct generation appear to offer complementary scaling axes: search provides direct control over stock termination and other hard constraints, while learned generators can amortize recurring route structure into faster prompt\-conditioned inference\. One natural next step is therefore to use search to generate constraint\-satisfying trajectories and distill those trajectories into direct generators\. For Ariadne specifically, the immediate model\-side extensions are arbitrary\-stock prompting, prompt fields or decoding policies that improve root selection, dynamic beam allocation based on target complexity or search progress, and speculative decoding\[[2](https://arxiv.org/html/2606.24184#bib.bib62)\]\. On the evaluation side, Solv\-0 and Top\-K reconstruction evaluate structural route plans under Tier\-0 checks, so the field needs standardized Tier\-1–3 validation protocols before performance on these benchmarks can be translated into claims about experimentally executable synthesis\. We release the codebase and full training scripts so that Ariadne can serve as a reproducible baseline for these directions\.

## 6 Implementation Details

### 6\.1 Model

In addition to switching from an encoder\-decoder setup in DirectMultiStep to a decoder\-only architecture for Ariadne, we also updated the implementation of transformer blocks \(see Table [6](https://arxiv.org/html/2606.24184#S6.T6)\)\. The implemented model is a pre\-normalization decoder with token embeddings, RMSNorm, rotary position embeddings, multi\-head self\-attention, and SwiGLU feed\-forward blocks\. The runs reported here use dense feed\-forward layers\.
The code also supports sparse mixture\-of\-experts blocks, but those are not used for the main results\. We vary model scale mainly by layer count while holding hidden size at 256 and using eight attention heads\. This results in 7\.9 M parameters for the 12\-layer model and 15\.8 M for the 24\-layer model\. For reference, DMS Explorer XL has 50 M parameters\[[43](https://arxiv.org/html/2606.24184#bib.bib1)\]\.

Table 6: Main implementation differences between DirectMultiStep and Ariadne\.

### 6\.2 Training

The supervised objective is next\-token cross entropy on the route part of the sequence\. Prompt tokens and padding tokens are masked out of the labels\. Training uses Hugging Face Accelerate for device placement, gradient accumulation, checkpointing, and mixed precision\. The main runs use one GPU per run with bfloat16 autocast on supported accelerators\. Dense two\-dimensional transformer weights are optimized with Muon using Moonshot update scaling\. Embeddings, output head, normalization parameters, and other residual parameters use AdamW\-style groups\. Gradients are clipped to unit norm\. Learning\-rate schedules, logging, validation, checkpointing, and budget accounting are token\-based rather than epoch\-based\. We use length bucketing to reduce computational costs associated with excessive padding, resulting in a roughly 12× speedup\.

### 6\.3 Generation

Generation uses deterministic batched beam search with a KV cache\[[35](https://arxiv.org/html/2606.24184#bib.bib17)\]\. The prompt is the task specification plus the start of the route wrapper and ends immediately after the children token in that incomplete wrapper\. The prompt is run once, cached key/value states are expanded across beams, and subsequent decoding steps feed only the newest token\. After each beam update, the cache is reordered to match the surviving beams\.
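
One beam update with cache reordering can be sketched in plain Python. This is a toy illustration under stated assumptions: the function name `beam_step` is hypothetical, each per-beam "cache" is a stand-in list rather than real key/value tensors, and length penalties and end-of-sequence handling are omitted.

```python
def beam_step(beam_scores, step_logprobs, caches, beam_size):
    """Expand every beam by one token, keep the best beam_size, and reorder caches.

    beam_scores:   cumulative log-probability of each live beam
    step_logprobs: step_logprobs[b][t] is the log-probability of token t for beam b
    caches:        one stand-in cache (a list) per live beam
    """
    candidates = [
        (beam_scores[b] + lp, b, tok)
        for b, row in enumerate(step_logprobs)
        for tok, lp in enumerate(row)
    ]
    # Sort by score, breaking ties by (parent, token) so the result is deterministic.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    kept = candidates[:beam_size]
    new_scores = [score for score, _, _ in kept]
    parents = [parent for _, parent, _ in kept]
    tokens = [tok for _, _, tok in kept]
    # Reorder: each surviving beam copies the cache of the beam it extends.
    new_caches = [list(caches[p]) + [tok] for p, tok in zip(parents, tokens)]
    return new_scores, parents, tokens, new_caches


# Two live beams, vocabulary of two tokens; both survivors extend beam 0.
scores, parents, tokens, caches = beam_step(
    beam_scores=[0.0, -5.0],
    step_logprobs=[[-0.1, -0.2], [-0.1, -0.2]],
    caches=[[7], [8]],
    beam_size=2,
)
print(parents, tokens, caches)
```

In the example both surviving hypotheses descend from beam 0, so the cache of beam 1 is dropped and the cache of beam 0 is duplicated, which is the reordering step described above.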
All reported Ariadne generations use beam size 50, length penalty 0\.5, and a 1200\-token generation limit; decoding stops earlier when all beams emit `<eos>`\.

### 6\.4 Evaluation

We use the standard RetroCast v0\.7\.x implementations of Solv\-N and route reconstruction scoring defined in Preliminaries \[[32](https://arxiv.org/html/2606.24184#bib.bib2)\]\. Ariadne supplies raw generated candidate routes, but parsing, canonicalization, constraint filtering, duplicate removal, and Top\-K reconstruction are all performed by the RetroCast scoring pipeline \[[32](https://arxiv.org/html/2606.24184#bib.bib2)\]\. Failed parses are preserved as failed candidate slots, so they fail Tier\-0 validity and cannot contribute to Solv\-0 or reconstruction\.

DMS Explorer XL results in Table [2](https://arxiv.org/html/2606.24184#S3.T2) are reprinted from RetroCast/SynthArena, where sequence\-based DirectMultiStep runs were performed on Lambda Labs NVIDIA A100 40 GB GPUs \[[32](https://arxiv.org/html/2606.24184#bib.bib2)\]\. For comparability, Ariadne and DESP evaluations in this work were run on separate clean single\-GPU Lambda Labs A100 40 GB instances\. The MCTS rows use AWS EC2 `c7i.xlarge` CPU instances, matching the RetroCast/SynthArena protocol for search\-based planners \[[32](https://arxiv.org/html/2606.24184#bib.bib2)\]\. Reported times are planning or generation wall\-clock times for the corresponding planner runs\.

## 7 Data and Software Availability

Code for processing the dataset, implementing the model architecture, and running training, generation, and evaluation is available under the MIT License at [https://github\.com/ischemist/project\-ariadne](https://github.com/ischemist/project-ariadne)\.

## Conflict of Interest

The authors declare no conflict of interest\.

## 8 Acknowledgments

The authors acknowledge a generous allocation of high\-performance computing time from NERSC\. The development of the methodology was supported by the NSF CCI grant \(VSB, Award Number 2124511\)\.
This research was also supported in part by Lambda, Inc\.

## References

- \[1\] T\. Akhmetshin, D\. Zankov, P\. Gantzer, D\. Babadeev, A\. Pinigina, T\. Madzhidov, and A\. Varnek \(2025\) SynPlanner: an end\-to\-end tool for synthesis planning\. Journal of Chemical Information and Modeling 65\(1\), pp\. 15–21\. External Links: [Document](https://dx.doi.org/10.1021/acs.jcim.4c02004)\.
- \[2\] N\. Andronova, M\. Andronov, J\. Schmidhuber, M\. Wand, and D\. Clevert \(2026\) Fast and scalable retrosynthetic planning with a transformer neural network and speculative beam search\. Digital Discovery 5, pp\. 1783–1793\. External Links: [Document](https://dx.doi.org/10.1039/D5DD00573F), [Link](http://dx.doi.org/10.1039/D5DD00573F)\.
- \[3\] J\. L\. Ba, J\. R\. Kiros, and G\. E\. Hinton \(2016\) Layer normalization\. External Links: 1607\.06450, [Link](https://arxiv.org/abs/1607.06450)\.
- \[4\] F\. N\. Baker, D\. Adu\-Ampratwum, R\. Averly, B\. Yu, H\. Sun, and X\. Ning \(2026\) LARC: towards human\-level constrained retrosynthesis planning through an agentic framework\. In Proceedings of AI for Accelerated Research Symposium, EPiC Series in Technology, Vol\. 3, pp\. 153–176\. External Links: [Document](https://dx.doi.org/10.29007/z3hb), [Link](https://easychair.org/publications/paper/SMVW)\.
- \[5\] T\. M\. Blackshaw, J\. C\. Davies, K\. T\. Spoerer, and J\. D\. Hirst \(2025\) Enhancing Monte Carlo Tree Search for retrosynthesis\. Journal of Chemical Information and Modeling 65, pp\. 6537–6546\. External Links: [Link](https://api.semanticscholar.org/CorpusID:279328860)\.
- \[6\] B\. Chen, C\. Li, H\. Dai, and L\.
Song \(2020\) Retro\*: learning retrosynthetic planning with neural guided A\* search\. In Proceedings of the 37th International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol\. 119, pp\. 1608–1616\. External Links: [Link](https://proceedings.mlr.press/v119/chen20k.html)\.
- \[7\] C\. W\. Coley, P\. Daga, M\. D\. Vivo, W\. Jespers, A\. S\. Jogalekar, S\. R\. Kimura, L\. Koenekoop, A\. Märtson, T\. R\. Newhouse, S\. Ray, R\. Sabatini, D\. C\. Thompson, and W\. Sherman \(2026\) Grand challenges for predictive modeling in small molecule drug discovery\. ChemRxiv 2026\(0304\)\. External Links: [Document](https://dx.doi.org/10.26434/chemrxiv.15000615/v1), [Link](https://chemrxiv.org/doi/abs/10.26434/chemrxiv.15000615/v1)\.
- \[8\] C\. W\. Coley, L\. Rogers, W\. H\. Green, and K\. F\. Jensen \(2018\) SCScore: synthetic complexity learned from a reaction corpus\. Journal of Chemical Information and Modeling 58\(2\), pp\. 252–261\. Note: PMID: 29309147\. External Links: [Document](https://dx.doi.org/10.1021/acs.jcim.7b00622), [Link](https://doi.org/10.1021/acs.jcim.7b00622)\.
- \[9\] E\. J\. Corey and W\. T\. Wipke \(1969\) Computer\-assisted design of complex organic syntheses: pathways for molecular synthesis can be devised with a computer and equipment for graphical communication\. Science 166\(3902\), pp\. 178–192\. External Links: ISSN 0036\-8075, 1095\-9203, [Link](https://www.science.org/doi/10.1126/science.166.3902.178), [Document](https://dx.doi.org/10.1126/science.166.3902.178)\.
- \[10\] L\. Dong, N\. Yang, W\. Wang, F\. Wei, X\. Liu, Y\. Wang, J\. Gao, M\. Zhou, and H\.
Hon \(2019\) Unified language model pre\-training for natural language understanding and generation\. In Advances in Neural Information Processing Systems, Vol\. 32\. External Links: [Link](https://proceedings.neurips.cc/paper/2019/hash/c20bb2d9a50d5ac1f713f8b34d9aac5a-Abstract.html)\.
- \[11\] P\. Ertl and A\. Schuffenhauer \(2009\) Estimation of synthetic accessibility score of drug\-like molecules based on molecular complexity and fragment contributions\. Journal of Cheminformatics 1\(1\), pp\. 8\. External Links: ISSN 1758\-2946, [Document](https://dx.doi.org/10.1186/1758-2946-1-8), [Link](https://doi.org/10.1186/1758-2946-1-8)\.
- \[12\] P\. Gaiński, M\. Koziarski, K\. Maziarz, M\. Segler, J\. Tabor, and M\. Śmieja \(2025\) Diverse and feasible retrosynthesis using GFlowNets\. Information Sciences 714, pp\. 122194\. External Links: [Document](https://dx.doi.org/10.1016/j.ins.2025.122194), [Link](https://doi.org/10.1016/j.ins.2025.122194)\.
- \[13\] S\. Genheden and E\. Bjerrum \(2022\) PaRoutes: towards a framework for benchmarking retrosynthesis route predictions\. Digital Discovery 1, pp\. 527–539\. External Links: [Document](https://dx.doi.org/10.1039/D2DD00015F), [Link](http://dx.doi.org/10.1039/D2DD00015F)\.
- \[14\] S\. Genheden, A\. Thakkar, V\. Chadimová, J\. Reymond, O\. Engkvist, and E\. Bjerrum \(2020\) AiZynthFinder: a fast, robust and flexible open\-source software for retrosynthetic planning\. Journal of Cheminformatics 12\(1\), pp\. 70\. External Links: ISSN 1758\-2946, [Document](https://dx.doi.org/10.1186/s13321-020-00472-1), [Link](https://doi.org/10.1186/s13321-020-00472-1)\.
- \[15\] E\. Granqvist, R\. Mercado, and S\.
Genheden \(2026\) Retrosynformer: planning multi\-step chemical synthesis routes via a decision transformer\. Digital Discovery 5, pp\. 348–362\. External Links: [Document](https://dx.doi.org/10.1039/D5DD00153F), [Link](http://dx.doi.org/10.1039/D5DD00153F)\.
- \[16\] J\. Guo, C\. Yu, K\. Li, Y\. Zhang, G\. Wang, S\. Li, and H\. Dong \(2024\) Retrosynthesis zero: self\-improving global synthesis planning using reinforcement learning\. Journal of Chemical Theory and Computation\. External Links: [Link](https://api.semanticscholar.org/CorpusID:269771006)\.
- \[17\] D\. Hendrycks and K\. Gimpel \(2016\) Gaussian error linear units \(gelus\)\. External Links: 1606\.08415, [Link](https://arxiv.org/abs/1606.08415)\.
- \[18\] S\. Hong, H\. H\. Zhuo, K\. Jin, G\. Shao, and Z\. Zhou \(2023\) Retrosynthetic planning with experience\-guided monte carlo tree search\. Communications Chemistry 6\(1\), pp\. 120\. External Links: [Document](https://dx.doi.org/10.1038/s42004-023-00911-8), [Link](https://doi.org/10.1038/s42004-023-00911-8)\.
- \[19\] S\. Ishida, K\. Terayama, R\. Kojima, K\. Takasu, and Y\. Okuno \(2022\) AI\-driven synthetic route design incorporated with retrosynthesis knowledge\. Journal of Chemical Information and Modeling 62\(6\), pp\. 1357–1367\. External Links: [Document](https://dx.doi.org/10.1021/acs.jcim.1c01074)\.
- \[20\] Z\. Jocys, Z\. Zhu, H\. M\. G\. Willems, and K\. Farrahi \(2026\) SynthFormer: equivariant pharmacophore\-based generation of synthesizable molecules for ligand\-based drug design\. Artificial Intelligence in the Life Sciences 9, pp\.
100148\. External Links: [Document](https://dx.doi.org/10.1016/j.ailsci.2025.100148), [Link](https://doi.org/10.1016/j.ailsci.2025.100148)\.
- \[21\] K\. Jordan, Y\. Jin, V\. Boza, J\. You, F\. Cesista, L\. Newhouse, and J\. Bernstein \(2024\) Muon: an optimizer for hidden layers in neural networks\. External Links: [Link](https://kellerjordan.github.io/posts/muon/)\.
- \[22\] A\. Kishimoto, B\. Buesser, B\. Chen, and A\. Botea \(2019\) Depth\-first proof\-number search with heuristic edge cost and application to chemical synthesis planning\. In Advances in Neural Information Processing Systems, H\. Wallach, H\. Larochelle, A\. Beygelzimer, F\. d'Alché\-Buc, E\. Fox, and R\. Garnett \(Eds\.\), Vol\. 32\. External Links: [Link](https://proceedings.neurips.cc/paper_files/paper/2019/file/4fc28b7093b135c21c7183ac07e928a6-Paper.pdf)\.
- \[23\] M\. Koziarski, A\. Rekesh, D\. Shevchuk, A\. van der Sloot, P\. Gaiński, Y\. Bengio, C\. Liu, M\. Tyers, and R\. A\. Batey \(2024\) RGFN: synthesizable molecular generation using GFlowNets\. In Advances in Neural Information Processing Systems, Vol\. 37, pp\. 46908–46955\. External Links: [Document](https://dx.doi.org/10.52202/079017-1488), [Link](https://proceedings.neurips.cc/paper_files/paper/2024/hash/53704142f230054140418ecd8857f391-Abstract-Conference.html)\.
- \[24\] D\. Kreutter and J\. Reymond \(2023\) Multistep retrosynthesis combining a disconnection aware triple transformer loop with a route penalty score guided tree search\. Chemical Science 14, pp\. 9959–9969\. External Links: [Link](https://api.semanticscholar.org/CorpusID:261488982)\.
- \[25\] K\. Lin, Y\. Xu, J\. Pei, and L\.
Lai \(2020\) Automatic retrosynthetic route planning using template\-free models\. Chemical Science 11, pp\. 3355–3364\. External Links: [Link](https://api.semanticscholar.org/CorpusID:268816571)\.
- \[26\] G\. Liu, M\. Sun, W\. Matusik, M\. Jiang, and J\. Chen \(2024\) Multimodal large language models for inverse molecular design with retrosynthetic planning\. External Links: 2410\.04223, [Link](https://arxiv.org/abs/2410.04223)\.
- \[27\] G\. Liu, D\. Xue, S\. Xie, Y\. Xia, A\. Tripp, K\. Maziarz, M\. H\. S\. Segler, T\. Qin, Z\. Zhang, and T\. Liu \(2023\) Retrosynthetic planning with dual value networks\. In International Conference on Machine Learning\. External Links: [Link](https://api.semanticscholar.org/CorpusID:256416110)\.
- \[28\] J\. Liu, J\. Su, X\. Yao, Z\. Jiang, G\. Lai, Y\. Du, Y\. Qin, W\. Xu, E\. Lu, J\. Yan, Y\. Chen, H\. Zheng, Y\. Liu, S\. Liu, B\. Yin, W\. He, H\. Zhu, Y\. Wang, J\. Wang, M\. Dong, Z\. Zhang, Y\. Kang, H\. Zhang, X\. Xu, Y\. Zhang, Y\. Wu, X\. Zhou, and Z\. Yang \(2025\) Muon is scalable for LLM training\. External Links: 2502\.16982, [Link](https://arxiv.org/abs/2502.16982)\.
- \[29\] I\. Loshchilov and F\. Hutter \(2019\) Decoupled weight decay regularization\. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6\-9, 2019\. External Links: [Link](https://openreview.net/forum?id=Bkg6RiCqY7)\.
- \[30\] S\. Luo and C\. W\. Coley \(2025\) Efficient and programmable exploration of synthesizable chemical space\. External Links: 2512\.00384, [Link](https://arxiv.org/abs/2512.00384)\.
- \[31\] K\.
Maziarz, G\. Liu, H\. Misztela, A\. Tripp, J\. Li, A\. Kornev, P\. Gaiński, H\. Hoefling, M\. Fortunato, R\. Gupta, and M\. Segler \(2025\) Chemist\-aligned retrosynthesis by ensembling diverse inductive bias models\. External Links: 2412\.05269, [Link](https://arxiv.org/abs/2412.05269)\.
- \[32\] A\. Morgunov and V\. S\. Batista \(2025\) Procrustean bed for AI\-driven retrosynthesis: a unified framework for reproducible evaluation\. External Links: 2512\.07079, [Link](https://arxiv.org/abs/2512.07079)\.
- \[33\] A\. Morgunov, Y\. Shee, A\. V\. Soudackov, and V\. S\. Batista \(2026\) The syntax of matter: synthesis planning as the foundation of generative chemistry\. ChemRxiv 2026\(0421\)\. External Links: [Document](https://dx.doi.org/10.26434/chemrxiv.15001278/v3), [Link](https://chemrxiv.org/doi/abs/10.26434/chemrxiv.15001278/v3)\.
- \[34\] M\. Parrot, H\. Tajmouati, V\. B\. R\. da Silva, B\. R\. Atwood, R\. Fourcade, Y\. Gaston\-Mathé, N\. D\. Huu, and Q\. Perron \(2021\) Integrating synthetic accessibility with AI\-based generative drug design\. Journal of Cheminformatics 15\. External Links: [Link](https://api.semanticscholar.org/CorpusID:245417923)\.
- \[35\] R\. Pope, S\. Douglas, A\. Chowdhery, J\. Devlin, J\. Bradbury, A\. Levskaya, J\. Heek, K\. Xiao, S\. Agrawal, and J\.
Dean \(2022\) Efficiently scaling transformer inference\. CoRR abs/2211\.05102\. External Links: [Link](https://doi.org/10.48550/arXiv.2211.05102), [Document](https://dx.doi.org/10.48550/ARXIV.2211.05102), 2211\.05102\.
- \[36\] J\. Roh, J\. F\. Joung, K\. Yu, Z\. Tu, G\. L\. Bartholomew, O\. A\. Santiago\-Reyes, M\. H\. Fong, R\. Sarpong, S\. E\. Reisman, and C\. W\. Coley \(2026\) Higher\-level strategies for computer\-aided retrosynthesis\. ACS Central Science 12\(3\), pp\. 345–357\. External Links: [Document](https://dx.doi.org/10.1021/acscentsci.5c02014), [Link](https://doi.org/10.1021/acscentsci.5c02014)\.
- \[37\] M\. Roucairol and T\. Cazenave \(2024\) Comparing search algorithms on the retrosynthesis problem\. Molecular Informatics 43\. External Links: [Link](https://api.semanticscholar.org/CorpusID:253882602)\.
- \[38\] J\. S\. Schreck, C\. W\. Coley, and K\. J\. M\. Bishop \(2019\) Learning retrosynthetic planning through simulated experience\. ACS Central Science 5\(6\), pp\. 970–981\. External Links: [Document](https://dx.doi.org/10.1021/acscentsci.9b00055)\.
- \[39\] P\. Schwaller, R\. Petraglia, V\. Zullo, V\. H\. Nair, R\. Häuselmann, R\. Pisoni, C\. Bekas, A\. Iuliano, and T\. Laino \(2020\) Predicting retrosynthetic pathways using transformer\-based models and a hyper\-graph exploration strategy\. Chemical Science 11, pp\. 3316–3325\. External Links: [Link](https://api.semanticscholar.org/CorpusID:216332642)\.
- \[40\] M\. H\. S\. Segler, M\. Preuss, and M\. P\. Waller \(2018\) Planning chemical syntheses with deep neural networks and symbolic AI\. Nature 555\(7698\), pp\.
604–610\. External Links: ISSN 1476\-4687, [Document](https://dx.doi.org/10.1038/nature25978), [Link](https://doi.org/10.1038/nature25978)\.
- \[41\] S\. Seo, M\. Kim, T\. Shen, M\. Ester, J\. Park, S\. Ahn, and W\. Y\. Kim \(2025\) Generative flows on synthetic pathway for drug design\. External Links: 2410\.04542, [Link](https://arxiv.org/abs/2410.04542)\.
- \[42\] N\. Shazeer \(2020\) GLU variants improve transformer\. External Links: 2002\.05202, [Link](https://arxiv.org/abs/2002.05202)\.
- \[43\] Y\. Shee, A\. Morgunov, H\. Li, and V\. S\. Batista \(2025\) DirectMultiStep: direct route generation for multistep retrosynthesis\. Journal of Chemical Information and Modeling 65\(8\), pp\. 3903–3914\. External Links: [Document](https://dx.doi.org/10.1021/acs.jcim.4c01982)\.
- \[44\] X\. Song, X\. Pan, X\. Zhao, H\. Ye, S\. Zhang, J\. Tang, and T\. Yu \(2025\) AOT\*: efficient synthesis planning via LLM\-empowered AND\-OR tree search\. External Links: 2509\.20988, [Link](https://arxiv.org/abs/2509.20988)\.
- \[45\] J\. Su, M\. Ahmed, Y\. Lu, S\. Pan, W\. Bo, and Y\. Liu \(2024\) RoFormer: enhanced transformer with rotary position embedding\. Neurocomputing 568, pp\. 127063\. External Links: [Document](https://dx.doi.org/10.1016/j.neucom.2023.127063), [Link](https://doi.org/10.1016/j.neucom.2023.127063)\.
- \[46\] K\. Sun, D\. Bagni, J\. M\. Cavanagh, Y\. Wang, J\. M\. Sawyer, B\. Zhou, A\. Gritsevskiy, O\. Zhang, and T\.
Head\-Gordon \(2025\) SynLlama: generating synthesizable molecules and their analogs with large language models\. ACS Central Science 11\(11\), pp\. 2108–2120\. External Links: [Document](https://dx.doi.org/10.1021/acscentsci.5c01285)\.
- \[47\] R\. S\. Sutton \(2019\) The bitter lesson\. External Links: [Link](http://www.incompleteideas.net/IncIdeas/BitterLesson.html)\.
- \[48\] A\. Thakkar, V\. Chadimová, E\. J\. Bjerrum, O\. Engkvist, and J\. Reymond \(2020\) Retrosynthetic accessibility score \(RAscore\) – rapid machine learned synthesizability classification from AI driven retrosynthetic planning\. Chemical Science 12, pp\. 3339–3349\. External Links: [Link](https://api.semanticscholar.org/CorpusID:233621461)\.
- \[49\] H\. Touvron, T\. Lavril, G\. Izacard, X\. Martinet, M\. Lachaux, T\. Lacroix, B\. Rozière, N\. Goyal, E\. Hambro, F\. Azhar, A\. Rodriguez, A\. Joulin, E\. Grave, and G\. Lample \(2023\) LLaMA: open and efficient foundation language models\. External Links: 2302\.13971, [Link](https://arxiv.org/abs/2302.13971)\.
- \[50\] A\. Tripp, K\. Maziarz, S\. Lewis, M\. Segler, and J\. M\. Hernández\-Lobato \(2024\) Retro\-fallback: retrosynthetic planning in an uncertain world\. External Links: 2310\.09270, [Link](https://arxiv.org/abs/2310.09270)\.
- \[51\] Z\. Tu, S\. J\. Choure, M\. H\. Fong, J\. Roh, I\. Levin, K\. Yu, J\. F\. Joung, N\. Morgan, S\. Li, X\. Sun, H\. Lin, M\. Murnin, J\. P\. Liles, T\. J\. Struble, M\. E\. Fortunato, M\. Liu, W\. H\. Green, K\. F\. Jensen, and C\. W\. Coley \(2025\) ASKCOS: open\-source, data\-driven synthesis planning\. Accounts of Chemical Research 58\(11\), pp\.
1764–1775\. External Links: [Document](https://dx.doi.org/10.1021/acs.accounts.5c00155)\.
- \[52\] A\. Vaswani, N\. Shazeer, N\. Parmar, J\. Uszkoreit, L\. Jones, A\. N\. Gomez, Ł\. Kaiser, and I\. Polosukhin \(2017\) Attention is all you need\. In Advances in Neural Information Processing Systems, I\. Guyon, U\. V\. Luxburg, S\. Bengio, H\. Wallach, R\. Fergus, S\. Vishwanathan, and R\. Garnett \(Eds\.\), Vol\. 30\. External Links: [Link](https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf)\.
- \[53\] M\. Voršilák, M\. Kolář, I\. Čmelo, and D\. Svozil \(2020\) SYBA: bayesian estimation of synthetic accessibility of organic compounds\. Journal of Cheminformatics 12\(1\), pp\. 35\. External Links: ISSN 1758\-2946, [Document](https://dx.doi.org/10.1186/s13321-020-00439-2), [Link](https://doi.org/10.1186/s13321-020-00439-2)\.
- \[54\] H\. Wang, J\. Guo, L\. Kong, R\. Ramprasad, P\. Schwaller, Y\. Du, and C\. Zhang \(2025\) LLM\-augmented chemical synthesis and design decision programs\. External Links: 2505\.07027, [Link](https://arxiv.org/abs/2505.07027)\.
- \[55\] M\. Wang and G\. Montana \(2025\) Retrosynthesis planning via worst\-path policy optimisation in tree\-structured mdps\. External Links: 2509\.10504, [Link](https://arxiv.org/abs/2509.10504)\.
- \[56\] X\. Wang, Y\. Qian, H\. Gao, C\. W\. Coley, Y\. Mo, R\. Barzilay, and K\. F\.
Jensen \(2020\) Towards efficient discovery of green synthetic pathways with monte carlo tree search and reinforcement learning\. Chem\. Sci\. 11, pp\. 10959–10972\. External Links: [Document](https://dx.doi.org/10.1039/D0SC04184J), [Link](http://dx.doi.org/10.1039/D0SC04184J)\.
- \[57\] S\. Xie, R\. Yan, P\. Han, Y\. Xia, L\. Wu, C\. Guo, B\. Yang, and T\. Qin \(2022\) RetroGraph: retrosynthetic planning with graph search\. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’22, New York, NY, USA, pp\. 2120–2129\. External Links: ISBN 9781450393850, [Link](https://doi.org/10.1145/3534678.3539446), [Document](https://dx.doi.org/10.1145/3534678.3539446)\.
- \[58\] R\. Xiong, Y\. Yang, D\. He, K\. Zheng, S\. Zheng, C\. Xing, H\. Zhang, Y\. Lan, L\. Wang, and T\. Liu \(2020\) On layer normalization in the transformer architecture\. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13\-18 July 2020, Virtual Event, Proceedings of Machine Learning Research, pp\. 10524–10533\. External Links: [Link](http://proceedings.mlr.press/v119/xiong20b.html)\.
- \[59\] N\. Xuan\-Vu, D\. P\. Armstrong, Z\. Jončev, and P\. Schwaller \(2025\) TempRe: template generation for single and direct multi\-step retrosynthesis\. External Links: 2507\.21762, [Link](https://arxiv.org/abs/2507.21762)\.
- \[60\] N\. Xuan\-Vu, D\. Armstrong, M\. Wehrbach, A\. M\. Bran, Z\. Jončev, and P\. Schwaller \(2025\) Synthelite: chemist\-aligned and feasibility\-aware synthesis planning with LLMs\. External Links: 2512\.16424, [Link](https://arxiv.org/abs/2512.16424)\.
- \[61\] K\. Yu, J\. Roh, Z\. Li, W\. Gao, R\. Wang, and C\. W\.
Coley \(2024\) Double\-ended synthesis planning with goal\-constrained bidirectional search\. In Advances in Neural Information Processing Systems, Vol\. 37, pp\. 112919–112949\. External Links: [Document](https://dx.doi.org/10.52202/079017-3588), [Link](https://proceedings.neurips.cc/paper_files/paper/2024/hash/cd091a4d8e97157d32940428f902c7b0-Abstract-Conference.html)\.
- \[62\] Y\. Yu, Y\. Wei, K\. Kuang, Z\. Huang, H\. Yao, and F\. Wu \(2022\) GRASP: navigating retrosynthetic planning with goal\-driven policy\. In Advances in Neural Information Processing Systems, S\. Koyejo, S\. Mohamed, A\. Agarwal, D\. Belgrave, K\. Cho, and A\. Oh \(Eds\.\), Vol\. 35, pp\. 10257–10268\. External Links: [Link](https://proceedings.neurips.cc/paper_files/paper/2022/file/42beaab8aa8da1c77581609a61eced93-Paper-Conference.pdf)\.
- \[63\] B\. Zhang and R\. Sennrich \(2019\) Root mean square layer normalization\. In Advances in Neural Information Processing Systems, H\. Wallach, H\. Larochelle, A\. Beygelzimer, F\. d'Alché\-Buc, E\. Fox, and R\. Garnett \(Eds\.\), Vol\. 32\. External Links: [Link](https://proceedings.neurips.cc/paper_files/paper/2019/file/1e8a19426224ca89e83cef47f1e7f53b-Paper.pdf)\.
- \[64\] X\. Zhang, H\. Lin, M\. Zhang, Y\. Zhou, and J\. Ma \(2025\) A data\-driven group retrosynthesis planning model inspired by neurosymbolic programming\. Nature Communications 16\(1\), pp\. 192\. External Links: [Document](https://dx.doi.org/10.1038/s41467-024-55374-9), [Link](https://doi.org/10.1038/s41467-024-55374-9)\.
- \[65\] Y\. Zhang, X\. He, S\. Gao, A\. Zhou, and H\. Hao \(2023\) Evolutionary retrosynthetic route planning \[research frontier\]\. IEEE Computational Intelligence Magazine 19, pp\.
58–72\. External Links: [Link](https://api.semanticscholar.org/CorpusID:271115363)\.
- \[66\] D\. Zhao, S\. Tu, and L\. Xu \(2024\) Efficient retrosynthetic planning with MCTS exploration enhanced A\* search\. Communications Chemistry 7\. External Links: [Link](https://api.semanticscholar.org/CorpusID:268252759)\.

## Supporting Information

The Supporting Information reports additional target\-only evaluations that contextualize the main `mkt-cnv-160` results\. Tables [S1](https://arxiv.org/html/2606.24184#Sx2.T1)–[S4](https://arxiv.org/html/2606.24184#Sx2.T4) extend the Ariadne versus AiZynthFinder MCTS comparison to `mkt-lin-500`, `ref-cnv-400`, `ref-lin-600`, and `ref-lng-84`, with paired confidence intervals for Solv\-0, Top\-1, Top\-10, root recovery, and route recovery conditional on the root\. Table [S5](https://arxiv.org/html/2606.24184#Sx2.T5) reports the `mkt-lin-500` beam\-size sweep for the 24\-layer Ariadne checkpoint\.

#### Supplementary Tables S1–S4

These tables report the full target\-only benchmark suite beyond `mkt-cnv-160`\. Table [S1](https://arxiv.org/html/2606.24184#Sx2.T1) evaluates the commercial\-stock linear benchmark\. Tables [S2](https://arxiv.org/html/2606.24184#Sx2.T2), [S3](https://arxiv.org/html/2606.24184#Sx2.T3), and [S4](https://arxiv.org/html/2606.24184#Sx2.T4) evaluate the reference\-stock convergent, linear, and long\-route benchmarks\.

#### Supplementary Table S5

Table [S5](https://arxiv.org/html/2606.24184#Sx2.T5) reports the effect of beam size on `mkt-lin-500` for the 24\-layer Ariadne checkpoint\.

Table S1: Additional `mkt-lin-500` results\. Ariadne uses the 24\-layer checkpoint with target\-only prompting and beam size 50\. AiZynthFinder uses 100 MCTS iterations and maximum search depth 10\. Top\-10 CI gives the 95% bootstrap interval\.
Root T10 measures recovery of the reference root reaction, and Route∣root T10 measures full\-route recovery among targets whose root reaction was recovered\. Paired CI rows compare Ariadne with AiZynthFinder MCTS: intervals containing zero indicate no significant difference, positive intervals favor Ariadne, and negative intervals favor AiZynthFinder MCTS\.

Table S2: Additional `ref-cnv-400` results\. Ariadne uses the 24\-layer checkpoint with target\-only prompting and beam size 50\. AiZynthFinder uses 100 MCTS iterations and maximum search depth 10\. Top\-10 CI gives the 95% bootstrap interval\. Root T10 measures recovery of the reference root reaction, and Route∣root T10 measures full\-route recovery among targets whose root reaction was recovered\. Paired CI rows compare Ariadne with AiZynthFinder MCTS: intervals containing zero indicate no significant difference, positive intervals favor Ariadne, and negative intervals favor AiZynthFinder MCTS\.

Table S3: Additional `ref-lin-600` results\. Ariadne uses the 24\-layer checkpoint with target\-only prompting and beam size 50\. AiZynthFinder uses 100 MCTS iterations and maximum search depth 10\. Top\-10 CI gives the 95% bootstrap interval\. Root T10 measures recovery of the reference root reaction, and Route∣root T10 measures full\-route recovery among targets whose root reaction was recovered\. Paired CI rows compare Ariadne with AiZynthFinder MCTS: intervals containing zero indicate no significant difference, positive intervals favor Ariadne, and negative intervals favor AiZynthFinder MCTS\.

Table S4: Additional `ref-lng-84` results\. Ariadne uses the 24\-layer checkpoint with target\-only prompting and beam size 50\. AiZynthFinder uses 100 MCTS iterations and maximum search depth 10\. Top\-10 CI gives the 95% bootstrap interval\. Root T10 measures recovery of the reference root reaction, and Route∣root T10 measures full\-route recovery among targets whose root reaction was recovered\.
Paired CI rows compare Ariadne with AiZynthFinder MCTS: intervals containing zero indicate no significant difference, positive intervals favor Ariadne, and negative intervals favor AiZynthFinder MCTS\.

Table S5: `mkt-lin-500` beam\-size sweep for the 24\-layer Ariadne checkpoint with target\-only prompting\. Top\-10 CI gives the 95% bootstrap interval\. Root T10 measures recovery of the reference root reaction, and Route∣root T10 measures full\-route recovery among targets whose root reaction was recovered\.
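The paired confidence intervals referred to in the captions of Tables S1–S4 compare two planners on the same targets\. One common way to obtain such an interval is a paired percentile bootstrap, sketched below\. The text states only that the intervals are 95% bootstrap intervals, so the resampling scheme, the number of resamples, and the function name `paired_bootstrap_ci` are assumptions rather than a description of the RetroCast implementation\.

```python
import random


def paired_bootstrap_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(a) - mean(b).

    a[i] and b[i] are per-target outcomes (for example 1 if the target was
    solved, else 0) for two planners on the same target i. Targets are
    resampled jointly so the pairing is preserved.
    """
    assert len(a) == len(b) and len(a) > 0
    rng = random.Random(seed)
    n = len(a)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(a[i] - b[i] for i in idx) / n)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

Under this reading, an interval that excludes zero and lies above it favors the first planner, which matches the interpretation given in the captions\.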