Diffusion for generating/editing ASTs? [D]

Reddit r/MachineLearning 05/07/26, 05:46 PM News

Summary

A user proposes using diffusion models to generate or edit Abstract Syntax Trees (ASTs) to ensure syntactic correctness in code generation, contrasting this with the token-based limitations of current LLMs.

I’m not a machine learning expert, but I enjoy learning about how it all works. One limitation I’ve noticed with LLMs for code generation is that their input and output space is the model’s token vocabulary, so nothing in the architecture itself stops them from emitting code that isn’t even syntactically correct.

I’m thinking it should be possible to build an architecture (diffusion could be a good paradigm) in which an abstract syntax tree is generated or edited in a way that guarantees syntactic correctness at every iteration. Maybe a model meant to solve logical problems by generating a procedure could then be effective with much less (or even zero) training data.

I think this could work with diffusion because, for a given instruction set and a fixed number of nodes, there are only finitely many possible ASTs. The algorithm’s job is just to search that space for the best candidates, similar to how image generation models search image space for outputs that match a description.

What do you all think? Also, forgive me if this is the wrong sub for this; I haven’t been very active on Reddit until recently.
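To make the idea a bit more concrete, here is a minimal Python sketch of the two claims in the post: that the set of ASTs with a fixed node count is finite and can be enumerated, and that an edit defined on the tree (rather than on tokens) keeps the program syntactically valid at every step. Everything here is an assumption made for illustration: the toy expression grammar, the tuple encoding, and the helper names (`trees_with_nodes`, `render`, `random_edit`) are all hypothetical, and the edit step is a uniform random subtree swap standing in for a learned denoiser, not an actual diffusion model.

```python
import ast
import random
from functools import lru_cache

# Hypothetical toy grammar (an assumption, not from any real system):
#   Expr := "x" | "1" | ("neg", Expr) | ("add", Expr, Expr) | ("mul", Expr, Expr)
LEAVES = ("x", "1")
UNARY = ("neg",)
BINARY = ("add", "mul")


@lru_cache(maxsize=None)
def trees_with_nodes(n):
    """Return every AST of the toy grammar that has exactly n nodes."""
    if n < 1:
        return ()
    if n == 1:
        return LEAVES
    out = [(op, child) for op in UNARY for child in trees_with_nodes(n - 1)]
    # Split the remaining n - 1 nodes between the left and right children.
    for left_size in range(1, n - 1):
        for left in trees_with_nodes(left_size):
            for right in trees_with_nodes(n - 1 - left_size):
                out.extend((op, left, right) for op in BINARY)
    return tuple(out)


def render(tree):
    """Turn a tree into fully parenthesised Python source text."""
    if isinstance(tree, str):
        return tree
    if tree[0] == "neg":
        return f"(-{render(tree[1])})"
    symbol = "+" if tree[0] == "add" else "*"
    return f"({render(tree[1])} {symbol} {render(tree[2])})"


def subtree_paths(tree, prefix=()):
    """Yield the index path to every node, root first."""
    yield prefix
    if not isinstance(tree, str):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtree_paths(child, prefix + (i,))


def replace_at(tree, path, new_subtree):
    """Return a copy of tree with the node at path swapped for new_subtree."""
    if not path:
        return new_subtree
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new_subtree),) + tree[i + 1:]


def random_edit(tree, rng, max_new_size=3):
    """One edit step: swap a random subtree for a random grammar-valid one.

    A learned model would choose the path and replacement; here both are
    uniform random picks, which is only a placeholder.
    """
    path = rng.choice(list(subtree_paths(tree)))
    size = rng.randint(1, max_new_size)
    return replace_at(tree, path, rng.choice(trees_with_nodes(size)))


if __name__ == "__main__":
    print(len(trees_with_nodes(3)), "trees with exactly 3 nodes")
    rng = random.Random(0)
    tree = ("add", "x", "1")
    for _ in range(5):
        tree = random_edit(tree, rng)
        ast.parse(render(tree), mode="eval")  # raises SyntaxError if invalid
        print(render(tree))
```

Because every replacement is drawn from the grammar itself, the `ast.parse` check never fails, which is the "correct at each iteration" property. The sketch also hints at the catch: the number of trees grows quickly with node count, so a finite search space is not automatically a small one.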

Similar Articles

Diffusion Language Models: An Experimental Analysis

Bulding my own Diffusion Language Model from scratch was easier than I thought [P]

Language Generation as Optimal Control: Closed-Loop Diffusion in Latent Control Space

EPIC: Efficient and Parallel Inference under CFG Constraints for Diffusion Language Models

Self-Generated Error Training for Token Editing in Diffusion Language Models
