Learning Local Constraints for Reinforcement-Learned Content Generators

arXiv cs.AI Papers

Summary

This paper proposes a hybrid method combining Wave Function Collapse (WFC) and reinforcement learning to generate game levels that are both visually satisfying and playable, using WFC constraints to guide the RL agent.

arXiv:2605.13570v1 Announce Type: new Abstract: Constraint-based game content generators that learn local constraints from existing content, such as Wave Function Collapse (WFC), can generate visually satisfying game levels but face challenges in guaranteeing global properties, such as playability. On the other hand, reinforcement-learning trained generators can guarantee global properties -- because such properties can easily be included in reward functions -- but the results can be visually dissatisfying. In this paper, we explore ways to combine these methods. Specifically, we constrain the action space of a PCGRL generator with constraints learned by WFC, effectively allowing the PCGRL generator to achieve global properties while forced to adhere to local constraints. To better analyze how this hybrid content generation method operates, we vary the number and type of inputs, and we test whether to randomly collapse the starting state and exclude rare patterns. While the method is sensitive to hyperparameter tuning, the best of our trained generators produce visually satisfying and playable puzzle-platform game levels -- such as Lode Runner levels -- with desired global properties.

Cached at: 05/14/26, 06:16 AM

# Learning Local Constraints for Reinforcement-Learned Content Generators
Source: [https://arxiv.org/html/2605.13570](https://arxiv.org/html/2605.13570)
###### Abstract

Constraint\-based game content generators that learn local constraints from existing content, such as Wave Function Collapse \(WFC\), can generate visually satisfying game levels but face challenges in guaranteeing global properties, such as playability\. On the other hand, reinforcement\-learning trained generators can guarantee global properties, because such properties can easily be included in reward functions, but the results can be visually dissatisfying\. In this paper, we explore ways to combine these methods\. Specifically, we constrain the action space of a PCGRL generator with constraints learned by WFC, effectively allowing the PCGRL generator to achieve global properties while forced to adhere to local constraints\. To better analyze how this hybrid content generation method operates, we vary the number and type of inputs, and we test whether to randomly collapse the starting state and exclude rare patterns\. While the method is sensitive to hyperparameter tuning, the best of our trained generators produce visually satisfying and playable puzzle\-platform game levels, such as Lode Runner levels, with desired global properties\.

## I Introduction

What makes a game level good? There are many factors to consider, but the most salient factors can arguably be divided into functional aspects and aspects of visual aesthetics\. Functional aspects relate to what the player can do in the level, e\.g\., can they finish it, which skills are needed, and which items can be reached\. Visual aesthetics are in themselves multifaceted, but typically, a game has a specific visual style, and levels that do not adhere to this style look broken\. While visual aesthetics relate to both global and local aspects of a level, functional aspects are generally global\. In particular, whether a level can be finished or an item reached can only be evaluated in the context of the whole level\.

Self\-supervised learning approaches to image generation are generally very good at capturing visual aesthetics, and this capability extends to level generation if enough data is available\[[40](https://arxiv.org/html/2605.13570#bib.bib8),[28](https://arxiv.org/html/2605.13570#bib.bib5),[35](https://arxiv.org/html/2605.13570#bib.bib4)\]\. However, they do not inherently capture the functional aspects of a game level, perhaps because this is not part of their learning signal\. Reinforcement learning approaches, on the other hand, can be used to learn level generation models that capture functionality well if the reward is made explicitly dependent on such functionality\[[15](https://arxiv.org/html/2605.13570#bib.bib25),[26](https://arxiv.org/html/2605.13570#bib.bib29),[7](https://arxiv.org/html/2605.13570#bib.bib26)\]\. Unfortunately, this often comes at the expense of visual aesthetics, as levels generated via reinforcement learning can be downright ugly \(figure[13](https://arxiv.org/html/2605.13570#S6.F13)\)\.

The question naturally arises whether we can combine self\-supervised and reinforcement learning methods to learn level generators that generate functional levels that adhere to specific visual styles\. This may or may not take the form of learning local patterns via self\-supervised learning and global structure via reinforcement learning\. This paper proposes one specific method for doing exactly this, combining the Wave Function Collapse \(WFC\)\[[9](https://arxiv.org/html/2605.13570#bib.bib31)\]algorithm for learning local patterns and reinforcement learning for learning to produce playable levels\. The specific way these methods are combined is by letting WFC limit what action the RL model can take\.

For game level generation, a generated level must be playable\. Often, levels generated using PCG via machine learning \(PCGML\) \[[36](https://arxiv.org/html/2605.13570#bib.bib2)\] look similar to the human\-made training levels, but do not guarantee functionality\. The reason is that functionality does not depend on visual similarity or aesthetics\. For example, a Super Mario Bros level with some randomly scattered floor tiles in the sky may look messy, but the level is still functionally complete if there exists a path to the goal\. The converse also holds: a level that looks like it was designed by humans can be non\-functional if no such path exists\. We explore how to generate playable levels that are visually similar to the given input using a reinforcement learning \(RL\) approach\. RL methods have shown great success in generating content, but incorporating a visual similarity measure in a reward function is not straightforward\.

Our paper is novel in a number of ways\. First, we are combining WFC with PCGRL by constraining the action space of PCGRL using the local rules derived by WFC\. Second, we study the effects of the algorithm’s hyperparameters on the final generated content\. We experiment with the size of the input data to the WFC algorithm \(single input vs multiple inputs\)\. We also vary the diversity of the selected inputs to investigate how it influences the functionality and the diversity of the output levels\. Further, we explore the outcomes of the exclusion of the less frequent patterns of the input\. Finally, we test the effects of starting after collapsing a small number of cells compared to starting from completely uncollapsed levels\.

## II Background

This section covers related work within procedural content generation as performed via machine learning \(see Section[II\-A](https://arxiv.org/html/2605.13570#S2.SS1)\), RL \(see Section[II\-B](https://arxiv.org/html/2605.13570#S2.SS2)\), and WFC \(see Section[II\-C](https://arxiv.org/html/2605.13570#S2.SS3)\)\.

### II\-A PCGML

Procedural Content Generation \(PCG\) \[[25](https://arxiv.org/html/2605.13570#bib.bib1)\] research focuses on the generation of game content \(such as maps, quests, levels, music, and narrative\)\. PCG via machine learning \(PCGML\) generates such content from input examples: a machine learning model is trained on the input data and tries to learn its underlying distribution; afterwards, the trained model is used to generate new content\. Various machine learning approaches have been explored for automated content generation, ranging from Markov models \[[28](https://arxiv.org/html/2605.13570#bib.bib5)\], LSTM networks \[[35](https://arxiv.org/html/2605.13570#bib.bib4)\], Generative Adversarial Networks \(GANs\) \[[40](https://arxiv.org/html/2605.13570#bib.bib8),[24](https://arxiv.org/html/2605.13570#bib.bib6)\], and autoencoders \[[23](https://arxiv.org/html/2605.13570#bib.bib13)\] to recent Large Language Models \(LLMs\) \[[39](https://arxiv.org/html/2605.13570#bib.bib11),[19](https://arxiv.org/html/2605.13570#bib.bib10)\]\. A different line of work views content generation as an iterative process: instead of generating the whole content in one go, it builds the content over several iterations\. Examples include Path of Destruction \[[27](https://arxiv.org/html/2605.13570#bib.bib9)\], Diffusion Models \[[6](https://arxiv.org/html/2605.13570#bib.bib20)\], and Neural Cellular Automata \[[33](https://arxiv.org/html/2605.13570#bib.bib12)\]\.

### II\-B PCG using RL

Reinforcement Learning \(RL\) based PCG methods \[[15](https://arxiv.org/html/2605.13570#bib.bib25)\] treat the level generation problem as a Markov Decision Process \(MDP\), where the agent is trained to select an action that leads towards a goal; in return, it receives a reward indicating how good or bad the action is\. This constitutes an iterative approach to the level generation problem, where levels are generated step by step rather than in one shot\. One advantage of RL\-based methods over supervised or self\-supervised methods is that they do not require training data\. Instead, a reward function guides the generation process, which can help the trained agent learn more complex concepts such as playability\.

Khalifa et al\. \[[15](https://arxiv.org/html/2605.13570#bib.bib25)\] introduced an RL\-based framework for 2D game levels, where, starting with a random level, the RL agent iteratively modifies the level towards a certain goal\. Earle et al\. \[[7](https://arxiv.org/html/2605.13570#bib.bib26)\] proposed a controllable RL\-based generator, where they used control parameters for training the agent\. At inference time, users can vary the control parameters to generate a variety of content from a single generator\. Jiang et al\. \[[11](https://arxiv.org/html/2605.13570#bib.bib27)\] applied an RL\-based controllable generator to a more complex 3D game environment\. More recently, Gisslen et al\. \[[8](https://arxiv.org/html/2605.13570#bib.bib28)\] applied an adversarial RL approach to PCG\. They adversarially trained a PCGRL generator against an RL\-based solving agent to generate novel game environments\. In a different direction, Shu et al\. \[[26](https://arxiv.org/html/2605.13570#bib.bib29)\] combined PCGRL with experience\-driven PCG to generate personalized game content\.

![Refer to caption](https://arxiv.org/html/2605.13570v1/images/pcgrl+wfc_framework.jpg)Figure 1: System overview of the WCRL framework

### II\-C Wave Function Collapse

Wave Function Collapse \(WFC\) was initially proposed by Maxim Gumin\[[9](https://arxiv.org/html/2605.13570#bib.bib31)\]for generating images and tile maps that have pattern similarities with given input\. The algorithm takes pixel or tile\-based example input, divides the input into NxN patterns, and extracts the local relations between these patterns, which define the constraints for the algorithm\. Following these constraints, the algorithm produces outputs having pattern similarities with the given input\. Since its inception, it has become popular among game designers as well as game researchers due to the aesthetically pleasing output and the need for a small amount of input data\. It has been applied and adapted in various games such as Bad North\[[31](https://arxiv.org/html/2605.13570#bib.bib33)\], Townscaper\[[38](https://arxiv.org/html/2605.13570#bib.bib32)\], Caves of Qud\[[4](https://arxiv.org/html/2605.13570#bib.bib34)\], etc\.

Several academic studies have explored WFC in different ways\. Karth et al\. \[[12](https://arxiv.org/html/2605.13570#bib.bib35)\] investigate the use of WFC as a constraint\-solving PCG approach\. In follow\-up work \[[13](https://arxiv.org/html/2605.13570#bib.bib37),[14](https://arxiv.org/html/2605.13570#bib.bib36)\], they explore different ways to extend the algorithm and overcome its limitations, such as using VQ\-VAE as a tile representation and using positive and negative examples as inputs\. Sandhu et al\. \[[22](https://arxiv.org/html/2605.13570#bib.bib38)\] explore the idea of integrating design constraints as a general framing of WFC constraints and investigate their effectiveness\. Instead of using a grid structure, a graph structure can be used to expand on the functionality of the method and reduce its limitations \[[5](https://arxiv.org/html/2605.13570#bib.bib39),[16](https://arxiv.org/html/2605.13570#bib.bib40)\]\. Another study \[[20](https://arxiv.org/html/2605.13570#bib.bib41)\] applied WFC on a growing grid rather than a fixed\-size grid to overcome the limitation of having a specific level size\. Langendam and Bidarra \[[17](https://arxiv.org/html/2605.13570#bib.bib42)\] proposed a mixed\-initiative PCG tool using WFC that allows easier interaction for artists and game level designers\. Moving beyond a simple tile set, Alaka and Bidarra \[[1](https://arxiv.org/html/2605.13570#bib.bib43)\] explored a semantics\-based hierarchical structure using meta\-tiles for an interactive design tool, so that humans can focus on the bigger picture instead of low\-level details\. Babin and Katchabaw \[[2](https://arxiv.org/html/2605.13570#bib.bib44)\] combined a reinforcement learning approach with WFC for generating playable Super Mario levels\. They applied an ES\-based optimization approach to train an RL agent that replaces the minimal entropy heuristic and action selection of WFC\.

![Refer to caption](https://arxiv.org/html/2605.13570v1/images/adjacency.jpg)Figure 2: Adjacency relation of the marked pattern using a 3x3 window\. The selected pattern is marked by a green border, and the 3x3 neighbor patterns in four cardinal directions are shown on the side\.

### II\-D Lode Runner

Lode Runner is a platformer\-puzzle game, published by Broderbund in 1983\. The game is about collecting gold pieces without getting killed by the enemies\. The player can walk on the platforms, travel along ropes, and climb ladders to reach higher areas in the level, but they cannot jump\. Besides traversing a level, the player can dig holes in bricks to make a new path or use the holes to trap or kill enemies\. Though Lode Runner is not a popular choice for game AI research, its spatial relations between different tiles and puzzle\-like nature make it a good candidate for our experiment: it is easy to detect a level that doesn’t follow Lode Runner’s structure, and it has hard connectivity and functionality constraints that need to be satisfied\.

Snodgrass and Ontañón \[[28](https://arxiv.org/html/2605.13570#bib.bib5)\] trained a multi\-dimensional Markov model to produce levels for Super Mario Bros, Lode Runner, and Kid Icarus\. Steckel et al\. \[[32](https://arxiv.org/html/2605.13570#bib.bib16)\] used a GAN with the MAP\-Elites algorithm to generate diverse playable levels of Lode Runner\. Sorochan et al\. \[[30](https://arxiv.org/html/2605.13570#bib.bib3)\] trained an LSTM on the player path of Lode Runner levels and used it as a level generator\. Snodgrass and Sarkar \[[29](https://arxiv.org/html/2605.13570#bib.bib15)\] combined variational autoencoders with example\-driven binary space partitioning to blend and generate levels from multiple domains, including Lode Runner\. Thakkar et al\. \[[37](https://arxiv.org/html/2605.13570#bib.bib14)\] applied evolution on the latent space of autoencoders and variational autoencoders to generate Lode Runner levels\.

## III WCRL: Wave Collapse via Reinforcement Learning

In this paper, we integrate the constraint\-solving power of Wave Function Collapse with the PCGRL framework to generate visually pleasing and functional levels for the platformer game Lode Runner\. Inspired by the work of Babin and Katchabaw \[[2](https://arxiv.org/html/2605.13570#bib.bib44)\], we combine RL with WFC for generating playable levels where the RL agent selects the value of the next tile\. One difference is that Babin and Katchabaw used an ES\-based optimization approach to train the RL agent, while we use a PPO\-based RL agent\. Another difference is the domain itself: Babin and Katchabaw focused on generating linear levels for Super Mario Bros, while we focus on the puzzle game Lode Runner\. Lode Runner is a harder problem than Mario, where playability is easier to achieve \[[28](https://arxiv.org/html/2605.13570#bib.bib5)\]\. In the proposed framework, the quality and the visual aesthetics of the generated content can be affected by different factors\. We extend the study by investigating some of these factors, such as the size of the input data, the diversity of the input data, the presence or exclusion of less frequent patterns, and finally, training from a random collapsed starting state vs an empty state\.

Figure[1](https://arxiv.org/html/2605.13570#S2.F1)displays the overview of the proposed framework\. WFC operates on the tiles by treating them as pixels\. The framework takes input level\(s\) and extracts NxN tile patterns \(where N is the size of local constraints, usually 2 or 3 works the best\) present in the input level\(s\)\. These NxN unique patterns create the action space for the RL agent\. WFC finds the adjacency relations between the patterns\. Figure[2](https://arxiv.org/html/2605.13570#S2.F2)displays 3x3 adjacency relations for a selected pattern\. These adjacency relations tell what patterns can be placed as neighbors of a selected pattern in different directions\.
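The pattern extraction and adjacency steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a level is a list of equal-length strings, that adjacency is taken from patterns actually observed one cell apart in the input (overlap agreement would be another option), and the names `extract_patterns`, `find_adjacency_rules`, and the toy tile symbols are hypothetical.

```python
from collections import Counter, defaultdict

# Cardinal directions as (row offset, column offset).
DIRECTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def extract_patterns(level, n):
    """Slide an n x n window over the level; return pattern counts and window grid."""
    rows, cols = len(level), len(level[0])
    counts = Counter()
    grid = {}  # top-left corner (r, c) -> pattern found at that window
    for r in range(rows - n + 1):
        for c in range(cols - n + 1):
            pattern = tuple(level[r + i][c:c + n] for i in range(n))
            counts[pattern] += 1
            grid[(r, c)] = pattern
    return counts, grid


def find_adjacency_rules(grid):
    """Map (pattern, direction) to the set of patterns observed one cell away."""
    rules = defaultdict(set)
    for (r, c), pattern in grid.items():
        for name, (dr, dc) in DIRECTIONS.items():
            neighbor = grid.get((r + dr, c + dc))
            if neighbor is not None:
                rules[(pattern, name)].add(neighbor)
    return rules


# Toy 4x4 level: '.' empty, 'B' brick, 'H' ladder (illustrative symbols only).
level = [
    "....",
    ".H..",
    ".H..",
    "BBBB",
]
counts, grid = extract_patterns(level, 2)
rules = find_adjacency_rules(grid)
action_space = sorted(counts)  # one action index per unique pattern
```

In this sketch the index of a pattern in `action_space` plays the role of the action id, and `rules` is what a propagation step would consult when pruning neighbors.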

Algorithm 1: Pseudocode of a full level generation using the WCRL framework\. The *obs*, *pattern*, and *reward* are used to train the RL agent\.

     1: ▷ Initialize empty level with all possible patterns
     2: patterns ← extract_patterns(input)
     3: adj_rules ← find_adjacency_rules(patterns)
     4: lvl ← empty_grid(patterns)
     5: ▷ Assign a single player pattern to the level
     6: loc ← random(lvl)
     7: player_patterns ← get_player_patterns(patterns)
     8: pattern ← random(player_patterns)
     9: lvl ← apply_pattern(loc, pattern, lvl, adj_rules)
    10: remove_patterns(lvl, player_patterns)
    11: ▷ Collapse the level using WCRL Framework
    12: while not cell_collapsed(lvl) do
    13:     loc ← next_cell_to_collapse(lvl)
    14:     available_patterns ← get_valid_patterns(loc, lvl)
    15:     if len(available_patterns) is 0 then
    16:         return contradictions error
    17:     end if
    18:     pattern ← RL_agent(loc, lvl, available_patterns)
    19:     n_lvl ← apply_pattern(loc, pattern, lvl, adj_rules)
    20:     reward ← RL_reward(n_lvl, lvl)
    21:     lvl ← n_lvl
    22: end while
    23: ▷ Return the fully collapsed level
    24: return lvl

Algorithm[1](https://arxiv.org/html/2605.13570#alg1)shows the main steps for running the algorithm from the start state till a level is generated\. At initialization, WFC extracts the patterns and adjacency rules from the input image\(s\) \(lines[2](https://arxiv.org/html/2605.13570#alg1.l2)and[3](https://arxiv.org/html/2605.13570#alg1.l3)\)\. We create an empty grid of the same size as the level, where each cell contains the possible patterns that can be placed at that location \(line[4](https://arxiv.org/html/2605.13570#alg1.l4)\)\. Initially, all patterns are available to be placed in any location\. WFC picks the most constrained tile \(i\.e\. the cell with the least number of available patterns \(line[13](https://arxiv.org/html/2605.13570#alg1.l13)\)\) and provides it, the current level, and available patterns to the RL agent, which selects one of the available patterns \(line[18](https://arxiv.org/html/2605.13570#alg1.l18)\)\. After the agent selects the pattern, WFC applies the pattern and propagates that selection to the whole level by removing any patterns that will conflict with the adjacency relation \(line[19](https://arxiv.org/html/2605.13570#alg1.l19)\)\. If at any point the most constrained cell does not have any more choices, WFC raises a contradiction error, which indicates failure to generate the level \(lines[15](https://arxiv.org/html/2605.13570#alg1.l15),[16](https://arxiv.org/html/2605.13570#alg1.l16), and[17](https://arxiv.org/html/2605.13570#alg1.l17)\)\. If the propagation is completed successfully, the framework calculates a reward signal that signifies how close that new level is to playability from the previous level \(line[20](https://arxiv.org/html/2605.13570#alg1.l20)\)\. This process continues until all the cells are collapsed or a contradiction occurs during the propagation \(line[12](https://arxiv.org/html/2605.13570#alg1.l12)\)\. 
Before the framework starts, we place a random player pattern \(lines[7](https://arxiv.org/html/2605.13570#alg1.l7)and[8](https://arxiv.org/html/2605.13570#alg1.l8)\) at a random location \(line[6](https://arxiv.org/html/2605.13570#alg1.l6)\) and propagate it through the level \(line[9](https://arxiv.org/html/2605.13570#alg1.l9)\), then remove all the patterns that could add an additional player \(line[10](https://arxiv.org/html/2605.13570#alg1.l10)\)\.
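The collapse loop described above can be illustrated with a small runnable sketch. It is not the authors' code and makes several simplifying assumptions: patterns are plain integers, each cell holds one pattern (no overlapping windows), a uniformly random policy stands in for the trained RL agent, and the player placement and reward steps are left out. All function names and the toy rule table are hypothetical.

```python
import random

# Cardinal directions as (row offset, column offset).
DIRECTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def propagate(lvl, rules, start):
    """Remove neighbor options that conflict with the adjacency rules."""
    stack = [start]
    while stack:
        r, c = stack.pop()
        for name, (dr, dc) in DIRECTIONS.items():
            nbr = (r + dr, c + dc)
            if nbr not in lvl:
                continue
            # Patterns allowed at `nbr` given everything still possible at (r, c).
            allowed = set()
            for p in lvl[(r, c)]:
                allowed |= rules.get((p, name), set())
            reduced = lvl[nbr] & allowed
            if reduced != lvl[nbr]:
                lvl[nbr] = reduced
                if reduced:  # an emptied cell is reported by the main loop
                    stack.append(nbr)


def random_policy(loc, lvl, available, rng):
    """Stand-in for the trained RL agent: pick uniformly among valid patterns."""
    return rng.choice(available)


def generate(rows, cols, patterns, rules, policy, rng):
    """Collapse a rows x cols grid; return None if a contradiction occurs."""
    lvl = {(r, c): set(patterns) for r in range(rows) for c in range(cols)}
    # A cell counts as collapsed once exactly one pattern remains.
    while any(len(opts) != 1 for opts in lvl.values()):
        open_cells = [cell for cell, opts in lvl.items() if len(opts) != 1]
        loc = min(open_cells, key=lambda cell: len(lvl[cell]))  # most constrained
        available = sorted(lvl[loc])
        if not available:
            return None  # contradiction: no valid pattern left for this cell
        lvl[loc] = {policy(loc, lvl, available, rng)}
        propagate(lvl, rules, loc)
    return {cell: next(iter(opts)) for cell, opts in lvl.items()}


# Toy rules (hypothetical): 0 = air, 1 = ground; ground may only rest on ground.
TOY_RULES = {
    (0, "up"): {0}, (0, "down"): {0, 1}, (0, "left"): {0, 1}, (0, "right"): {0, 1},
    (1, "up"): {0, 1}, (1, "down"): {1}, (1, "left"): {0, 1}, (1, "right"): {0, 1},
}
level = generate(4, 5, [0, 1], TOY_RULES, random_policy, random.Random(0))
```

Swapping `random_policy` for a learned policy with the same signature is the only change needed to mirror the agent call in the loop; the list passed as `available` is exactly the set of actions the agent is allowed to choose from.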

![Refer to caption](https://arxiv.org/html/2605.13570v1/images/single_level.png)\(a\) The input level for the Single Level experiment\.
![Refer to caption](https://arxiv.org/html/2605.13570v1/images/similar_levels.png)\(b\) The input levels for the low TP\-KLDiv experiment\.
![Refer to caption](https://arxiv.org/html/2605.13570v1/images/different_levels.png)\(c\) The input levels for the high TP\-KLDiv experiment\.

Figure 3: The different input levels for all the experiments\. Level 22 is used in all of them, and other levels are added based on the diversity of the levels, calculated using the Tile Pattern KL\-Divergence \(TPKLDiv\) score\.

![Refer to caption](https://arxiv.org/html/2605.13570v1/x1.jpg)\(a\) No Restriction
![Refer to caption](https://arxiv.org/html/2605.13570v1/x2.jpg)\(b\) Excluding Rare Patterns
![Refer to caption](https://arxiv.org/html/2605.13570v1/x3.jpg)\(c\) Random Collapsed Starting State
![Refer to caption](https://arxiv.org/html/2605.13570v1/x4.jpg)\(d\) Excluding Rare Patterns \+ Random Collapsed Starting State

Figure 4: Levels generated using single input frames\.

![Refer to caption](https://arxiv.org/html/2605.13570v1/x5.jpg)\(a\) No Restriction
![Refer to caption](https://arxiv.org/html/2605.13570v1/x6.jpg)\(b\) Excluding Rare Patterns
![Refer to caption](https://arxiv.org/html/2605.13570v1/x7.jpg)\(c\) Random Collapsed Starting State
![Refer to caption](https://arxiv.org/html/2605.13570v1/x8.jpg)\(d\) Excluding Rare Patterns \+ Random Collapsed Starting State

Figure 5: Levels generated using multiple input frames\.

![Refer to caption](https://arxiv.org/html/2605.13570v1/images/div_mi.jpg)\(a\) No Restriction
![Refer to caption](https://arxiv.org/html/2605.13570v1/images/div_mi_rr.jpg)\(b\) Excluding Rare Patterns
![Refer to caption](https://arxiv.org/html/2605.13570v1/images/div_mi_rc.jpg)\(c\) Random Collapsed Starting State
![Refer to caption](https://arxiv.org/html/2605.13570v1/images/div_mi_rr_rc.jpg)\(d\) Excluding Rare Patterns \+ Random Collapsed Starting State

Figure 6: Levels generated using highly diverse multiple input frames\.

### III\-A WFC Rule Learning & Constraints

As discussed above, we use the full loop of the WFC, but we replace the tile selection strategy from the original WFC with the RL agent\. Instead of using the prior distribution of the tiles, we allow the RL agent to select the tile that will lead towards functionality\. WFC is mainly used to learn the adjacency constraints \(lines[2](https://arxiv.org/html/2605.13570#alg1.l2)and[3](https://arxiv.org/html/2605.13570#alg1.l3)\) and restrict the RL agent’s action space \(lines[14](https://arxiv.org/html/2605.13570#alg1.l14)and[18](https://arxiv.org/html/2605.13570#alg1.l18)\), such that the generated levels will have a similar look to the training input data\. When the RL agent selects a pattern, WFC will propagate that selection all over the map and remove any patterns that won’t work with that selection \(line[19](https://arxiv.org/html/2605.13570#alg1.l19)\)\. This will prevent the RL agent from selecting any invalid actions in the next step\.

### III\-B RL Observation

We follow a representation similar to the narrow representation from the PCGRL framework \[[15](https://arxiv.org/html/2605.13570#bib.bib25)\]\. At each step, a location and the current observation are given to the agent \(line [18](https://arxiv.org/html/2605.13570#alg1.l18)\)\. The observation is a 3D array of size $l \times w \times t$, where $l$ and $w$ are the dimensions of the level and $t$ is the total number of different tiles available\. If a cell is collapsed, the number of available tiles in that cell will be 1\. The current location is provided as a \(row, column\) index by the WFC\. We follow the same idea as the narrow representation and translate the input observation such that the location becomes the center of the observation\. This makes the transformed observation twice the size of the actual level\. As player placement is not handled by the RL agent, the player channel is not included in the observation space, and the player location is treated as an empty tile in the observation\. This leads the framework to focus on learning a connected level rather than on where the player starts\.
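The translation step can be made concrete with a small sketch. The following is an assumption-laden illustration rather than the paper's code: the observation is a nested list of per-cell tuples of tile-availability flags, the output is a $2l \times 2w$ grid with the current location placed at index $(l, w)$, and out-of-level cells are filled with a caller-supplied padding value. The function name `center_observation` is hypothetical.

```python
def center_observation(obs, loc, pad):
    """Shift an l x w grid of per-cell vectors so `loc` lands at (l, w) in a 2l x 2w grid."""
    l, w = len(obs), len(obs[0])
    r, c = loc
    # Start from a grid filled with the padding vector.
    out = [[pad for _ in range(2 * w)] for _ in range(2 * l)]
    for i in range(l):
        for j in range(w):
            # Cell (i, j) moves by the same offset that maps (r, c) to (l, w).
            out[i - r + l][j - c + w] = obs[i][j]
    return out


# Toy 2 x 3 level with t = 2 tile types; each tuple flags which tiles are still possible.
obs = [
    [(1, 0), (1, 1), (0, 1)],
    [(1, 1), (1, 1), (1, 0)],
]
centered = center_observation(obs, loc=(1, 2), pad=(0, 0))
```

Because the agent always sees the cell it is about to collapse at the same array index, a convolutional policy can treat "what is around the current cell" uniformly across locations.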

### III\-C RL Action Space

The action space is defined by the patterns obtained from the WFC \(line [2](https://arxiv.org/html/2605.13570#alg1.l2)\); actions are indexed from 0 to $n-1$, where $n$ is the total number of patterns in the input level\. Since not all the patterns are available to the agent at every step, we mask the output actions so that only the available actions can be selected \(line [18](https://arxiv.org/html/2605.13570#alg1.l18)\)\. This follows Huang and Ontañón \[[10](https://arxiv.org/html/2605.13570#bib.bib17)\], who have shown the effectiveness of invalid action masking over non\-masked actions for Super Mario Bros level generation\. At each step, WFC generates a list of available patterns for the location to be modified \(line [14](https://arxiv.org/html/2605.13570#alg1.l14)\)\. This list is converted into an action mask and sent to the agent\. The mask is an array of size $n$, matching the number of patterns; its value is 1 for available patterns and 0 otherwise\. Masking not only prevents the agent from selecting invalid actions but also prevents gradients from updating the network based on the invalid action outputs\.
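As a rough illustration of how such a mask interacts with a policy's outputs, the sketch below builds the 0/1 mask from a list of available pattern indices and applies it inside a softmax so that invalid actions receive zero probability. It is a plain-Python stand-in, not the Maskable PPO implementation; the helper names `action_mask` and `masked_softmax` and the example logits are made up for this example.

```python
import math


def action_mask(available, n):
    """Return a length-n list with 1 for available pattern indices and 0 otherwise."""
    available = set(available)
    return [1 if i in available else 0 for i in range(n)]


def masked_softmax(logits, mask):
    """Softmax restricted to valid actions; masked actions get probability 0."""
    valid = [x for x, m in zip(logits, mask) if m]
    if not valid:
        raise ValueError("no valid action: WFC would report a contradiction here")
    top = max(valid)  # subtract the max valid logit for numerical stability
    exps = [math.exp(x - top) if m else 0.0 for x, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Five patterns in total, of which only patterns 1 and 3 fit the current cell.
mask = action_mask([1, 3], 5)
probs = masked_softmax([2.0, 0.0, 5.0, 0.0, -1.0], mask)
```

Even though pattern 2 has the largest raw logit here, it can never be sampled, and since its probability is a constant zero it also contributes nothing to the policy gradient.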

### III\-D RL Reward

The reward is calculated using an automated game\-playing agent, which follows a simplified version of the game mechanics\. The agent counts the gold pieces reachable from the player’s location using a flood\-fill algorithm that follows these mechanics\. To keep the measure simple and quick, we did not include the digging ability or enemy movements\. The reward function encourages playability, measured by the number of gold pieces reachable from the player’s location\. If the selected action improves playability by making more gold reachable from the player’s position, a positive reward is given\. Decreasing the connectivity leads to a negative reward, indicating a bad action \(line [20](https://arxiv.org/html/2605.13570#alg1.l20)\)\. If the selected action results in a contradiction \(line [16](https://arxiv.org/html/2605.13570#alg1.l16)\) during the propagation process of WFC, a large negative reward is given\. This helps the agent learn to avoid actions that lead to contradictions\. In this framework, we have two termination conditions: the level is fully collapsed \(line [24](https://arxiv.org/html/2605.13570#alg1.l24)\), or there is a contradiction in the propagation process of WFC \(line [16](https://arxiv.org/html/2605.13570#alg1.l16)\)\.
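To make the reward idea tangible, here is a hedged sketch of a flood fill that counts reachable gold, plus a reward defined as the change in that count. Everything specific is an assumption for illustration: the tile symbols (`B`/`b` solid, `#` ladder, `-` rope, `G` gold, `M` player, `.` empty), the movement rules (no jumping, no digging, no enemies, falling when unsupported), the reward magnitudes, and the use of fully specified levels. The paper does not state its exact rules or values.

```python
from collections import deque

SOLID = {"B", "b"}  # assumed solid tiles; everything else can be entered


def reachable_gold(level, start):
    """Count gold tiles reachable from `start` under simplified movement rules."""
    rows, cols = len(level), len(level[0])

    def passable(r, c):
        return 0 <= r < rows and 0 <= c < cols and level[r][c] not in SOLID

    def supported(r, c):
        # On a ladder or rope, on the bottom row, or standing on solid ground/ladder.
        if level[r][c] in {"#", "-"} or r == rows - 1:
            return True
        return level[r + 1][c] in SOLID | {"#"}

    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        moves = [(r + 1, c)]  # falling, dropping from a rope, or climbing down
        if supported(r, c):
            moves += [(r, c - 1), (r, c + 1)]
            if level[r][c] == "#":
                moves.append((r - 1, c))  # climbing up needs a ladder
        for nxt in moves:
            if passable(*nxt) and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sum(level[r][c] == "G" for r, c in seen)


def step_reward(prev_level, new_level, start, contradiction=False, penalty=-10.0):
    """Change in reachable gold, or a large penalty on contradiction (value assumed)."""
    if contradiction:
        return penalty
    return float(reachable_gold(new_level, start) - reachable_gold(prev_level, start))


before = ["G....", "BB...", "M..G.", "BBBBB"]
after = ["G....", "BB#..", "M.#G.", "BBBBB"]  # a ladder now connects the upper ledge
player = (2, 0)
```

In this toy case adding the ladder makes the second gold piece reachable, so the step earns a positive reward; removing it again would earn the corresponding negative reward.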

## IV Experiments

We test all the different input parameters of the framework to understand their effect on the final output\. We focus on 3 main parameters: the input levels, the learned patterns, and finally the starting state\. In the following subsections, we will discuss the different experiments related to them\.

### IV\-A Input levels

In this framework, the visual aesthetic of the generated level is dependent on the input level\. To explore how the input level influences the generated level, we have used a single input level as well as multiple input levels\. For our experiments, we used Lode Runner levels from the Video Game Level Corpus \(VGLC\) \[[34](https://arxiv.org/html/2605.13570#bib.bib7)\]\. Additionally, for multiple input levels, the diversity of the input levels also affects the output levels\. Therefore, we picked two different sets, one set having minimum diversity \(i\.e\. containing levels that look similar to each other\) and the other having high diversity \(i\.e\. containing levels that look very different from each other\)\. Diversity is calculated using the average tile\-pattern KL\-divergence \(TPKLDiv\) score \[[18](https://arxiv.org/html/2605.13570#bib.bib18)\] over all pairs of levels\. The selected levels can be seen in figure [3](https://arxiv.org/html/2605.13570#S3.F3)\. To extend WFC to work with multiple inputs, WFC extracts $N \times N$ unique tile patterns and the adjacency relations of each input separately\. These patterns and the adjacency rules are then combined to create the final tile\-pattern dataset and adjacency constraints\.

### IV\-B Learned Patterns

While constructing the pattern dataset from input level\(s\), we found that some patterns have higher occurrences, whereas some patterns have very low occurrences in the input level\. The patterns that appear only once in the input level \(which we call *rare*\) are low\-frequency patterns that usually force the level to collapse to a smaller set of options, which leaves the RL agent with few options to select from\. The inclusion \(or exclusion\) of such rare patterns is one hyperparameter that we consider for the experiment\. To exclude rare patterns, we discard all patterns that have a single occurrence in the dataset\. However, in single\-input experiments, removing rare patterns that contain player tiles made the propagation of the player placement fail, as those are the only patterns that can be placed at the neighboring cells of the player tile\. To handle this, for single\-input experiments without rare patterns, the patterns containing player tiles are not excluded from the pattern dataset\.
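The filtering rule can be sketched as follows, assuming patterns are stored as tuples of row strings with their occurrence counts and that the player tile is marked with `M`. Both the data layout and the function name `exclude_rare_patterns` are illustrative choices, not taken from the paper.

```python
from collections import Counter


def exclude_rare_patterns(counts, keep_player_patterns=False, player_tile="M"):
    """Drop patterns seen only once, optionally keeping those with a player tile."""
    kept = {}
    for pattern, count in counts.items():
        has_player = any(player_tile in row for row in pattern)
        if count > 1 or (keep_player_patterns and has_player):
            kept[pattern] = count
    return kept


# Toy pattern counts for 2x2 windows (symbols are illustrative).
counts = Counter({
    ("..", ".."): 5,
    ("..", "BB"): 3,
    (".M", "BB"): 1,  # rare, but needed to place the player
    ("G.", "BB"): 1,  # rare and dropped in both settings
})
multi_input = exclude_rare_patterns(counts)
single_input = exclude_rare_patterns(counts, keep_player_patterns=True)
```

After filtering, any adjacency rules that mention a removed pattern would also need to be pruned so that the agent is never offered an action that no longer exists.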

![Refer to caption](https://arxiv.org/html/2605.13570v1/images/playability.jpg)Figure 7: Playability of levels generated under different conditions\. SI: single input, MI: multiple inputs, Div\-MI: highly diverse multiple input, RR: excluding rare patterns, RC: random collapsed starting state for training\.

### IV\-C Starting State

Our generation process starts with an empty level; sometimes this makes the agent learn a narrow range of levels that are similar to each other\. To allow the RL agent to explore different states and learn more generalizable policies, we start from a random, partially collapsed state instead of an empty one\. In this state, some parts of the level are already set using normal WFC, and the agent has to continue building the level on top of them\. This random collapse is only done during the training phase; it is not applied during inference\.
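To make the training\-only random collapse concrete, the sketch below builds a starting state for a toy tile\-level WFC \(one tile per cell rather than $N \times N$ patterns\)\. It is an illustration under stated assumptions, not the actual environment code: the `allowed` rule format, the `max_steps` parameter, and the undo\-on\-contradiction behavior are all hypothetical choices\.

```python
import random

OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def propagate(domains, allowed, start):
    """Prune neighbor options that lost support; False means contradiction."""
    height, width = len(domains), len(domains[0])
    stack = [start]
    while stack:
        x, y = stack.pop()
        for dx, dy in OFFSETS:
            nx, ny = x + dx, y + dy
            if not (0 <= nx < width and 0 <= ny < height):
                continue
            supported = {
                b for b in domains[ny][nx]
                if any((a, (dx, dy), b) in allowed for a in domains[y][x])
            }
            if not supported:
                return False
            if supported != domains[ny][nx]:
                domains[ny][nx] = supported
                stack.append((nx, ny))
    return True


def initial_state(width, height, tiles, allowed, training, rng, max_steps=5):
    """Empty level at inference; randomly pre-collapsed level in training."""
    domains = [[set(tiles) for _ in range(width)] for _ in range(height)]
    if not training:
        return domains
    for _ in range(rng.randint(0, max_steps)):
        open_cells = [(x, y) for y in range(height) for x in range(width)
                      if len(domains[y][x]) > 1]
        if not open_cells:
            break
        x, y = rng.choice(open_cells)
        snapshot = [[set(cell) for cell in row] for row in domains]
        domains[y][x] = {rng.choice(sorted(domains[y][x]))}
        if not propagate(domains, allowed, (x, y)):
            domains = snapshot  # undo a pick that caused a contradiction
    return domains


# Toy rule set: a tile may only sit next to an identical tile.
same_only = {(t, d, t) for t in "ab" for d in OFFSETS}
train_state = initial_state(3, 3, "ab", same_only, True, random.Random(0))
eval_state = initial_state(3, 3, "ab", same_only, False, random.Random(0))
```

With the toy rule set, a single collapsed cell determines the whole grid, which mirrors how one early choice can fix a large number of cells\.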

### IV\-D Setup

Our framework is implemented as an OpenAI Gym interface \[[3](https://arxiv.org/html/2605.13570#bib.bib45)\]\. For our experiments, we use Lode Runner level generation as the problem, where the goal is to generate playable Lode Runner levels of size $32 \times 22$\. For the input levels, we use Lode Runner levels from the Video Game Level Corpus \(VGLC\) \[[34](https://arxiv.org/html/2605.13570#bib.bib7)\], with a $3 \times 3$ window as the pattern size\. For training, we used Maskable Proximal Policy Optimization \(Maskable PPO\), a variation of Proximal Policy Optimization \(PPO\) from Stable\-Baselines3\-contrib \[[21](https://arxiv.org/html/2605.13570#bib.bib30)\]\. Our policy uses the same body for both the action and value heads, and it is made of 3 convolutional layers followed by 2 fully\-connected layers\. We ran an ablation study varying the above\-mentioned factors \(input data, rare patterns, and starting state\), which made a total of 12 experiments\. Each of our experiments runs for 5 million timesteps\. We trained 5 different models for each experiment to show the stability of the training process\. We denote the different experiment settings as ‘SI’ for single input, ‘MI’ for multiple input, ‘div\-MI’ for highly diverse multiple input, ‘RR’ for excluding rare patterns, and ‘RC’ for random starting state\. For the single input, we used a traditional\-looking Lode Runner level, while for multiple input, we selected 2 additional levels such that the subset has either the smallest or the highest TP\-KLDiv, which can be seen in Figure [3](https://arxiv.org/html/2605.13570#S3.F3)\.
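The count of 12 experiments follows from crossing three input settings with two binary options \(rare\-pattern removal and random collapse\)\. The short sketch below enumerates the resulting labels; the helper name is hypothetical, and the label format simply follows the abbreviations used in this section\.

```python
from itertools import product

INPUT_SETTINGS = ["SI", "MI", "div-MI"]


def experiment_names():
    """Enumerate every combination of input type, RR, and RC."""
    names = []
    for inputs, remove_rare, random_collapse in product(
        INPUT_SETTINGS, [False, True], [False, True]
    ):
        parts = [inputs]
        if remove_rare:
            parts.append("RR")
        if random_collapse:
            parts.append("RC")
        names.append("+".join(parts))
    return names


names = experiment_names()
# 3 input settings x 2 x 2 = 12 experiments; 5 models each gives 60 runs.
total_runs = len(names) * 5
```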

![Refer to caption](https://arxiv.org/html/2605.13570v1/images/contradiction.jpg)Figure 8:Unplayability across different experimental conditions: single input \(SI\), multiple inputs \(MI\), highly diverse multiple input \(Div\-MI\), excluding rare patterns \(RR\), and random collapsed starting state for training \(RC\)\.

## V Results

We generate 100 levels using each trained model and compare the playability and diversity of the generated levels from different models\. Figures [4](https://arxiv.org/html/2605.13570#S3.F4), [5](https://arxiv.org/html/2605.13570#S3.F5), and [6](https://arxiv.org/html/2605.13570#S3.F6) display playable levels generated using single, multiple, and diverse\-multiple inputs, respectively\. The generated levels follow a similar structure to the input levels, with high similarity between all the generated levels except for the models trained on diverse inputs\. Models trained using diverse inputs preferred to stick to a specific style and continue the generation in that style\. For example, certain levels have a large number of solid tiles, while others have long ropes\. The trained high\-diversity model failed to combine these different styles; we believe this might be because the adjacency constraints learned from one level usually do not lead to patterns from another level\.

We used our automated playing agent to measure the playability of the generated levels\. Figure [7](https://arxiv.org/html/2605.13570#S4.F7) compares the percentage of playable levels from different experiments\. The comparison shows that single and multiple input experiments overall generated more playable levels than the highly diverse multiple input experiments\. The lowest\-performing experimental setup \(div\-MI\+RR\+RC\) employs multiple and diverse inputs, removes the rare patterns, and is trained from a random starting state\. This finding is expected, as this experiment is the most constrained during training and the hardest to solve\. To analyze unplayable levels further, we compare the number of such levels from different experiments\. Figure [8](https://arxiv.org/html/2605.13570#S4.F8) compares, for the different experimental setups, the number of levels that are unplayable either due to a contradiction or due to a failure in functionality\. The graph shows that div\-MI\+RR\+RC has the highest number of contradictions, and that its failures come from contradictions rather than from complete but unplayable levels, which suggests that it did not learn to avoid contradictions\.

![Refer to caption](https://arxiv.org/html/2605.13570v1/images/si_collapsed_cell_log.jpg)\(a\)Single Input
![Refer to caption](https://arxiv.org/html/2605.13570v1/images/mi_collapsed_cell_log.jpg)\(b\)Multiple Similar Input
![Refer to caption](https://arxiv.org/html/2605.13570v1/images/div_mi_collapsed_cell_log.jpg)\(c\)Multiple Diverse Input

Figure 9: Average number of cells collapsed at every timestep\.

![Refer to caption](https://arxiv.org/html/2605.13570v1/images/no_patterns_comp.jpg)Figure 10: Average number of patterns available for the whole map during training across experimental settings\.

Here, we analyze further how diverse input types affect the training of our method\. Figure [9](https://arxiv.org/html/2605.13570#S5.F9) shows the number of cells collapsed at each step during training\. The results show that diverse multiple input experiments that remove rare patterns, regardless of the random collapse, have a relatively higher number of cells collapsed during the initial steps compared to all other training setups\. A large number of collapsed cells at the beginning indicates that the pattern selected at the first step strongly influences the generated level; a large number of cells are decided based on the first placed pattern\. This leaves relatively few options for the rest of the generation process to work with in order to make the level playable\. We believe that this is due to the high diversity between the input levels, which makes the adjacency relations of patterns very restrictive\. Removing rare patterns makes the action space more restricted \(see Figure [10](https://arxiv.org/html/2605.13570#S5.F10)\), which makes it difficult for the agent to create playable levels\. When the same setup is combined with a random starting state \(div\-MI\+RR\+RC\), it results in contradictions during WFC propagation\.

To understand how different the generated levels are from each other, we compare their diversity\. We use TP\-KLDiv as a diversity metric on the playable levels using a $3 \times 3$ window\. Figure [11](https://arxiv.org/html/2605.13570#S5.F11) displays the diversity of the playable levels across different experiments\. A noticeable overall trend is that levels generated using multiple inputs have higher diversity values than those from the single input experiments\. This is the effect of the larger action space\. Multiple inputs give more options to choose from, which helps the algorithm to generate levels that differ from each other, as seen from the examples in Figures [4](https://arxiv.org/html/2605.13570#S3.F4), [5](https://arxiv.org/html/2605.13570#S3.F5), and [6](https://arxiv.org/html/2605.13570#S3.F6)\. We compare the number of available patterns for different experiments in Figure [12](https://arxiv.org/html/2605.13570#S5.F12)\. The graphs show that the number of patterns available for single\-input experiments is lower than for multiple and diverse multiple\-input ones\. Comparing the two multiple\-input settings, diverse multiple inputs yield higher diversity than similar multiple inputs\. Observing the generated levels gives a similar impression about their diversity\. Levels generated using multiple and diverse inputs have different visual structures, such as long platforms and dense walls, compared to the single\-input levels\.
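For readers unfamiliar with the metric, the sketch below computes a tile\-pattern KL\-divergence between two levels and averages it over all pairs of a set\. It is a simplified variant and an assumption on our part: it uses plain additive smoothing with a hypothetical `eps` value and symmetrizes the two directions, which may differ from the exact formulation in \[[18](https://arxiv.org/html/2605.13570#bib.bib18)\]\. Levels are assumed to be lists of equal\-length strings\.

```python
import math
from collections import Counter
from itertools import combinations


def pattern_counts(level, n=3):
    """Count every overlapping n x n tile pattern in a level."""
    height, width = len(level), len(level[0])
    return Counter(
        tuple(level[y + dy][x:x + n] for dy in range(n))
        for y in range(height - n + 1)
        for x in range(width - n + 1)
    )


def tp_kldiv(level_p, level_q, n=3, eps=1e-6):
    """KL(P || Q) over smoothed tile-pattern distributions."""
    p, q = pattern_counts(level_p, n), pattern_counts(level_q, n)
    support = set(p) | set(q)
    p_total = sum(p.values()) + eps * len(support)
    q_total = sum(q.values()) + eps * len(support)
    total = 0.0
    for pattern in support:
        p_x = (p[pattern] + eps) / p_total
        q_x = (q[pattern] + eps) / q_total
        total += p_x * math.log(p_x / q_x)
    return total


def average_pairwise_diversity(levels, n=3):
    """Mean symmetrized TP-KLDiv over all pairs of levels."""
    pairs = list(combinations(levels, 2))
    if not pairs:
        raise ValueError("need at least two levels")
    return sum(
        0.5 * (tp_kldiv(a, b, n) + tp_kldiv(b, a, n)) for a, b in pairs
    ) / len(pairs)


# Toy 4x4 levels with hypothetical tile characters.
flat = ["....", "....", "####", "####"]
walled = ["#..#", "#..#", "#..#", "####"]
score = average_pairwise_diversity([flat, flat, walled])
```

Identical levels score zero, and levels that share no $3 \times 3$ patterns score high, which matches the intuition that a larger pool of available patterns allows more diverse outputs\.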

![Refer to caption](https://arxiv.org/html/2605.13570v1/images/diversity.jpg)Figure 11: Diversity of playable levels generated using different conditions: single input \(SI\), multiple inputs \(MI\), highly diverse multiple input \(Div\-MI\), excluding rare patterns \(RR\), random collapsed starting state for training \(RC\)\.

![Refer to caption](https://arxiv.org/html/2605.13570v1/images/no_patterns.jpg)Figure 12: Number of patterns across different experimental settings\.

Another interesting observation is that for each of the single, multiple, and diverse multiple input experiments, diversity is higher when rare patterns are included and decreases when rare patterns are excluded from the dataset\. Removing the rare patterns decreases the size of the action space, as shown in Figure [12](https://arxiv.org/html/2605.13570#S5.F12)\. Therefore, the agent is left with fewer options and tends to repeat similar actions\. This trend is also visible in the generated levels\. In Figure [4](https://arxiv.org/html/2605.13570#S3.F4), we can see repetition of patterns for experiments without rare patterns\. A similar trend is visible for both multiple input \(see Figure [5](https://arxiv.org/html/2605.13570#S3.F5)\) and diverse multiple input \(see Figure [6](https://arxiv.org/html/2605.13570#S3.F6)\) experiments\. The exclusion of rare patterns reduces variation and increases the number of long connected platforms, which, in turn, reduces the diversity compared to experiments that include rare patterns\. On the other hand, removing rare patterns helps multiple inputs to increase playability overall, as shown in Figure [7](https://arxiv.org/html/2605.13570#S4.F7), while this is not the case with a single input\. We believe that using multiple inputs produces a very large number of input patterns, which might require a longer training time for the agent to learn which patterns are better than others\. Reducing the space by removing rare patterns helps the agent to focus on the important actions\. With a single input, however, removing the rare patterns has a different effect, as it restricts the space too much to build levels\.

## VI Conclusion

In this work, we explored combining Wave Function Collapse with PCGRL in order to gain the advantages of each method, similar to the work of Babin and Katchabaw \[[2](https://arxiv.org/html/2605.13570#bib.bib44)\]\. The resulting framework was able to combine the strengths of both methods, compared to using either one alone\. WFC constrained the space for the PPO\-based PCGRL agent so that the generated levels have a pattern distribution similar to that of the input levels, unlike levels generated by PCGRL only \(see Figure [13](https://arxiv.org/html/2605.13570#S6.F13)\)\. PCGRL, in turn, increased the playability of the generated content compared to basic WFC\.

We extended the work of Babin and Katchabaw \[[2](https://arxiv.org/html/2605.13570#bib.bib44)\] by experimenting with different inputs for the algorithm and exploring their effects\. We looked into three different hyperparameters: input data, learned patterns, and starting state\.

For input data, we looked into having either a single input or multiple inputs\. We also looked at the effect of the diversity between these inputs\. We found that using more than one input level helps to increase playability overall, as long as these levels are similar to each other in structure\. On the other hand, having diverse inputs increases the diversity of the generated levels but leads to fewer playable levels, as the framework faces challenges in finding connections between different types of patterns\. We argue that with a smoother gradient between diverse levels, such as levels that combine both styles, our method would yield even better results\.

For the learned patterns, we looked into removing rare patterns \(patterns that only appear once\)\. Removing rare patterns decreases the diversity of the generated levels, due to the smaller number of available patterns\. However, it helps to increase the number of playable levels when multiple inputs are used\. We believe that removing rare patterns helped the framework to focus on the most common patterns, which usually lead to fully connected levels, rather than on unique patterns that appear rarely in the input levels\.

![Refer to caption](https://arxiv.org/html/2605.13570v1/images/pcgrl.jpg)Figure 13: Levels generated by the PCGRL agent\.

Finally, for the starting state, we tested starting from an empty level or a partially collapsed level\. Starting from a partially collapsed state showed little difference in playability and only a small improvement in diversity, especially with diverse input\. Nevertheless, the trained models are more robust to the starting state and can find playable levels more easily when not starting from an empty state\. We believe random collapse is a key feature for obtaining more generic policies that can work across different games and will help in transfer learning\.

The choice of Lode Runner as a research test bed was successful due to its large level space and complex mechanics; traditional methods fail to generate playable levels \[[28](https://arxiv.org/html/2605.13570#bib.bib5)\] that look like human\-designed ones \(Figure [13](https://arxiv.org/html/2605.13570#S6.F13)\)\. Also, due to the repeated structure and the need for connectivity, it is easier to notice issues with generated levels by visual inspection than in other platformers such as Super Mario Bros \(Nintendo, 1985\)\. We believe more research should use Lode Runner as a test bed\.

## References

- \[1\] S\. Alaka and R\. Bidarra \(2023\) Hierarchical semantic wave function collapse\. In Proceedings of the 18th International Conference on the Foundations of Digital Games\. External Links: [Document](https://dx.doi.org/10.1145/3582437.3587209)\.
- \[2\] M\. Babin and M\. Katchabaw \(2021\) Leveraging reinforcement learning and wavefunctioncollapse for improved procedural level generation\. In Proceedings of the 16th International Conference on the Foundations of Digital Games, FDG ’21\.
- \[3\] G\. Brockman, V\. Cheung, L\. Pettersson, J\. Schneider, J\. Schulman, J\. Tang, and W\. Zaremba \(2016\) OpenAI gym\. External Links: arXiv:1606\.01540\.
- \[4\] B\. Bucklew \(2022\) Tile\-based map generation using wave function collapse in ’caves of qud’\. Note: https://www\.youtube\.com/watch?v=AdCgi9E90jw
- \[5\] S\. Cooper \(2023\) Sturgeon\-graph: constrained graph generation from examples\. In Proceedings of the 18th International Conference on the Foundations of Digital Games\. External Links: [Document](https://dx.doi.org/10.1145/3582437.3582465)\.
- \[6\] S\. Dai, X\. Zhu, N\. Li, T\. Dai, and Z\. Wang \(2024\) Procedural level generation with diffusion models from a single example\. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol\. 38, pp\. 10021–10029\.
- \[7\] S\. Earle, M\. Edwards, A\. Khalifa, P\. Bontrager, and J\. Togelius \(2021\) Learning controllable content generators\. In 2021 IEEE Conference on Games \(CoG\)\. External Links: [Document](https://dx.doi.org/10.1109/CoG52621.2021.9619159)\.
- \[8\] L\. Gisslén, A\. Eakins, C\. Gordillo, J\. Bergdahl, and K\. Tollmar \(2021\) Adversarial reinforcement learning for procedural content generation\. In 2021 IEEE Conference on Games \(CoG\)\. External Links: [Document](https://dx.doi.org/10.1109/CoG52621.2021.9619053)\.
- \[9\] M\. Gumin \(2016\) Wave function collapse\. Note: https://github\.com/mxgmn/WaveFunctionCollapse
- \[10\] S\. Huang and S\. Ontañón \(2022\) A closer look at invalid action masking in policy gradient algorithms\. The International FLAIRS Conference Proceedings\. External Links: [Document](https://dx.doi.org/10.32473/flairs.v35i.130584)\.
- \[11\] Z\. Jiang, S\. Earle, M\. Green, and J\. Togelius \(2022\) Learning controllable 3d level generators\. In Proceedings of the 17th International Conference on the Foundations of Digital Games\. External Links: [Document](https://dx.doi.org/10.1145/3555858.3563273)\.
- \[12\] I\. Karth and A\. M\. Smith \(2017\) WaveFunctionCollapse is constraint solving in the wild\. In Proceedings of the 12th International Conference on the Foundations of Digital Games\. External Links: [Document](https://dx.doi.org/10.1145/3102071.3110566)\.
- \[13\] I\. Karth and A\. M\. Smith \(2019\) Addressing the fundamental tension of pcgml with discriminative learning\. In Proceedings of the 14th International Conference on the Foundations of Digital Games\. External Links: [Document](https://dx.doi.org/10.1145/3337722.3341845)\.
- \[14\] I\. Karth and A\. M\. Smith \(2022\) WaveFunctionCollapse: content generation via constraint solving and machine learning\. IEEE Transactions on Games\. External Links: [Document](https://dx.doi.org/10.1109/TG.2021.3076368)\.
- \[15\] A\. Khalifa, P\. Bontrager, S\. Earle, and J\. Togelius \(2020\) PCGRL: procedural content generation via reinforcement learning\. In Proceedings of the Sixteenth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, AIIDE’20\.
- \[16\] H\. Kim, S\. Lee, H\. Lee, T\. Hahn, and S\. Kang \(2019\) Automatic generation of game content using a graph\-based wave function collapse algorithm\. In 2019 IEEE Conference on Games \(CoG\)\. External Links: [Document](https://dx.doi.org/10.1109/CIG.2019.8848019)\.
- \[17\] T\. S\. L\. Langendam and R\. Bidarra \(2022\) MiWFC \- designer empowerment through mixed\-initiative wave function collapse\. In Proceedings of the 17th International Conference on the Foundations of Digital Games\. External Links: [Document](https://dx.doi.org/10.1145/3555858.3563266)\.
- \[18\] S\. M\. Lucas and V\. Volz \(2019\) Tile pattern kl\-divergence for analysing and evolving game levels\. In Proceedings of the Genetic and Evolutionary Computation Conference, pp\. 170–178\.
- \[19\] M\. U\. Nasir and J\. Togelius \(2023\) Practical pcg through large language models\. In 2023 IEEE Conference on Games \(CoG\), pp\. 1–4\. External Links: [Document](https://dx.doi.org/10.1109/CoG57401.2023.10333197)\.
- \[20\] T\. Nordvig Møller, J\. Billeskov, and G\. Palamas \(2020\) Expanding wave function collapse with growing grids for procedural map generation\. In Proceedings of the 15th International Conference on the Foundations of Digital Games\. External Links: [Document](https://dx.doi.org/10.1145/3402942.3402987)\.
- \[21\] A\. Raffin, A\. Hill, A\. Gleave, A\. Kanervisto, M\. Ernestus, and N\. Dormann \(2021\) Stable\-baselines3: reliable reinforcement learning implementations\. Journal of Machine Learning Research 22 \(268\), pp\. 1–8\. External Links: [Link](http://jmlr.org/papers/v22/20-1364.html)\.
- \[22\] A\. Sandhu, Z\. Chen, and J\. McCoy \(2019\) Enhancing wave function collapse with design\-level constraints\. In Proceedings of the 14th International Conference on the Foundations of Digital Games\. External Links: [Document](https://dx.doi.org/10.1145/3337722.3337752)\.
- \[23\] A\. Sarkar, Z\. Yang, and S\. Cooper \(2020\) Controllable level blending between games using variational autoencoders\. arXiv preprint arXiv:2002\.11869\.
- \[24\] J\. Schrum, J\. Gutierrez, V\. Volz, J\. Liu, S\. Lucas, and S\. Risi \(2020\) Interactive evolution and exploration within latent level\-design space of generative adversarial networks\. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference, pp\. 148–156\.
- \[25\] N\. Shaker, J\. Togelius, and M\. J\. Nelson \(2016\) Procedural content generation in games: a textbook and an overview of current research\. Springer\. External Links: [Document](https://dx.doi.org/10.1007/978-3-319-42716-4)\.
- \[26\] T\. Shu, J\. Liu, and G\. N\. Yannakakis \(2021\) Experience\-driven pcg via reinforcement learning: a super mario bros study\. In 2021 IEEE Conference on Games \(CoG\)\. External Links: [Document](https://dx.doi.org/10.1109/CoG52621.2021.9619124)\.
- \[27\] M\. Siper, A\. Khalifa, and J\. Togelius \(2022\) Path of destruction: learning an iterative level generator using a small dataset\. In 2022 IEEE Symposium Series on Computational Intelligence \(SSCI\), pp\. 337–343\.
- \[28\] S\. Snodgrass and S\. Ontanón \(2016\) Learning to generate video game maps using markov models\. IEEE transactions on computational intelligence and AI in games 9 \(4\), pp\. 410–422\.
- \[29\] S\. Snodgrass and A\. Sarkar \(2020\) Multi\-domain level generation and blending with sketches via example\-driven bsp and variational autoencoders\. In Foundations of Digital Games\.
- \[30\] K\. Sorochan, J\. Chen, Y\. Yu, and M\. Guzdial \(2021\) Generating lode runner levels by learning player paths with lstms\. In Proceedings of the 16th International Conference on the Foundations of Digital Games\. External Links: [Document](https://dx.doi.org/10.1145/3472538.3472602)\.
- \[31\] O\. Stalberg \(2018\) Wave function collapse in bad north\. Note: https://www\.youtube\.com/watch?v=0bcZb\-SsnrA
- \[32\] K\. Steckel and J\. Schrum \(2021\) Illuminating the space of beatable lode runner levels produced by various generative adversarial networks\. In GECCO\.
- \[33\] S\. Sudhakaran, D\. Grbic, S\. Li, A\. Katona, E\. Najarro, C\. Glanois, and S\. Risi \(2021\) Growing 3d artefacts and functional machines with neural cellular automata\. ArXiv\.
- \[34\] A\. J\. Summerville, S\. Snodgrass, M\. Mateas, and S\. Ontanón \(2016\) The vglc: the video game level corpus\. arXiv preprint arXiv:1606\.07487\.
- \[35\] A\. Summerville and M\. Mateas \(2016\) Super mario as a string: platformer level generation via lstms\. arXiv preprint arXiv:1603\.00930\.
- \[36\] A\. Summerville, S\. Snodgrass, M\. Guzdial, C\. Holmgård, A\. K\. Hoover, A\. Isaksen, A\. Nealen, and J\. Togelius \(2018\) Procedural content generation via machine learning \(pcgml\)\. IEEE Transactions on Games 10 \(3\), pp\. 257–270\.
- \[37\] S\. Thakkar, C\. Cao, L\. Wang, T\. J\. Choi, and J\. Togelius \(2019\) Autoencoder and evolutionary algorithm for level generation in lode runner\. In Conference on Games\.
- \[38\] T\. Thompson \(2022\) How townscaper works: a story four games in the making\. Note: https://www\.youtube\.com/watch?v=\_1fvJ5sHh6A
- \[39\] G\. Todd, S\. Earle, M\. U\. Nasir, M\. C\. Green, and J\. Togelius \(2023\) Level generation through large language models\. In Proceedings of the 18th International Conference on the Foundations of Digital Games, FDG ’23\. External Links: [Document](https://dx.doi.org/10.1145/3582437.3587211)\.
- \[40\] V\. Volz, J\. Schrum, J\. Liu, S\. M\. Lucas, A\. Smith, and S\. Risi \(2018\) Evolving mario levels in the latent space of a deep convolutional generative adversarial network\. In Proceedings of the genetic and evolutionary computation conference, pp\. 221–228\.
