Scenario Generation for Testing of Autonomous Driving Systems Using Real-World Failure Records

arXiv cs.AI Papers

Summary

This paper proposes a modular LLM-based pipeline that generates diverse test scenarios for autonomous driving systems using historical failure records (e.g., NHTSA crash data), enabling effective failure discovery within limited testing budgets.

arXiv:2606.31131v1 Announce Type: new


# Scenario Generation for Testing of Autonomous Driving Systems Using Real-World Failure Records
Source: [https://arxiv.org/html/2606.31131](https://arxiv.org/html/2606.31131)
Anjali Parashar \(anjalip@mit\.edu\), Chuchu Fan \(chuchu@mit\.edu\)

Massachusetts Institute of Technology, Cambridge, MA

###### Abstract

To ensure safe on\-road behavior, pre\-deployment testing and failure discovery of Autonomous Driving Systems \(ADS\) are crucial\. Present\-day simulation\-based testing methods focus largely on mathematical models for efficient search of optimal scenarios, assuming a fixed scenario representation\. On the other hand, real\-world testing involves substantial manual effort to design scenario templates for testing\. These templates represent distinct failure scenarios consisting of pre\-deployment vehicle movements, map types, etc\. Historical failure records for ADS are a reliable source of real\-world failure conditions, which can be used for scenario generation\. In this work, we propose a scenario generation pipeline using categorical and contextual information available from historical records in natural language format\. Our approach consists of modular LLM based synthetic scenario generation, compatible with the testing constraints of a given system\. We successfully apply our method to generate a diverse set of scenarios for testing autonomous navigation on the Metadrive simulator using the NHTSA ADS crash records\. Our approach results in accurate and diverse scenario generation with a combination of 4 road types and 3 non\-ego vehicle movement types, including on\-road anomalies in the form of work zones\. Generated scenarios align with the provided testing conditions, and reveal interesting failures of the system within a limited testing budget of 20 scenarios\. Code is available at [https://github.com/anjaliParashar/crash2scenario](https://github.com/anjaliParashar/crash2scenario)\.

###### keywords:

Scenario generation, Autonomous Driving System failures, Historical Failure Records\.

## 1 Introduction

Failure discovery, or falsification, of Autonomous Driving Systems \(ADS\) is a key step in ensuring the on\-road safety of vehicle occupants and surrounding road users \(najm2007pre;10\.1007/978\-981\-96\-7956\-0\_7;szenasi2021analysis\)\. Present\-day methods for real\-world ADS falsification test system performance over a suite of manually designed scenarios \(berger2015large\)\. The goal of test suite design is to cover a wide range of system functionalities and real\-world failure scenarios\. Here, time and cost constrain the number of scenarios that can be tested\. Hence, extensive manual effort is devoted to designing a set of scenario templates, such as Car\-to\-Car Rear braking \(CCRb\), adult crossing the road, etc\., representing a range of real\-world failure settings \(euroncap\_website\)\. Each scenario template is further parameterized by a small number of scenario parameters, such as vehicle speed, which are varied across a uniform range to generate a combination of scenarios for testing the vehicle performance\.

The scenario templates and parameters are chosen by experts based on historical failure data to recreate failure\-relevant conditions and to test the vehicle performance across a range of operational settings\. Also known as combinatorial testing, this structure is followed by regulatory testing authorities such as Euro New Car Assessment Programme \(NCAP\), NHTSA, etc\. \(hackney1995new\)\. The resulting test suite is compatible with real\-world testing requirements \(time and cost\), and provides a user\-centric baseline for vehicle performance on road\.

As sensor capabilities and ADS functionalities advance with engineering innovations, these hand\-designed, static test suites need to be revised periodically \(EuroNCAP\_2026\_Protocols\)\. Additionally, the generated test suites cannot be easily adapted to stress test a specific vehicle or system\. Therefore, a vehicle that has passed validation may still crash on the road, because a potential failure can remain undiscovered in testing \(teambhp\_ncap\_overoptimisation\)\.

As an alternative to static test suites, several adaptive scenario generation frameworks have emerged in simulation\-based testing \(klampfl2024testing;zhu2022review;dawson2023a;koren2018adaptive;koren2021finding;ghaiGeneratingAdversarialDisturbances2021;hanselmannKINGGeneratingSafetyCritical2022a;wong2018provable;karve2026optimizing\)\. However, these methods assume access to a pre\-defined scenario template and focus primarily on leveraging mathematical tools for efficient search over the numerical scenario parameters associated with that fixed template\.

Historical ADS crash records capture both common and non\-trivial real\-world failures that fixed scenario templates miss \(NHTSA\_ADAS\_Crash\_Report\_SGO\)\. For example, the CCRb template only varies ego and target velocities, ignoring contextual factors like work zones that substantially increase failure likelihood\. This leaves a gap in testing practice, as there is no principled way to extract scenario definitions from crash data and use them for system\-specific testing\. While simulation\-based testing approaches encourage exploration of numerical scenario parameters \(delecki2022we;dawson2023a\), it is also important to cover all types of failure scenarios for thorough testing\.

In this work, we propose a scenario generation framework that uses data from existing crash records in a principled way, while incorporating system specific testing constraints\. We construct a test suite of scenario templates parameterized using existing categorical features available in the crash records, such as pre\-crash vehicle movements, road type, etc\. Crash records also provide a natural language description of the incident, called a narrative, which gives detailed, granular insights about failure conditions\. Recently, leung2025road successfully demonstrated scene synthesis in autonomous driving using LLMs\. We adopt this concept to transform crash narratives into system specific scenarios using an LLM assisted scenario generation pipeline\. Our proposed approach can be easily used for initial scenario generation, over which low\-level testing methods such as Bayesian Optimization \(parashar2024failure\), MCMC sampling \(parashar2024learning;delecki2022we\), and gradient based optimization \(dawson2023a\) can be deployed to further iterate and optimize over numerical parameters\.

We validate our end\-to\-end scenario generation pipeline by designing scenarios using NHTSA ADS crash records \(NHTSA\_SGO\_2021\)\. The scenarios are designed for testing the IDM policy on the Metadrive simulator \(li2022metadrive\) for AV navigation and obstacle avoidance\. As discussed in Section [6](https://arxiv.org/html/2606.31131#S6), our approach successfully transforms natural language descriptions of failure scenarios and generates them for simulation\. Using the historical records, we test the system on common as well as uncommon failure conditions\. Our main contributions are as follows:

- We present an automated scenario generation procedure that utilizes the categorical feature based scenario template design, as well as the incident\-specific narrative available from real\-world ADS crashes\.
- We propose a method to adapt the available records to system specific test scenarios using an LLM based scenario generation pipeline\.
- We validate our approach for system specific testing of crashes extracted from NHTSA crash records using the Metadrive simulator\.

## 2 Related Works

The key approaches in autonomous system testing can be divided in three categories, based on cost of testing, leading to different mathematical tools that we discuss below\.

### 2\.1 Simulation based testing using sampling based approaches

These methods assume access to a cheap simulation model, and frame failure discovery as a cost\-guided search for a specific dynamic system, using sampling\-based methods \(delecki2022we;okellyScalableEndtoEndAutonomous2018a;sinha2020neural;dawson2023a;koren2018adaptive\) and optimization techniques \(wong2018provable;hanselmannKINGGeneratingSafetyCritical2022a\)\. These approaches do not account for the cost of test case generation; they therefore rely on sample\-intensive mathematical frameworks such as MCMC sampling \(dawson2023a;delecki2022we\) and MCTS \(koren2018adaptive\) for efficient search space coverage, discovery of rare failures, and average system validation\. Generally, scenario parameterization is a user\-defined choice and is not directly informed by existing crash records, as it is in regulated real\-world testing\.

### 2\.2 Real\-world testing using historical records

Combinatorial testing practices for real\-world testing draw on recreations of historical crash records \(EuroNCAP\_2026\_Protocols\)\. Euro NCAP protocols assess safety metrics across multiple criteria, which can be broadly divided into three categories: protection of vehicle occupants \(adult and child\) from the impact of collision, Vulnerable Road Users \(VRUs; pedestrians and bicyclists\), and non\-ego actors \(cars\)\. For each category of ego and non\-ego agent, a variety of scenario templates are defined using a set of scenario parameters, such as speed of vehicle, relative location and motion of the non\-ego actor, and time\-to\-collision \(TTC\)\. The breadth of testing conditions covered by these scenarios is intended to provide rich scenario coverage\. While these approaches provide a baseline for comparing different vehicles, they do not stress\-test a given vehicle to expose system specific failures\. Additionally, they assume a rigid structure for designing scenario templates, rendering the overall pipeline non\-transferable to simulation testing\.

### 2\.3 Combined sim and real testing using multi\-fidelity approaches

Recently, some works have considered cost\-aware adaptive testing using Bayesian Experimental Design \(BED\) based formulations, using a surrogate model and sequential sampling of scenarios \(parashar2025cost;sinha2024rate;parashar2024failure\)\. Some of these works also extend to joint testing using simulation and real\-world systems, by using multi\-fidelity scenario generation and incorporating the cost associated with each fidelity \(parashar2024failure\)\. These approaches use a mix of active learning/BED and sampling based approaches, and also assume that a scenario parameterization is available in advance\.

Our work is complementary to each of these approaches, as we focus on principled scenario design using historical crash records as a realistic source of scenario template design and parameterization\. The set of scenario templates extracted using our method can be plugged into any of the above approaches, since we treat system specific adaptation as a modular component of our pipeline\.

## 3 Problem Statement

##### Definitions\.

Consider an ADS, defined as $\dot{x} = f(x, \pi(o, z_s))$, with state $x \in \mathcal{X}$, system observation $o \in \mathcal{O}$, and scenario parameters $z_s \in \mathcal{Z}$, which takes action $a \in \mathcal{A}$ based on a policy $\pi: \mathcal{O} \times \mathcal{Z} \to \mathcal{A}$\. We assume access to the system with a pre\-defined policy $\pi$ that can take $z_s$ as input and generate trajectory rollouts, and we evaluate system performance using metrics $y \in \mathcal{Y}$\. These metrics are assumed to be user\-defined, for example, TTC, minimum gap, etc\. The subscript $s \in \mathcal{S}$ in $z_s$ denotes a scenario template, which contains sufficient contextual information to uniquely define a collection of scenarios that can be distinguished from those of other templates in $\mathcal{S}$ using $M$ meta variables $[s_i]_{i=1}^{M}$\. For example, road type, ego and non\-ego vehicle trajectory type, and weather are meta variables\. Each unique combination of these meta variables can be used to define a scenario template\. For each $s \in \mathcal{S}$, we define a scenario $z_s \in \mathcal{Z}$ using numerical values of variables such as ego vehicle speed, initial gap between vehicles, etc\. For example, a straight road, with the ego vehicle turning left and the non\-ego vehicle moving straight, is a scenario template $s$, for which specifications such as length of road, speed of agents, and trajectory parametrize $z_s$\.

The objective of this work is to propose a principled scenario design paradigm that uses historical failure records as a reference and adequately stress tests a given system\. We use these records in a principled manner to reduce the manual effort that goes into scenario design\. The generated scenarios can be adapted for system specific testing, alleviating common concerns with static testing methods\. In this work, we use the ADS crash report format available in NHTSA records for constructing scenario templates\. Our scenario design pipeline uses a three\-step LLM based generation process to turn crash data into scenario templates for system specific testing\. This can subsequently be used to sample scenarios in a cost\-aware manner while ensuring diversity across generated test cases, as we show in Section [6](https://arxiv.org/html/2606.31131#S6)\.

## 4 Scenario design using historical failure data

Our scenario design approach consists of two main steps: \(1\) design of meta variables using crash records, and \(2\) LLM based scenario design using crash records as inputs\. The first step provides an example of how to use historical failure records for efficient scenario representation\. The second step is modular and can be used with other failure databases as well\.

### 4\.1 Extracting information from historical failure data

The information from each crash in the NHTSA ADS crash records can be divided in two categories:

- Meta variables: Categorical variables denoting quantities of interest, such as ‘Pre\-crash movement Crash Partner \(CP\)’, ‘Pre\-crash movement Self Vehicle \(SV\)’, ‘Road Type’, etc\. The set of discrete variables that can be supported by the simulator/system are used to define meta variables\.
- Narrative: Contextual information pertaining to the scene, recorded in the form of a narrative, which describes qualitative aspects of the ego and non\-ego trajectories, and their relative location in the global frame of reference denoted by a road type\.

Table [1](https://arxiv.org/html/2606.31131#S4.T1) shows the meta variables extracted from NHTSA ADS records\. The set of meta variables selected must be compatible with the scenario generation capabilities of the system/simulator\. These meta variables can be used to define a scenario template in multiple ways\. The minimum requirement is that each combination of meta variables provides a high\-level, interpretable categorization that distinguishes between different failure scenarios\. For this, we directly use the categorical features from crash records\. These variables have also been used in the design of ontological scenario models for AV testing in the literature \(schuldt2017beitrag;bagschik2018ontology\)\. For example, a vehicle proceeding straight \($s_1$\) on a highway \($s_2$\) colliding with another vehicle making a left turn \($s_3$\) defines a scenario template\. Here, the corresponding meta variables are Pre\-crash movement SV, Road Type, and Pre\-crash movement CP, respectively\. Fig\. [1](https://arxiv.org/html/2606.31131#S4.F1) shows narratives for a scenario template with meta variables Road Type\-Intersection, SV movement\-Proceeding Straight, CP movement\-Making Left Turn, Work zone\-False\. The collision reports nominally describe the short\-term activity of the SV leading up to a collision, and therefore involve two agents, SV and CP\. This can be extended to construct scenario templates that include long\-term reactive behavior of other traffic agents\.

Table 1: Meta variables adopted from crash records with permissible values\. These are chosen based on simulator and scenario generation compatibility\.

Our next step uses a scenario template designed using this approach, along with the crash narrative, to generate system specific scenarios for testing\.
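The mapping from categorical crash\-record fields to scenario templates described in this section can be sketched as follows\. This is a minimal illustration rather than the authors' implementation: the field names \(`road_type`, `sv_movement`, `cp_movement`, `work_zone`\) and the three toy records are hypothetical stand\-ins for the NHTSA columns and data\.

```python
from collections import Counter
from typing import NamedTuple


class ScenarioTemplate(NamedTuple):
    """One scenario template = one combination of meta variables."""
    road_type: str
    sv_movement: str   # pre-crash movement of the Self Vehicle (SV)
    cp_movement: str   # pre-crash movement of the Crash Partner (CP)
    work_zone: bool


def to_template(record: dict) -> ScenarioTemplate:
    # Field names are hypothetical; real records would need a column mapping.
    return ScenarioTemplate(
        record["road_type"],
        record["sv_movement"],
        record["cp_movement"],
        record["work_zone"],
    )


# Toy records written by hand for illustration (not NHTSA data).
records = [
    {"road_type": "Intersection", "sv_movement": "Proceeding Straight",
     "cp_movement": "Making Left Turn", "work_zone": False},
    {"road_type": "Intersection", "sv_movement": "Proceeding Straight",
     "cp_movement": "Making Left Turn", "work_zone": False},
    {"road_type": "Highway/Freeway", "sv_movement": "Proceeding Straight",
     "cp_movement": "Proceeding Straight", "work_zone": True},
]

# Each distinct tuple is one observed template; counts give its frequency.
template_counts = Counter(to_template(r) for r in records)
for template, count in template_counts.most_common():
    print(count, template)
```

Using a hashable tuple as the template key makes it straightforward to count how often each template occurs and to group narratives by template\.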

### 4\.2 LLM based scenario generation

After defining a scenario template using meta variables, we use a three step process to design system specific failure scenarios\. For this, we use three LLM agents sequentially, \(1\) paraphrasing agent, \(2\) scenario generation agent, \(3\) fine\-tuning agent\. The details are discussed next\.

#### 4\.2\.1 Paraphrasing a narrative\.

A narrative consists of contextually relevant as well as incident specific details such as location, time of crash, etc\., which may not be relevant for the scenario generation process\. The major topological details are already captured in meta variables, and the narrative provides further contextual details for a crash, which can assist the design of specific crash scenarios\. Therefore, we first paraphrase the information from a narrative, using an LLM as a paraphrasing agent, to retain only the details necessary for the scenario generation process\.

This step is also used to filter out scenario designs that cannot be reconstructed with the chosen system, and to replace them with alternative narrative descriptions\. For example, crash records often contain collision incidents that involve rear\-to\-front collisions of the ego vehicle \(rear\) with a non\-ego vehicle \(front\) due to the negligence of the non\-ego vehicle\. However, these are not relevant for testing a specific policy deployed on our system\. The paraphrasing tool assists in generating realistic, plausible alternative scenarios for such cases\. Fig\. [1](https://arxiv.org/html/2606.31131#S4.F1) shows an example of narrative design using the paraphrasing agent\.

![Refer to caption](https://arxiv.org/html/2606.31131v1/x1.png)

Figure 1: Extraction of narratives for scenarios with Road Type\-Intersection, CP movement\-Making Left Turn, SV movement\-Proceeding Straight, Work Zone\-No, from historical records \(left\)\. The collected data is used by the paraphrasing agent to generate a single scenario that meets design requirements\. Text in red shows the system specific modification made by the agent to accommodate simulation constraints \(middle\)\. A scenario is generated using the proposed narrative in the simulator \(right\)\.

#### 4\.2\.2 Scenario generation using narrative\.

The paraphrased narrative is used as an input to another LLM agent that parses the natural language description of the crash, in addition to the scenario template $s$, to generate a scenario $z_s$ for a possible crash\. The scenario specification is encoded in the prompt, and must be directly compatible as an input to the system for testing\. Appendix [D\.2](https://arxiv.org/html/2606.31131#A4.SS2) shows the prompt used for this step\.

The choice of specific variables and format used to define a scenario based on a given template is system and simulator specific\. For example, we use the Metadrive simulator in this work, where policies for non\-ego agents can be defined to support movements such as path following using the pre\-available IDM Policy, or the Sudden Braking and Accelerate\-then\-brake policies, which are custom policies for sudden braking and for accelerated path following followed by abrupt braking, respectively\. The ego agent also follows a chosen policy\. In addition to movement type, attributes such as target speed of ego $v_e$ and non\-ego $v_n$, initial longitudinal gap $d_x$ and lateral gap $d_y$, and map type can be used to define a scenario\. Each meta variable can map to one or multiple variables\. Table [5](https://arxiv.org/html/2606.31131#A3.T5) shows the parameters chosen to define a scenario in our experiments\. Fig\. [2](https://arxiv.org/html/2606.31131#S4.F2) shows a simulation for a scenario produced by the scenario generation agent from a narrative generated by the paraphrasing agent\. This step can be used with any other simulator choice, such as Scenic \(fremont2023scenic\), by changing the desired schema in the prompt\.
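One way to picture "encoding the scenario specification in the prompt" is a prompt builder paired with a validator for the agent's reply\. The sketch below is an assumption\-laden illustration, not the prompt from Appendix D\.2: the JSON key names \(`map_type`, `cp_policy`, `ego_speed`, `cp_speed`, `gap_longitudinal`, `gap_lateral`\) and the prompt wording are hypothetical, the map\-type letters are those listed in Section 6\.2, and no LLM is actually called\.

```python
import json

# Labels follow the paper's description; the exact identifiers are assumed.
MAP_TYPES = {"I", "T", "O", "C", "S"}
CP_POLICIES = {"IDM", "Sudden Braking", "Accelerate-then-brake"}
NUMERIC_FIELDS = ("ego_speed", "cp_speed", "gap_longitudinal", "gap_lateral")


def build_prompt(narrative: str, template: dict) -> str:
    """Assemble a prompt that embeds the template and the output schema."""
    schema = {
        "map_type": sorted(MAP_TYPES),
        "cp_policy": sorted(CP_POLICIES),
        **{name: "number" for name in NUMERIC_FIELDS},
    }
    return (
        "Generate one test scenario as a JSON object.\n"
        f"Scenario template: {json.dumps(template)}\n"
        f"Narrative: {narrative}\n"
        f"Output schema: {json.dumps(schema)}\n"
    )


def parse_scenario(reply: str) -> dict:
    """Parse and validate a JSON reply; raise ValueError if it is unusable."""
    scenario = json.loads(reply)  # JSONDecodeError is a ValueError subclass
    if not isinstance(scenario, dict):
        raise ValueError("reply must be a JSON object")
    if scenario.get("map_type") not in MAP_TYPES:
        raise ValueError(f"unsupported map type: {scenario.get('map_type')!r}")
    if scenario.get("cp_policy") not in CP_POLICIES:
        raise ValueError(f"unsupported CP policy: {scenario.get('cp_policy')!r}")
    for name in NUMERIC_FIELDS:
        value = scenario.get(name)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{name} must be a number, got {value!r}")
    return scenario


# A hand-written reply standing in for real LLM output.
reply = ('{"map_type": "I", "cp_policy": "IDM", "ego_speed": 10, '
         '"cp_speed": 8, "gap_longitudinal": 20, "gap_lateral": 3.5}')
scenario = parse_scenario(reply)
```

Swapping simulators would then amount to changing the schema constants, which mirrors the modularity claimed for this step\.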

![Refer to caption](https://arxiv.org/html/2606.31131v1/x2.png)

Figure 2: Trajectory rollout for Cluster 6\. Meta variables for this example are: Road Type\-Traffic Circle, Work Zone\-No\. We show an exemplar LLM paraphrased narrative \(Section [4\.2\.1](https://arxiv.org/html/2606.31131#S4.SS2.SSS1)\) at the bottom, and the corresponding trajectory rollouts \(top\) for the scenario obtained using scenario generation \(Section [4\.2\.2](https://arxiv.org/html/2606.31131#S4.SS2.SSS2)\)\.

#### 4\.2\.3 System specific fine\-tuning\.

To refine the scenario generated by the LLM, it is helpful to steer the scenario generation process using a set of metrics $y$ that are collected at the end of each rollout\. Fig\. [3](https://arxiv.org/html/2606.31131#S7.F3) shows an example of a scenario before and after fine\-tuning the scenario variables using minimum gap as a metric, leading to a reduction in the minimum gap between ego and non\-ego vehicles\.
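The metric\-guided refinement described above can be sketched as a simple accept\-if\-better loop\. Everything here is hypothetical scaffolding: `toy_rollout` is a made\-up closed\-form stand\-in for a simulator rollout, `toy_proposer` stands in for the LLM fine\-tuning agent, and the acceptance rule \(keep a revision only if the minimum gap shrinks\) is an assumption rather than the paper's stated procedure\.

```python
def toy_rollout(scenario: dict) -> float:
    """Stand-in for a simulator rollout; returns a minimum gap in metres."""
    # Made-up model: a smaller initial gap or a faster ego shrinks the gap.
    return max(0.0, scenario["gap_longitudinal"] - 0.5 * scenario["ego_speed"])


def toy_proposer(scenario: dict, min_gap: float) -> dict:
    """Stand-in for the fine-tuning agent; nudges one scenario variable."""
    revised = dict(scenario)
    revised["gap_longitudinal"] = max(1.0, scenario["gap_longitudinal"] - 2.0)
    return revised


def fine_tune(scenario, rollout, proposer, iterations=1):
    """Keep a proposed revision only if it reduces the minimum gap."""
    best, best_gap = scenario, rollout(scenario)
    for _ in range(iterations):
        if best_gap <= 0.0:  # already a collision, nothing left to tune
            break
        candidate = proposer(best, best_gap)
        gap = rollout(candidate)
        if gap < best_gap:
            best, best_gap = candidate, gap
    return best, best_gap


initial = {"ego_speed": 10.0, "gap_longitudinal": 20.0}
initial_gap = toy_rollout(initial)
tuned, tuned_gap = fine_tune(initial, toy_rollout, toy_proposer, iterations=1)
print(initial_gap, tuned_gap)
```

Passing the rollout and proposer as callables keeps the loop independent of any particular simulator or LLM client\.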

The scenarios generated by our paradigm can be easily integrated with existing mathematical tools for testing\. As an example, we demonstrate the generation of a test suite with diverse scenarios using a clustering based approach that ensures high diversity of scenario selection within limited budget\.

## 5 Cost aware scenario design and testing

The parameterization of crash records using meta variables and narratives can be used to cluster the records, and subsequently design synthetic scenarios that represent key characteristics of each cluster\. This idea is used in limited budget data acquisition for learning, as it inherently assures coverage by only generating one scenario per cluster \(axiotis2024data\)\. We apply this concept to generate 20 clusters of scenarios using k\-means clustering with $k=20$\.
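The "one scenario per cluster" idea can be illustrated with a small, dependency\-free k\-means\. This is a sketch under stated assumptions: the 2\-D toy points replace the real crash\-record features, $k=3$ replaces the paper's $k=20$, and picking the record nearest to each centroid is a simplification, since the paper synthesizes a scenario per cluster with the paraphrasing agent rather than selecting an existing record\.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm with random data points as initial centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assign(points, centroids)


def representatives(points, centroids, labels):
    """Pick, per cluster, the index of the point closest to its centroid."""
    best = {}
    for index, (point, label) in enumerate(zip(points, labels)):
        distance = sq_dist(point, centroids[label])
        if label not in best or distance < best[label][1]:
            best[label] = (index, distance)
    return {label: index for label, (index, _) in best.items()}


# Toy 2-D features standing in for crash-record features (not real data).
points = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [9.9, 0.0], [10.1, 0.2]]
centroids, labels = kmeans(points, k=3)
test_suite = representatives(points, centroids, labels)
print(test_suite)  # maps cluster label -> index of its representative record
```

Because each non\-empty cluster contributes exactly one entry, the size of the test suite is bounded by $k$, which is how the testing budget is enforced\.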

## 6 Experimental Validation

We use the Metadrive simulator with the IDM policy for the ego agent to test our scenario generation approach using the NHTSA ADS accident records for the year 2021\. First, we extract the scenario templates shown in Table [1](https://arxiv.org/html/2606.31131#S4.T1), for which permissible values are shown in Table [5](https://arxiv.org/html/2606.31131#A3.T5)\. Then, we use the GPT 5\.2 model \(openai2025gpt52\) for paraphrasing narratives, converting the overall information to a scenario for the Metadrive simulator, and fine\-tuning the scenarios\. Below we discuss each of the steps in detail\.

### 6\.1 Generating scenario template from crash records

For this step, we choose meta variables from a larger set of categorical variables available in the crash records based on our setup in the simulator\.

##### Extracting meta variables from crash records\.

From a total of 2295 crashes, we eliminate data entries where the ‘Narrative’ column is redacted, reducing the dataset to 1911 crashes\. We restrict the movement of CP and SV to be ‘Proceeding Straight’, ‘Making Left Turn’ and ‘Making Right Turn’, as these are easily reproducible in the simulator, as opposed to movements such as ‘Backing’ or ‘Unknown’\. This reduces the overall number of crashes to 235, from which we obtain the set of meta variables shown in Table [1](https://arxiv.org/html/2606.31131#S4.T1)\. As discussed before, each unique combination of these meta variables defines a scenario template, leading to 25 permissible scenario templates\.
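The two filtering steps above \(drop redacted narratives, then keep only reproducible movements\) can be sketched as follows\. The record layout, the field names, and the `"[REDACTED]"` marker are hypothetical; the actual NHTSA export may flag redaction differently, and the four toy records are not real data\.

```python
ALLOWED_MOVEMENTS = {"Proceeding Straight", "Making Left Turn", "Making Right Turn"}
REDACTED = "[REDACTED]"  # hypothetical marker; the real dataset may differ


def filter_records(records):
    """Return (records with a narrative, records reproducible in simulation)."""
    with_narrative = [r for r in records if r["narrative"] != REDACTED]
    reproducible = [
        r for r in with_narrative
        if r["sv_movement"] in ALLOWED_MOVEMENTS
        and r["cp_movement"] in ALLOWED_MOVEMENTS
    ]
    return with_narrative, reproducible


# Toy records written by hand for illustration (not NHTSA data).
toy_records = [
    {"narrative": "SV struck by turning CP.", "sv_movement": "Proceeding Straight",
     "cp_movement": "Making Left Turn"},
    {"narrative": REDACTED, "sv_movement": "Proceeding Straight",
     "cp_movement": "Proceeding Straight"},
    {"narrative": "CP reversed into SV.", "sv_movement": "Proceeding Straight",
     "cp_movement": "Backing"},
    {"narrative": "SV turned right into CP.", "sv_movement": "Making Right Turn",
     "cp_movement": "Proceeding Straight"},
]

with_narrative, reproducible = filter_records(toy_records)
print(len(toy_records), len(with_narrative), len(reproducible))
```

On the real data, the analogous counts reported in the text are 2295, 1911, and 235\.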

As shown in Table [2](https://arxiv.org/html/2606.31131#S6.T2), these scenario templates are not distributed uniformly by default\. In fact, only 14 out of 25 scenario templates are observed in the data\. With 104 records out of 235, the most commonly occurring scenario template corresponds to the following meta variable values: Intersection, Proceeding Straight, Proceeding Straight, False, for road type, CP/SV pre\-crash movements, and work zone, respectively\. Only 9 out of 235 crashes correspond to the Work Zone variable being True, reflecting that crashes due to the presence of a Work Zone are rare\.

Table 2: Crash distribution by Road Type and CP\-SV pre\-crash movements, with panels \(a\) Road Type Distribution and \(b\) CP–SV Interaction Matrix\. We highlight the most and least frequently occurring meta variables in bold\.

### 6\.2 Design of scenario\.

The simulator supports five map types, I, T, O, C, S, representing four\-way intersections, three\-way intersections, roundabouts, curved roads, and straight roads, respectively\. We directly use map type and ego target speed as scenario parameters\. To model Crash Partners \(CPs\), we add two non\-ego policies with irregular motion, Sudden and Accelerate\-then\-brake, alongside the nominal IDM\. Scenarios are generated using the three\-stage LLM\-assisted pipeline described in Section [4\.2](https://arxiv.org/html/2606.31131#S4.SS2)\. The crash records for scenario generation are selected using a clustering method that ensures generation of diverse scenarios, which we briefly discuss below\.

### 6\.3 Clustering to generate typical scenarios

We cluster the collection of crashes into 20 clusters using k\-means clustering, using as features the 14 scenario templates, represented as a one\-hot encoding, along with a 6\-dimensional vector embedding of the narratives\. The embedding is generated using all\-MiniLM\-L6\-v2 \(reimers2019sentence;wang2020minilm\) and compressed with PCA\. This representation encodes diversity of scenarios by frequency of failure, contextual details, as well as meta variables\. The resulting clusters represent a collection of scenarios that are rare as well as frequent, and that correspond to different spatio\-temporal types and different contextual specifics encoded in the narrative\. Table [3](https://arxiv.org/html/2606.31131#S6.T3) shows the meta variables corresponding to each cluster\. We observe that including the narratives in clustering adds further benefit\. First, it helps to separate contextually different scenarios that share the same scenario template\. Second, it helps to separate out scenarios that do not have incident relevant information \(for example, Cluster 3\)\. Appendix [C](https://arxiv.org/html/2606.31131#A3) shows narratives generated for each cluster using the paraphrasing agent\.
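The clustering features described above can be sketched as a 14\-dimensional one\-hot template indicator joined to a 6\-dimensional narrative embedding, giving 20 features per crash\. Plain concatenation without any rescaling is an assumption here, and the embedding values below are hand\-written placeholders rather than outputs of all\-MiniLM\-L6\-v2 followed by PCA\.

```python
NUM_TEMPLATES = 14   # templates observed in the data (from the text)
EMBEDDING_DIM = 6    # narrative embedding size after PCA (from the text)


def one_hot(index: int, size: int) -> list:
    if not 0 <= index < size:
        raise ValueError(f"index {index} out of range for size {size}")
    return [1.0 if i == index else 0.0 for i in range(size)]


def build_features(template_ids, embeddings):
    """Concatenate template one-hot vectors with narrative embeddings."""
    features = []
    for template_id, embedding in zip(template_ids, embeddings):
        if len(embedding) != EMBEDDING_DIM:
            raise ValueError("unexpected embedding size")
        features.append(one_hot(template_id, NUM_TEMPLATES) + list(embedding))
    return features


# Placeholder inputs: two crashes with template ids 0 and 5.
template_ids = [0, 5]
embeddings = [
    [0.1, -0.2, 0.0, 0.3, 0.05, -0.1],
    [-0.4, 0.2, 0.1, 0.0, 0.25, 0.3],
]
features = build_features(template_ids, embeddings)
print(len(features[0]))  # 14 + 6 = 20 features per crash
```

Two crashes with the same template differ only in the last six coordinates, which is what lets the narrative separate contextually different incidents within one template\.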

Table 3: Cluster centroids grouped by scenario templates \(same color for identical Road/SV/CP/WZ\)\. PS: Proceeding Straight, LT: Making Left Turn, RT: Making Right Turn; Int: Intersection, Hwy: Highway/Freeway, Circle: Traffic Circle; WZ: N/Y\.

Table 4: Initial/Final Minimum Gap and Percentage Difference by Cluster\. Clusters for which a collision happens in the initial scenario are not sent for fine\-tuning\.

## 7 Discussion

Our main contribution with this work is to present an automated scenario generation paradigm that uses real\-world records to construct realistic scenarios for testing\. We evaluate the overall merit of our approach in testing the given policy using three key questions\. Appendix [A](https://arxiv.org/html/2606.31131#A1) provides further discussion, assessing each component of our pipeline\.

\(Q1\) How accurate is scenario generation by our pipeline? For the 20 clusters considered, 17 out of 20 scenarios match the scenario template meta variables completely\. For 3 scenarios, one of the meta variables is not accurately represented \(Clusters 6 and 18: CP movement direction is incorrect; Cluster 11: a Work Zone is present in the scenario but should be False\)\. We also find that the generated CP movements match the details of the narrative accurately \(Fig\. [2](https://arxiv.org/html/2606.31131#S4.F2)\)\. The accuracy of the match and the difficulty of the scenario can be further optimized using multiple steps of the fine\-tuning agent \(Section [4\.2\.3](https://arxiv.org/html/2606.31131#S4.SS2.SSS3)\)\. Appendix [D\.3](https://arxiv.org/html/2606.31131#A4.SS3) shows the fine\-tuning prompt\. Table [4](https://arxiv.org/html/2606.31131#S6.T4) shows that the minimum gap reduces substantially using a single iteration of fine\-tuning for 14 out of 16 scenarios that need fine\-tuning\.
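The two quantities discussed under Q1 can be computed with a few lines\. This is an illustrative sketch: the template tuples are placeholders, and the relative\-change formula is an assumed reading of the "Percentage Difference" column in Table 4, whose exact definition is not given in the text\.

```python
def template_match_rate(expected, generated):
    """Fraction of scenarios whose meta variables all match the template."""
    if len(expected) != len(generated):
        raise ValueError("expected and generated must have the same length")
    matches = sum(1 for e, g in zip(expected, generated) if e == g)
    return matches / len(expected)


def percent_change(initial_gap: float, final_gap: float) -> float:
    """Relative change in minimum gap, in percent (assumed definition)."""
    return 100.0 * (final_gap - initial_gap) / initial_gap


# Placeholder templates: 20 scenarios, 3 of which deviate in one meta variable.
expected = [("Intersection", "PS", "LT", False)] * 20
generated = list(expected)
for cluster in (5, 10, 17):  # arbitrary indices for the mismatches
    generated[cluster] = ("Intersection", "PS", "RT", False)

rate = template_match_rate(expected, generated)
print(rate)  # 17 of 20 scenarios match completely
print(percent_change(10.0, 4.0))  # a gap shrinking from 10 m to 4 m
```

A negative percentage indicates that fine\-tuning made the scenario more critical by shrinking the gap\.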

\(Q2\) Can this approach be effectively used for testing of autonomous vehicles? We successfully generate interesting scenarios of collisions as well as near collisions, as evident from the minimum gap shown for all 20 scenarios in Table [4](https://arxiv.org/html/2606.31131#S6.T4)\. We use clustering as a way to segregate the available data into diverse groups, as shown in Table [3](https://arxiv.org/html/2606.31131#S6.T3)\. The clustering captures 11 out of the 14 scenario templates present naturally in historical records\.

\(Q3\) What do we learn about the underlying system using the generated tests? The paraphrasing agent and scenario design agent adapt to system specific requirements, and the generated scenarios reveal interesting details\. We observe that for CP movement\-Proceeding Straight, the IDM policy ensures that the vehicle maintains a safe distance and avoids crashes\. However, this results in early braking and oscillatory movement \(Fig\. [4](https://arxiv.org/html/2606.31131#S7.F4)\), which is undesirable on\-road\. We observe that the policy is not as reactive to smaller obstacles such as cones \(Fig\. [5](https://arxiv.org/html/2606.31131#A1.F5)\) as it is to larger vehicles, and often collides with the cones when Work Zone is set to True\. While head\-on collision in straight motion is avoided successfully, the policy crashes in lateral motion \(Fig\. [2](https://arxiv.org/html/2606.31131#S4.F2)\), leading to side collisions\.

![Refer to caption](https://arxiv.org/html/2606.31131v1/x3.png)

Figure 3: Trajectory rollout for Cluster 3\. Meta variables for this scenario are: Road Type\-Intersection, CP movement\-Proceeding Straight, Work Zone\-No\. The figure shows frames corresponding to the initial gap before fine\-tuning \(left\) and the final gap for the scenario fine\-tuned by the fine\-tuning agent \(right\), showing the reduction in minimum gap obtained with LLM based fine\-tuning\.

![Refer to caption](https://arxiv.org/html/2606.31131v1/x4.png)

Figure 4: Trajectory rollout for Cluster 2\. Meta variables for this scenario: CP movement\-Proceeding Straight, Road Type\-Intersection, Work Zone\-No\. While SV does not crash and maintains a safe distance from CP, SV shows oscillatory movement to avoid a crash at the beginning, despite being at a sufficient distance from CP\.

## 8 Limitations & Future Work

In this work, we propose an automated scenario generation tool that extracts information from historical records to generate diverse testing scenarios applicable to a given system\. This alleviates the need to handcraft or manually design the scenario parameters\. The tool can be easily combined with existing sampling or optimization tools for further discovery of worst\-case scenarios\. Additionally, LLMs can be used to create synthetic failure narratives in the absence of real\-world information\. Such narratives could be used to adaptively explore the space of scenario templates, which has not been explored in the literature so far and is a direction for future work\.

The current limitation of the work stems from the lack of a unified definition for concepts such as coverage in the domain of testing & validation, which leads to non\-unique ways in which diverse generation of scenarios can be pursued\. Additionally, we observe that fixing a given policy gives very limited control over the ego vehicle movement, thereby restricting complete replication of desired scenarios\. We also aim to extend this work to the generation of visual conditions for testing of vision\-based policies in ADS\.


## Appendix A Discussion of results

![Refer to caption](https://arxiv.org/html/2606.31131v1/x5.png)

![Refer to caption](https://arxiv.org/html/2606.31131v1/x6.png)

Figure 5: Trajectory rollout for Cluster 4 & 13\. Meta variables for this scenario: CP movement\-Proceeding Straight, Road Type\-Intersection \(Cluster 4\), Highway/Freeway \(Cluster 13\), Work Zone\-Yes\.

\(Q4\) Why do we need a separate paraphrasing agent to generate synthetic narratives?

First, narratives may not be available for all kinds of scenario templates\. In such cases, the LLM\-based paraphrasing agent \(Section [4\.2\.1](https://arxiv.org/html/2606.31131#S4.SS2.SSS1)\) is useful in generating narratives based on scenario templates due to its generalizable performance\. For example, for Cluster\-3 \(LLM\-generated narrative shown in Appendix [C](https://arxiv.org/html/2606.31131#A3)\), the narrative records provide no incident\-relevant information, and the paraphrasing agent creates a realistic description of the crash\. Second, this helps in incorporating system\-specific constraints in downstream scenario design by revising the narrative such that it is feasible to recreate for a given simulator/system\. For example, Fig\. [1](https://arxiv.org/html/2606.31131#S4.F1) shows text in red, introduced by the paraphrasing agent using the information on types of policies that can be used to replicate non\-ego agent \(CP\) behavior\. This information can be provided in the prompt \(Appendix [D\.1](https://arxiv.org/html/2606.31131#A4.SS1)\)\.

\(Q5\) How useful are historical crash records in generating realistic testing conditions?

Unlike large\-scale on\-road self\-driving datasets, historical records are concise and focus only on on\-road accidents\. This mitigates the need for data processing to extract failures\. These records are used by regulatory authorities for scenario design for testing; however, this tends to be a largely manual effort\. One of the key contributions of our work is to automate this process, where LLM\-based agents are used to take system constraints into account and design scenarios for testing based on provided records\. The historical records provide a natural way to categorize available data, which we leverage in our scenario template construction \(Table [1](https://arxiv.org/html/2606.31131#S4.T1)\)\.

## Appendix B Non\-ego policy hyperparameters

In addition to the target speed of each CP, we also provide the initial position of each CP relative to SV, using the lateral and longitudinal gaps $(\Delta_t, \Delta_s)$ and a lane offset in $\{+1, 0, -1\}$, which corresponds to the right lane, same lane, and left lane, respectively\. We also provide an option to initialize the vehicle in the same or opposite direction, leading to rear or head\-on collisions\. We further enable background traffic and control traffic density as a parameter between 0 and 1\. Higher background traffic increases the chance of collisions, but also increases randomization in scenario generation\. These values are summarized in Table [5](https://arxiv.org/html/2606.31131#A3.T5) and can be directly provided as scenario input to the Metadrive simulator\.
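As an illustration of the initialization parameters described above, the following minimal Python sketch groups them into one record and checks their permissible ranges\. The class name `CrashPartnerInit`, its field names, and the helper `validate_traffic_density` are hypothetical and are not taken from the released code; only the value ranges come from the text\.

```python
from dataclasses import dataclass


@dataclass
class CrashPartnerInit:
    """Hypothetical container for one crash partner (CP); names are illustrative."""
    target_speed: float      # target speed in m/s
    delta_s: float           # longitudinal gap to SV in metres
    delta_t: float           # lateral gap to SV in metres
    lane_offset: int         # +1 right lane, 0 same lane, -1 left lane
    direction: str = "same"  # "same" or "opposite" direction of travel

    def validate(self) -> None:
        """Raise ValueError if a field is outside the ranges stated in the text."""
        if self.lane_offset not in (-1, 0, 1):
            raise ValueError("lane_offset must be -1, 0, or +1")
        if self.direction not in ("same", "opposite"):
            raise ValueError("direction must be 'same' or 'opposite'")
        if self.target_speed < 0:
            raise ValueError("target_speed must be non-negative")


def validate_traffic_density(density: float) -> float:
    """Background traffic density is a parameter between 0 and 1."""
    if not 0.0 <= density <= 1.0:
        raise ValueError("traffic density must lie in [0, 1]")
    return density


# Example: a CP 15 m ahead in the right lane, travelling in the same direction.
cp = CrashPartnerInit(target_speed=8.0, delta_s=15.0, delta_t=0.0, lane_offset=1)
cp.validate()
density = validate_traffic_density(0.3)
```

Checking these ranges before a scenario reaches the simulator is one way to catch malformed LLM output early; the actual pipeline may handle this differently\.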

## Appendix C Clustering scenario details

Cluster\-0\. Road: Intersection, SV: Making Left Turn, CP: Making Left Turn, Work Zone: No

Narrative generated\. The vehicle was proceeding straight through an intersection at a steady speed\. A passenger vehicle at the cross street entered the intersection to make a left turn, intending to merge into the vehicle’s direction of travel\. The passenger vehicle accelerated into the turn, then abruptly braked mid\-turn while partially occupying the vehicle’s lane\. With limited time and distance to avoid, the vehicle continued forward and the front of the vehicle contacted the passenger vehicle’s side within the intersection\.

Cluster\-1\. Road: Intersection, SV: Proceeding Straight, CP: Making Left Turn, Work Zone: No

Narrative generated\. The vehicle was proceeding straight through an intersection on the main road\. A passenger vehicle on the cross street entered the intersection and began a left turn to merge onto the vehicle’s roadway in the same direction of travel\. The turning vehicle accelerated into the near lane and encroached into the vehicle’s lane without yielding, reducing the available gap\. The vehicle braked but was unable to avoid contact, striking the turning vehicle near its left\-rear quarter as it completed the left turn\.

Cluster\-2\. Road: Intersection, SV: Making Right Turn, CP: Making Left Turn, Work Zone: No

Narrative generated\. On April \[XXX\], 2025 at 3:12 PM local time, the self\-driving vehicle \(SDV\) approached a standard four\-way signalized intersection in the right\-turn lane intending to make a right turn onto the cross street\. A crash\-partner passenger vehicle was traveling proceeding straight through the intersection on the cross street \(from the SDV’s left to right\)\. As the SDV began its right turn at low speed, the crash\-partner vehicle continued straight at a moderate speed and closed the gap faster than expected\. The SDV entered the near lane of the cross street while the crash\-partner vehicle was still within the conflict zone, resulting in an angle impact in the intersection/near\-corner area \(front\-right area of the SDV contacting the side/front\-quarter of the crash\-partner vehicle\)\. No work zone activity was present\.

Cluster\-3\. Road: Intersection, SV: Proceeding Straight, CP: Proceeding Straight, Work Zone: No

Narrative generated\. On an evening in 2023, the self\-driving vehicle \(“Ego”\) was traveling straight through a signalized four\-way intersection on a steady approach speed with the traffic signal indicating a permissive phase for Ego to proceed\. A passenger vehicle \(“Crash Partner”\) approached the same intersection from the cross street and continued straight into the intersection against its signal\. As the Crash Partner entered the intersection, it briefly accelerated and then abruptly braked while still within the conflict zone, reducing the available gap\. Ego initiated braking but, due to the late incursion and the Crash Partner’s sudden deceleration in the intersection, Ego was unable to fully avoid the crossing vehicle and the vehicles collided in the intersection in a right\-angle \(side\-impact\) configuration\.

Cluster\-4\. Road: Intersection, SV: Making Left Turn, CP: Proceeding Straight, Work Zone: Yes

Narrative generated\. A vehicle \(Vehicle 1\) in autonomous mode approached a coned work\-zone intersection and prepared to make a left turn\. The temporary cone pattern narrowed and shifted the travel lanes through the intersection, placing Vehicle 1’s turn path closer to the cross\-traffic lane than usual\. After slowing, Vehicle 1 began the left turn at low speed\. A passenger vehicle \(Vehicle 2\) traveling straight through the intersection in the adjacent through lane continued forward at a steady speed\. As Vehicle 1 entered the intersection and crossed the straight\-through path, Vehicle 2 reached the conflict point and struck Vehicle 1 in the intersection, resulting in an angle/side\-impact collision\.

Cluster\-5\. Road: Highway / Freeway, SV: Proceeding Straight, CP: Proceeding Straight, Work Zone: No

Narrative generated\. On June \[XXX\], 2024 at 7:42 AM PT, the ego vehicle was traveling eastbound on a multi\-lane freeway, proceeding straight in the center lane at highway speed\. A crash partner vehicle was proceeding straight in the adjacent right lane at a similar speed\. As both vehicles continued forward, the crash partner gradually drifted left within its lane and then encroached over the lane line into the ego vehicle’s lane without a turn or lane\-change maneuver\. The ego vehicle applied braking to maintain a safe separation, but the crash partner’s continued encroachment resulted in contact between the ego vehicle’s left\-front corner and the crash partner’s right\-rear quarter panel while both vehicles were still moving straight\.

Cluster\-6\. Road: Traffic Circle, SV: Making Left Turn, CP: Making Left Turn, Work Zone: No

Narrative generated\. On February \[XXX\], 2025 at 6:22 PM PT, an AV entered a traffic circle and initiated a left\-turn movement to continue counterclockwise toward its intended exit\. A passenger vehicle approaching from a different entry also entered the traffic circle and initiated a left\-turn movement into the same circulating lane\. As both vehicles progressed through the circle, the passenger vehicle accelerated to close the gap and then braked abruptly while drifting inward within the lane, reducing lateral clearance\. The AV, already committed to the left\-turn path within the traffic circle, had insufficient time and space to maintain a safe lateral buffer and avoid conflict\. The right\-front of the passenger vehicle contacted the left\-front quarter of the AV within the circulating portion of the traffic circle\.

Cluster\-7\. Road: Highway / Freeway, SV: Proceeding Straight, CP: Making Left Turn, Work Zone: No

Narrative generated\. Pre\-crash movement through telematics data showed the self vehicle proceeding straight in the rightmost travel lane on a highway at a steady speed\. A crash partner vehicle was initially positioned on the right shoulder ahead of the self vehicle, then accelerated into the roadway and initiated a left turn across the travel lanes toward a median crossover/opening\. The self vehicle applied braking while continuing straight, but the crash partner continued the left\-turn maneuver into the self vehicle’s lane, resulting in a front\-corner\-to\-side impact as the self vehicle struck the crash partner during the crossing movement\.

Cluster\-8\. Road: Intersection, SV: Making Left Turn, CP: Making Left Turn, Work Zone: No

Narrative generated\. At approximately 3:18 p\.m\. on a clear weekday afternoon, the self\-driving vehicle approached a signal\-controlled intersection in the southbound lane and initiated a permitted left turn to travel eastbound\. At the same time, a crash partner vehicle approached the same intersection from the westbound approach and began a right turn to travel northbound\. As the self\-driving vehicle entered the intersection and continued its left\-turn path, the crash partner vehicle rolled forward and then accelerated into its right turn, entering the intersection later than expected\. The vehicles converged in the intersection turn area, and the front of the crash partner vehicle struck the right side of the self\-driving vehicle during the left turn\.

Cluster\-9\. Road: Intersection, SV: Proceeding Straight, CP: Proceeding Straight, Work Zone: No

Narrative generated\. On March \[XXX\], 2024 at 4:22 PM PT, the ego vehicle was proceeding straight northbound through a signalized intersection at a steady speed\. A crash partner vehicle was proceeding straight eastbound on the cross street toward the same intersection\. As the ego vehicle entered the intersection, the crash partner continued straight into the intersection against the traffic control and did not yield\. The ego vehicle applied braking but was unable to avoid contact, resulting in a side\-impact collision within the intersection\.

Cluster\-10\. Road: Intersection, SV: Making Left Turn, CP: Proceeding Straight, Work Zone: No

Narrative generated\. Ego vehicle approached a signalized intersection in the left\-turn lane and initiated a left turn on a permissive phase\. A crash\-partner vehicle was proceeding straight through the intersection from the cross street\. As ego entered the intersection to complete the turn, the crash\-partner continued straight at a steady speed without yielding, and the vehicles converged in the center of the intersection, resulting in a side\-impact collision as ego crossed the crash\-partner’s path\.

Cluster\-11\. Road: Highway/Freeway, SV: Proceeding Straight, CP: Proceeding Straight, Work Zone: No

Narrative generated\. On \[XXX\], 2022, the ego vehicle was traveling eastbound on a limited\-access freeway, proceeding straight in the right travel lane at a steady speed\. A crash partner passenger vehicle was proceeding straight in the adjacent lane to the left, traveling parallel to the ego vehicle\. A short line of traffic cones had been placed along the dashed lane line ahead \(not an active work zone\), narrowing the usable space between the two lanes\. As both vehicles continued straight, the crash partner tracked slightly right of center in its lane while passing the cones, encroaching across the lane line\. The crash partner’s right side contacted the ego vehicle’s left front/left side, resulting in a sideswipe\-type collision while both vehicles were proceeding straight\.

Cluster\-12\. Road: Highway/Freeway, SV: Proceeding Straight, CP: Proceeding Straight, Work Zone: Yes

Narrative generated\. On April \[XXX\] at approximately 10:35 AM CT, the ego vehicle was traveling eastbound on a highway through an active work zone, proceeding straight in the right through\-lane at a steady speed\. A crash\-partner SUV was traveling in the adjacent left through\-lane, also proceeding straight at a similar speed\. The work zone used cones to taper and narrow both lanes, reducing lateral clearance between vehicles while maintaining straight\-ahead travel\. As both vehicles entered the narrowed section, the crash\-partner SUV tracked slightly toward the right within its lane while continuing forward, and its right side encroached into the ego vehicle’s lane space\. The ego vehicle and the crash\-partner SUV made contact in a sideswipe along their left/right sides, respectively, while both continued proceeding straight through the coned work zone\.

Cluster\-13\. Road: Highway/Freeway, SV: Proceeding Straight, CP: Proceeding Straight, Work Zone: Yes

Narrative generated\. On \[XXX\], 2024 at \[XXX\] PT, an autonomous vehicle was traveling proceeding straight on a highway through an active work zone with cones tapering and narrowing the available lanes\. A passenger vehicle traveling proceeding straight in the adjacent lane approached the cone taper and continued forward as the lane boundary shifted, gradually encroaching toward the vehicle’s lane\. The vehicle reduced speed to maintain a safe buffer in response to the narrowing roadway, but the passenger vehicle’s continued forward motion alongside the taper resulted in a sideswipe contact along the vehicle’s left side as both vehicles remained traveling straight through the work zone\.

Cluster\-14\. Road: Intersection, SV: Proceeding Straight, CP: Making Left Turn, Work Zone: No

Narrative generated\. On Tuesday, February \[XXX\], 2025 at 3:18 PM local time, the ego vehicle was proceeding straight through a signalized intersection at a steady speed\. A crash partner vehicle was positioned on the cross street at the intersection and initiated a left turn to enter and merge onto the ego vehicle’s roadway\. As the crash partner accelerated into the turn, it entered the ego vehicle’s lane while the ego vehicle was already committed to the intersection\. The ego vehicle applied braking but was unable to avoid contact\. The ego vehicle’s front\-left area contacted the crash partner’s right\-side area within the intersection as the crash partner continued the left\-turn path\. No work zone was present\.

Cluster\-15\. Road: Intersection, SV: Making Right Turn, CP: Making Right Turn, Work Zone: No

Narrative generated\. A vehicle \(Vehicle 1\) operating in autonomy approached a signalized intersection in the inside of two dedicated right‑turn lanes and began a right turn on a green indication\. A passenger vehicle \(Vehicle 2\) in the adjacent outside right‑turn lane also initiated a right turn at approximately the same time\. During the turn, Vehicle 2 tracked wider than its marked lane and encroached into the inside receiving lane, moving laterally into Vehicle 1’s path\. Vehicle 1 braked to maintain a safe gap but was unable to fully avoid contact, resulting in a low‑speed sideswipe/angle impact within the intersection as both vehicles completed their right turns\. No work zone was present\.

Cluster\-16\. Road: Intersection, SV: Proceeding Straight, CP: Proceeding Straight, Work Zone: No

Narrative generated\. On March \[XXX\], 2025 at 6:18 PM local time, an autonomous vehicle \(AV\) was proceeding straight through a signalized intersection in its lane at a steady target speed\. A crash partner vehicle approached the same intersection from the cross street and also proceeded straight\. As the AV entered the intersection, the crash partner continued straight into the intersection at a higher speed than the available gap and did not yield, resulting in a side\-impact collision within the intersection \(front of the crash partner contacting the AV’s right side\)\. No work zone was present\.

Cluster\-17\. Road: Intersection, SV: Proceeding Straight, CP: Proceeding Straight, Work Zone: No

Narrative generated\. A self\-driving vehicle \(“Ego AV”\) was proceeding straight through an intersection at a steady speed\. A crash partner vehicle was also proceeding straight on the crossing street and approached the same intersection\. As the Ego AV entered the intersection, the crash partner vehicle accelerated into the intersection and then braked late, reducing the available gap\. The Ego AV applied braking to maintain a safe stopping distance, but the vehicles entered the conflict point at the same time and collided in the intersection \(side\-impact angle\), with neither vehicle striking the other from behind and without a head\-on approach\.

Cluster\-18\. Road: Intersection, SV: Making Left Turn, CP: Proceeding Straight, Work Zone: No

Narrative generated\. On Tuesday, March \[XXX\], 2025 at 6:12 PM local time, the ego vehicle was traveling straight through a signalized four\-way intersection in the right through lane at approximately 23 mph with a green indication\. A passenger car on the cross street approached the intersection from the ego vehicle’s right side, slowed, then initiated a left turn to enter the ego vehicle’s roadway \(turning to travel in the same direction as the ego vehicle\)\. As the car turned, it accelerated into the ego vehicle’s lane and occupied the lane space ahead of the ego vehicle before completing the turn\. The ego vehicle applied braking but the available gap was insufficient, and the front of the ego vehicle contacted the car’s right\-rear quarter panel within the intersection\. The vehicles came to rest shortly after the impact; no work zone activity was present\.

Cluster\-19\. Road: Intersection, SV: Making Left Turn, CP: Making Left Turn, Work Zone: No

Narrative generated\. On Tuesday, April \[XXX\], 2025 at 6:12 PM local time, the ego vehicle was proceeding straight through a signalized intersection in the right\-most through lane at approximately 25 mph under a green indication\. A crash partner vehicle approached the same intersection from the cross street and initiated a left turn to enter the ego vehicle’s roadway\. As the crash partner turned, it entered the ego vehicle’s lane while the ego vehicle was already committed to the intersection\. The ego vehicle applied braking and reduced speed but was unable to avoid contact, and the front of the ego vehicle struck the right side of the crash partner within the intersection\. No work zone was present\.

| Variable | Permissible Values | Description |
| --- | --- | --- |
| Map Type | I, T, O, C, S | Reflects road\-type based on meta variable\. |
| Target Speed \(Self\-Vehicle\) | Continuous range \(m/s\) | Adopted from narrative and tuned to suit the scenario\. |
| Traffic Density | $[0,1]$ | Controls density of randomized background traffic\. |
| Crash Partner \(CP\) Specific Variables | | |
| Lane Offset | $\{-1,0,+1\}$ | Initialization of crash partner lane relative to ego lane\. |
| $\Delta_s$, $\Delta_t$ | Longitudinal / lateral offset \(m\) | Relative position of non\-ego crash partner\. |
| Policy | Sudden, Accelerate\-then\-brake, IDM | CP control policy: sudden braking, accelerate\-then\-brake, or IDM\. Includes policy\-specific parameters such as brake timing and magnitude\. |

Table 5: Scenario variables and permissible values used for LLM\-based parameterization\.

## Appendix D Prompts

### D\.1 Paraphrasing agent prompt

Scenario Synthesis from Historical Crash Records

Shown below are failure scenarios from actual historical crash records\. Use these as examples of realistic crashes and propose a new scenario that represents a scenario with the following attributes:

Required attributes

- Pre\-Crash movement \(Crash Partner\)
- Pre\-Crash movement \(Self Vehicle\)
- Road Type
- Work zone
- Narrative lists

Constraints \(do not generate\)

- Do not design scenarios that are rear\-end collisions for the ego vehicle, or where the ego vehicle is hit by a non\-ego vehicle from behind, because we would like to test the ego vehicle’s policy for braking at a safe distance\.
- Do not design scenarios with a head\-on collision between ego and non\-ego vehicle coming at each other from opposite directions\.
- For example, avoid narratives such as: “An oncoming passenger vehicle in the opposite direction entered the intersection and initiated a left turn across the vehicle’s path, appearing to misjudge the vehicle’s speed and gap\.”

Recreation constraints \(capabilities\)

- We can control spawn positions of ego and non\-ego vehicles\.
- We can scatter cones\.
- We can control target speeds of each vehicle\.
- We can control movement of non\-ego vehicle as sudden\_brake, accel\_then\_brake, or nominal path following\.
- Design a scenario that can be recreated using these variables; do not add additional details\.
- At present, sharp changes in trajectory curvature and direction of motion of vehicles cannot be adjusted\.

Output format: Return a natural language output \(text/str\) that mimics a narrative\.
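To make the structure of this prompt concrete, the sketch below assembles a prompt string from a scenario template and a few example narratives\. It is only an illustration: the function `build_paraphrase_prompt`, the constants, and the shortened constraint wording are assumptions and do not reproduce the authors' implementation\.

```python
# Shortened paraphrases of the constraints and capabilities listed in the prompt.
CONSTRAINTS = [
    "Do not design rear-end collisions for the ego vehicle.",
    "Do not design head-on collisions between ego and non-ego vehicles.",
]
CAPABILITIES = [
    "We can control spawn positions of ego and non-ego vehicles.",
    "We can scatter cones.",
    "We can control target speeds of each vehicle.",
    "Non-ego movement is sudden_brake, accel_then_brake, or nominal path following.",
]


def build_paraphrase_prompt(template: dict, narratives: list) -> str:
    """Assemble a prompt from template attributes and example crash narratives."""
    lines = ["Shown below are failure scenarios from actual historical crash records."]
    lines += [f"Example {i}: {text}" for i, text in enumerate(narratives, start=1)]
    lines.append("Propose a new scenario with the following attributes:")
    lines += [f"- {key}: {value}" for key, value in template.items()]
    lines.append("Constraints (do not generate):")
    lines += [f"- {c}" for c in CONSTRAINTS]
    lines.append("Recreation constraints (capabilities):")
    lines += [f"- {c}" for c in CAPABILITIES]
    lines.append("Output format: return a natural language narrative.")
    return "\n".join(lines)


# Hypothetical template values mirroring the meta variables of one cluster.
template = {
    "Pre-Crash movement (Crash Partner)": "Proceeding Straight",
    "Pre-Crash movement (Self Vehicle)": "Making Left Turn",
    "Road Type": "Intersection",
    "Work zone": "Yes",
}
prompt = build_paraphrase_prompt(template, ["Narrative A", "Narrative B"])
```

Keeping the capability list as data makes it straightforward to swap in the constraints of a different simulator without rewriting the rest of the prompt\.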

### D\.2 Scenario design agent prompt

MetaDrive Scenario JSON Generation Prompt

Goal: Generate a MetaDrive scenario JSON for potential collision/failure cases involving an ego vehicle \(IDM policy\) and crash partner\(s\) \(CP/NCP\)\.

Constraint: Ego uses IDM \(collision\-avoidant\)\. Create simple but realistic failures observable under IDM \(e\.g\., early stopping, phantom braking, oscillation\)\. Avoid naive crashes\.

Crash Scenario Desired: \{LLM\-generated Narrative\}

Reference JSON Examples: \{examples\}

Map Types: ’X’ \(intersection\), ’O’ \(circle\), ’T’ \(3\-way\), ’S’ \(straight\), ’C’ \(curved\)\. Choose based on narrative\.

Turn Specification:

- ego\_turn: ’left’\|’right’\|’straight’
- lead\_turn: ’left’\|’right’\|’straight’

Each NCP must include:

- id
- direction: "same"\|"opposite"
- lane\_offset: 0, ±1
- delta\_s\_m \(longitudinal; negative = behind\)
- delta\_t\_m \(lateral\)
- policy: "sudden\_brake"\|"accel\_then\_brake"\|"idm\_nominal"
- params \(policy\-specific dict\)

Policy Params:

- sudden\_brake: \{target\_speed, brake\_step, brake\_steps, speed\_kp \(optional\)\}
- accel\_then\_brake: \{accel\_mag, accel\_steps, cruise\_speed, brake\_step, brake\_steps, speed\_kp \(optional\)\}
- idm\_nominal: \{target\_speed\_mps\}

Static Obstacles: Only include cones if work zone = Yes or narrative specifies obstacles\.

Validity: Follow example JSON format exactly\. Avoid incomplete dicts or invalid lane configurations \(e\.g\., navigation errors\)\.

Figure 6: Prompt used by scenario generation agent for actual schema generation\.
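The NCP fields and policy parameters listed in this prompt can be checked mechanically before a generated scenario is passed to the simulator\. The following is a minimal sketch of such a check, assuming each NCP arrives as a Python dict; the function `validate_ncp` and the example values are hypothetical, while the key names and permitted values are those listed in the prompt\.

```python
REQUIRED_NCP_KEYS = {
    "id", "direction", "lane_offset", "delta_s_m", "delta_t_m", "policy", "params",
}

# policy name -> (required parameter names, optional parameter names)
POLICY_PARAMS = {
    "sudden_brake": ({"target_speed", "brake_step", "brake_steps"}, {"speed_kp"}),
    "accel_then_brake": (
        {"accel_mag", "accel_steps", "cruise_speed", "brake_step", "brake_steps"},
        {"speed_kp"},
    ),
    "idm_nominal": ({"target_speed_mps"}, set()),
}


def validate_ncp(ncp: dict) -> list:
    """Return a list of problems found in one NCP entry (empty if it looks valid)."""
    errors = []
    missing = REQUIRED_NCP_KEYS - set(ncp)
    if missing:
        return [f"missing keys: {sorted(missing)}"]
    if ncp["direction"] not in ("same", "opposite"):
        errors.append("direction must be 'same' or 'opposite'")
    if ncp["lane_offset"] not in (-1, 0, 1):
        errors.append("lane_offset must be 0 or +/-1")
    if ncp["policy"] not in POLICY_PARAMS:
        errors.append(f"unknown policy: {ncp['policy']}")
        return errors
    if not isinstance(ncp["params"], dict):
        errors.append("params must be a dict")
        return errors
    required, optional = POLICY_PARAMS[ncp["policy"]]
    given = set(ncp["params"])
    if required - given:
        errors.append(f"missing params: {sorted(required - given)}")
    if given - required - optional:
        errors.append(f"unexpected params: {sorted(given - required - optional)}")
    return errors


# Illustrative entry; the numeric values are made up for the example.
example_ncp = {
    "id": "cp_0",
    "direction": "same",
    "lane_offset": 1,
    "delta_s_m": 12.0,
    "delta_t_m": 0.0,
    "policy": "idm_nominal",
    "params": {"target_speed_mps": 8.0},
}
problems = validate_ncp(example_ncp)
```

A check of this kind addresses the "incomplete dicts" failure mode named in the Validity line of the prompt; invalid lane configurations would still need to be caught by the simulator\.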
### D\.3 Fine tuning agent prompt

Scenario Parameter Optimization and Fine\-Tuning

You are being asked to optimize the scenario parameters\. Analyze the scenario dict, recreate the scenario spatially, and think what changes can be made to parameters such as speed, positioning, etc\. to induce a collision/crash/accident/unexpected behavior such as abrupt braking, oscillation, etc\. Make sure that tweaking parameters does not violate the narrative provided, as the spatial arrangement and scene playout must follow the narrative\.

Inputs provided

- Original scenario dict
- Narrative
- Metrics for the scenario dict: Minimum Euclidean gap, Time at minimum Euclidean gap

How to use the metrics

- Use these metrics to adjust scenario parameters, e\.g\., changing speeds, longitudinal gap, or lateral gap at initialization\.
- Minimum Euclidean gap must be less than 2 meters to be considered a collision\.
- Use the time of minimum gap to determine if it is consistent with the narrative \(e\.g\., the narrative suggests collision much sooner than the current scene\)\.

Avoid degenerate solutions

- Avoid tweaking quantities such that the minimum gap occurs naively at the beginning or end of the trajectory\.
- For example, avoid collisions at timestep 0 or 1 \(very early\) or timestep 359 or 412 \(very late\), i\.e\., at the very beginning or end of the trajectory\.

Output format

- Return a revised scenario in JSON format that only performs fine\-tuning to the original DSL\.
- Only change the original DSL substantially if it is violating the narrative; otherwise stick to numerical fine\-tuning\.
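The two metrics named in this prompt, minimum Euclidean gap and the time at which it occurs, can be computed from recorded trajectories as sketched below\. This is a minimal illustration rather than the released implementation: the function `min_gap_metrics`, the use of vehicle center positions, and the `edge_margin` used to flag degenerate solutions are assumptions; only the 2 m collision threshold comes from the prompt\.

```python
import math

COLLISION_GAP_M = 2.0  # a minimum gap below 2 m counts as a collision (from the prompt)


def min_gap_metrics(ego_xy, cp_xy, edge_margin=2):
    """Compute the minimum Euclidean gap and the timestep at which it occurs.

    ego_xy, cp_xy: sequences of (x, y) center positions, one per timestep.
    edge_margin: assumed number of timesteps at either end of the rollout that
                 are treated as degenerate locations for the minimum gap.
    """
    n = min(len(ego_xy), len(cp_xy))
    if n == 0:
        raise ValueError("trajectories must be non-empty")
    gaps = [math.dist(ego_xy[i], cp_xy[i]) for i in range(n)]
    t_min = min(range(n), key=gaps.__getitem__)
    return {
        "min_gap_m": gaps[t_min],
        "t_min": t_min,
        "collision": gaps[t_min] < COLLISION_GAP_M,
        "degenerate": t_min < edge_margin or t_min >= n - edge_margin,
    }


# Toy rollout: ego drives along the x-axis past a stationary CP at (5, 1).
ego = [(float(t), 0.0) for t in range(10)]
cp = [(5.0, 1.0)] * 10
metrics = min_gap_metrics(ego, cp)
```

In this toy rollout the closest approach is 1 m at timestep 5, which falls below the 2 m threshold and away from either end of the trajectory\. Metrics of this form can be returned to the fine\-tuning agent together with the scenario dict and narrative\.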
