Autonomous discovery of traffic laws with AI traffic scientists
Summary
This paper presents TrafficSci, an agentic AI system that automates the discovery of universal traffic laws across cities through iterative workflows, successfully rediscovering established laws and identifying a new temporal memory scale in urban driving behavior.
# Autonomous discovery of traffic laws with AI traffic scientists

Source: [https://arxiv.org/html/2607.01639](https://arxiv.org/html/2607.01639)

Yisheng Lv (1, 4; yisheng.lv@ia.ac.cn), Fei-Yue Wang (4, 5; feiyue.wang@ia.ac.cn), Yue Liu, Xiaoyan Gong, Qinghai Miao, Junyou Shang, Yutong Wang, Chao Guo, Yonglin Tian, Yizhang Chai, Chao Xiang

These authors contributed equally to this work.

1. State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
3. China Telecom Research Institute, Beijing, 102209, China
4. Macau Institute of Systems Engineering, Macau University of Science and Technology, Macao 999078, China
5. State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

###### Abstract

Universal traffic laws describe recurrent patterns in congestion, mobility and driving behavior across cities, providing a scientific basis for transportation planning, management and control. Their discovery, however, remains expert-driven, requiring candidate regularities to be identified from heterogeneous observational evidence or validated through intervention experiments. Although autonomous artificial intelligence (AI) systems have advanced scientific discovery in controlled laboratory settings, extending them to complex transportation domains remains a challenge.
Here we present TrafficSci, an agentic AI system that formulates traffic-law discovery as an iterative, auditable workflow integrating evidence scoping, critic–judge hypothesis induction, and observational–interventional validation. Across four case studies spanning population, network, control and trajectory scales, TrafficSci autonomously rediscovers three established traffic laws and identifies an unreported intrinsic temporal memory scale in urban driving behavior, statistically consistent across eight cities and two trajectory datasets. TrafficSci provides a route for extending AI-driven scientific discovery from controlled domains to complex urban systems.

Urban transportation networks are among the most complex systems that cities must actively manage. Decisions on congestion mitigation, signal deployment, route guidance and long-term infrastructure investment all rest, ultimately, on a quantitative understanding of how traffic behaves: how congestion costs distribute across a network, how populations allocate trips over space and time, how individual drivers respond to preceding conditions, and how the benefits of a control technology scale with its deployment level. Such understanding is encoded in traffic laws, concise and testable regularities that recur across operating conditions and, crucially, across cities \[[1](https://arxiv.org/html/2607.01639#bib.bib1),[2](https://arxiv.org/html/2607.01639#bib.bib2),[3](https://arxiv.org/html/2607.01639#bib.bib3),[4](https://arxiv.org/html/2607.01639#bib.bib4),[5](https://arxiv.org/html/2607.01639#bib.bib5)\]. These laws provide interpretable priors for prediction, principled boundaries for control design, and transferable foundations for urban planning \[[6](https://arxiv.org/html/2607.01639#bib.bib6),[7](https://arxiv.org/html/2607.01639#bib.bib7),[8](https://arxiv.org/html/2607.01639#bib.bib8),[9](https://arxiv.org/html/2607.01639#bib.bib9)\].

However, the set of established traffic laws remains limited relative to the complexity of urban transportation systems. Data-driven prediction and control models have advanced at a remarkable pace \[[10](https://arxiv.org/html/2607.01639#bib.bib10)\], but the interpretable scientific regularities on which robust traffic management ultimately depends have not kept step. This leaves a widening gap between the field's capacity to observe and optimize increasingly complex traffic systems and its capacity to explain why particular interventions work and where their limits lie.

The difficulty of closing this gap is both practical and structural. A conventional traffic-law study begins with manual literature synthesis, proceeds through iterative variable and metric design, and demands substantial implementation effort, including data selection, cleaning and bespoke analysis code, before a single hypothesis can be tested \[[11](https://arxiv.org/html/2607.01639#bib.bib11)\]. Credible validation typically requires checks across multiple datasets, time periods and operating regimes; questions involving management interventions further require controlled simulation experiments. Results frequently prompt revisions that restart the entire cycle. These practical burdens are compounded by a fundamental asymmetry with laboratory sciences. Urban traffic systems are open, non-stationary and tightly coupled to human behavior; no controlled physical experiment can be arranged to isolate variables and probe candidate laws directly. Every hypothesis must therefore be corroborated through observational data analysis, computational simulation, or a combination of both. This dual evidentiary requirement, together with the high manual cost of each experimental iteration, confines the hypothesis space that any individual research group can explore within a realistic time frame and leaves potentially important traffic regularities undiscovered.

Recent advances in large language models \(LLMs\) have opened a route to accelerating scientific discovery by organizing research as executable, verifiable workflows\[[12](https://arxiv.org/html/2607.01639#bib.bib12),[13](https://arxiv.org/html/2607.01639#bib.bib13),[14](https://arxiv.org/html/2607.01639#bib.bib14),[15](https://arxiv.org/html/2607.01639#bib.bib15),[16](https://arxiv.org/html/2607.01639#bib.bib16)\]\. In chemistry, the Coscientist system integrates language models with robotic tools to plan and execute experiments autonomously\[[17](https://arxiv.org/html/2607.01639#bib.bib17)\]; in materials science, A\-Lab couples computational prediction with robotic synthesis to discover and produce novel compounds within days\[[18](https://arxiv.org/html/2607.01639#bib.bib18)\]; in biomedical science, Co\-Scientist further demonstrates the potential of LLM agents in scientific discovery by integrating literature reasoning, hypothesis generation, experimental planning, and iterative refinement into an automated research workflow\[[19](https://arxiv.org/html/2607.01639#bib.bib19)\]\. These successes share a structural prerequisite: a closed loop in which hypothesis generation is tightly coupled to experimental validation\[[20](https://arxiv.org/html/2607.01639#bib.bib20)\]\. In each case, the experimental side of the loop relies on controllable laboratory apparatus\. Whether a comparable loop can be constructed for domains where controlled experiments are often unavailable, and where candidate laws should survive both observational and interventional scrutiny, has not been systematically explored in traffic science\. In this paper, we introduce TrafficSci, an agentic AI system designed to close this discovery loop for urban transportation science\. 
Given a research topic, TrafficSci retrieves and organizes domain literature through a structured tree-search mechanism, constructs candidate traffic laws as falsifiable hypotheses anchored to cited evidence, and validates them through an experimentation module that supports both real-world data analysis and simulation-based intervention testing \[[21](https://arxiv.org/html/2607.01639#bib.bib21)\]. When test outcomes fail to corroborate a candidate, the system revises the hypothesis and re-enters the validation cycle. The central design principle is that observational corroboration and interventional testing operate within a single iterative loop, addressing the dual evidentiary standard that a field without controllable laboratories inherently requires.

We evaluate TrafficSci through four case studies spanning population-level mobility and visitation scaling, network-level congestion dynamics, control-oriented intervention evaluation, and trajectory-level temporal regularity in driving. Across the first three case studies, TrafficSci autonomously rediscovers and empirically verifies the established traffic laws reported in prior work, without manual specification of hypotheses or validation procedures. Beyond rediscovery, TrafficSci discovers a previously unreported law, a stable temporal memory-scale regularity in driving behavior that is identified directly from large-scale vehicle trajectories without predefined hypotheses or analytical templates and remains consistent across cities. Taken together, these results indicate that traffic-law discovery can be organized as a repeatable, auditable computational process that recovers established knowledge and identifies regularities missed by conventional research, offering an initial demonstration that AI-driven scientific discovery can extend from controlled laboratories to complex urban transportation systems.

## 1 Results

### 1.1 System overview

Figure 1: The architecture of TrafficSci.
TrafficSci begins with the traffic evidence scoping module, which retrieves relevant literature and constructs an evidence corpus. The traffic law induction module then formulates structured hypotheses from the retrieved evidence. These hypotheses are evaluated by the observational–interventional validation module through statistical observation on real-world traffic data or intervention-based experiments in simulation environments. The resulting experimental evidence is fed back to update and refine the hypotheses, forming a closed-loop process for automated discovery in transportation science.

As shown in Fig. [1](https://arxiv.org/html/2607.01639#S1.F1), TrafficSci is an autonomous system designed for the discovery and verification of traffic laws, specifically tailored to the unique characteristics of urban transportation research. It follows an agentic AI-driven workflow comprising paper retrieval, hypothesis construction, automated experimentation, and feedback-driven hypothesis refinement within the induction module. The collaboration between these agents enables TrafficSci to discover and validate traffic laws across population-level mobility, network-level congestion, control-oriented intervention, and trajectory-level driving behavior. To make this workflow operational, TrafficSci organizes the discovery process into three interacting functional modules:

- The Traffic Evidence Scoping Module uses the Literature-based Agent Tree Search (Lit-LATS) framework to autonomously retrieve relevant literature based on predefined transportation topics. It organizes the literature and extracts key information, forming a structured knowledge base that informs hypothesis construction and the discovery of traffic laws.
- The Traffic Law Induction Module generates structured, testable hypotheses based on the knowledge base from the traffic evidence scoping module. It defines traffic variables, their relationships, and the conditions under which they apply, providing hypotheses for experimental validation.
- The Observational–Interventional Validation Module designs and executes experiments for hypotheses generated by the traffic law induction module. It performs observational validation on real-world traffic data or interventional validation in simulation environments where traffic conditions can be actively manipulated. Supported by an extensible transportation database, the module feeds results back to refine, reject, or generalize hypotheses in a closed-loop framework.

Together, these modules transform a high-level research topic into empirically tested traffic-law hypotheses, with their detailed workflows illustrated in Fig. [2](https://arxiv.org/html/2607.01639#S1.F2).

Figure 2: Detailed workflows of TrafficSci. (a) Traffic evidence scoping module. Given a research domain prompt, the system performs topic decomposition and applies Lit-LATS to each sub-topic for structured literature exploration, including selection, expansion, retrieval, evaluation, and backpropagation, producing a coarse-grained and ranked literature set. (b) Traffic law induction module. Using the retrieved literature and domain prompt, a generation agent proposes structured candidate hypotheses with explicit evidence anchoring. A critic–judge loop screens candidate hypotheses in terms of validity, conceptual novelty, significance, and specificity before competitive elimination. The selected hypothesis is then passed to the validation module together with its evidence set, critique record, and validation-route tag. (c) Observational–interventional validation module. Refined hypotheses are translated into executable procedures by a procedure agent and empirically tested by an experiment agent via MCP-based tool interaction (e.g., SUMO, SciPy, and OpenHands), producing structured experimental results.

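
The closed loop formed by the three modules can be illustrated with a minimal Python sketch. It is not the TrafficSci implementation: the `Hypothesis` container, the `discovery_loop` function, the `max_rounds` cap, and the toy agent stand-ins are all hypothetical names introduced here, and the real agents are LLM-driven rather than plain functions. The sketch only shows the control flow in which a failed validation triggers revision and re-entry into the validation cycle, with each round recorded for auditability.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Hypothesis:
    """Hypothetical container for a candidate traffic law."""
    statement: str   # interpretable form of the candidate law
    route: str       # "observational" or "interventional"
    evidence: List[str] = field(default_factory=list)  # cited sources


def discovery_loop(
    topic: str,
    scope: Callable[[str], List[str]],
    induce: Callable[[str, List[str]], Hypothesis],
    validate: Callable[[Hypothesis], Tuple[bool, str]],
    revise: Callable[[Hypothesis, str], Hypothesis],
    max_rounds: int = 3,  # assumed cap; the source does not state one
) -> Tuple[Hypothesis, bool, List[str]]:
    """Scope evidence, induce a hypothesis, then validate and revise in a loop."""
    evidence = scope(topic)               # evidence scoping
    hypothesis = induce(topic, evidence)  # law induction
    audit_log: List[str] = []             # one report per validation round
    for _ in range(max_rounds):
        supported, report = validate(hypothesis)
        audit_log.append(report)
        if supported:
            return hypothesis, True, audit_log
        hypothesis = revise(hypothesis, report)  # re-enter the cycle
    return hypothesis, False, audit_log


# Toy stand-ins for the agents (purely illustrative).
def toy_scope(topic: str) -> List[str]:
    return [f"prior study on {topic}"]


def toy_induce(topic: str, evidence: List[str]) -> Hypothesis:
    return Hypothesis("visitors ~ (r*f)^-1", "observational", evidence)


def toy_validate(h: Hypothesis) -> Tuple[bool, str]:
    ok = "^-2" in h.statement
    return ok, f"{h.statement}: {'supported' if ok else 'rejected'}"


def toy_revise(h: Hypothesis, report: str) -> Hypothesis:
    return Hypothesis(h.statement.replace("^-1", "^-2"), h.route, h.evidence)


final, supported, log = discovery_loop(
    "visitation scaling", toy_scope, toy_induce, toy_validate, toy_revise
)
```

In this toy run the first candidate is rejected, revised once, and then supported, so `log` holds two reports. Passing the agents in as callables keeps the loop independent of how each module is realized.
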
### 1.2 Benchmarking rediscovery of established traffic laws

We use three established urban traffic laws as benchmark rediscovery tasks to evaluate whether TrafficSci can reconstruct testable hypotheses and validation procedures from earlier evidence. Three representative cases are selected to cover different scales of traffic phenomena, including human mobility, congestion dynamics, and control-induced traffic dynamics. To ensure that the rediscovery process does not rely on direct access to the target studies, the literature retrieval stage is restricted to papers published before the corresponding reference paper.

Figure 3: Visualization of how TrafficSci discovers the spatio-temporal law governing visitor volume at urban locations as a joint function of travel distance and visitation frequency. (a) User-provided prompt specifying the target scientific question. (b) Literature retrieval process for identifying relevant prior studies and empirical evidence. (c) Construction of testable hypotheses from the retrieved literature. (d–g) Log–log relationships between visitor volume and the combined distance-frequency variable in Tokyo, Paris, New York and Beijing. (h) System-generated conclusion. The results demonstrate a consistent inverse-square scaling relationship across cities.

#### 1.2.1 Universal visitation law of human mobility across cities

Human mobility in cities is characterized by repeated visits to diverse locations, forming rich spatio-temporal visitation patterns across urban environments \[[22](https://arxiv.org/html/2607.01639#bib.bib22)\]. A central question is whether visitor volume can be explained by a general law that jointly accounts for travel distance and visitation frequency, rather than by spatial distance alone. This issue is closely related to the understanding of recurrent population flows, urban interaction intensity, and location demand.
The study "The Universal Visitation Law of Human Mobility" \[[3](https://arxiv.org/html/2607.01639#bib.bib3)\] reported that the number of visitors to a given location scales inversely with the square of the product of travel distance and visitation frequency, revealing a compact spatio-temporal relation that remains stable across heterogeneous cities \[[23](https://arxiv.org/html/2607.01639#bib.bib23),[24](https://arxiv.org/html/2607.01639#bib.bib24)\]. To revisit this law, TrafficSci organizes the problem around three core variables (travel distance, visitation frequency, and visitor volume) and links literature-grounded hypothesis generation with multi-city empirical validation. As illustrated in Fig. [3](https://arxiv.org/html/2607.01639#S1.F3), the system retrieves relevant studies, proposes candidate functional forms, and then tests them on mobility data from Tokyo, Paris, New York and Beijing. The resulting log–log plots recover the same inverse-square scaling pattern across all four cities, in agreement with the published universal visitation law.

#### 1.2.2 Congestion cost distribution in urban and suburban areas based on jam-prints

Figure 4: Visualization of the TrafficSci process for discovering the distribution of traffic congestion costs. (a) User-provided prompt specifying the target scientific question. (b) Literature retrieval process for identifying relevant prior studies and empirical evidence. (c) Construction of testable hypotheses from the retrieved literature. (d) Experimental validation of congestion cost distribution in urban areas. (e) Experimental validation of congestion cost distribution in suburban areas. (f) Verification of intra-city periodicity of congestion cost distribution across multiple days. The results confirm that traffic congestion costs follow a power-law distribution and exhibit stable temporal regularity within the same city.

Urban congestion strongly affects traffic efficiency and quality of life, and its impact can be characterized through the distribution of congestion costs, such as delay and fuel consumption \[[25](https://arxiv.org/html/2607.01639#bib.bib25)\]. A natural scientific question is whether these costs follow a reproducible statistical law rather than varying in an arbitrary manner across time and space. In "Unveiling City Jam-Prints of Urban Traffic Based on Jam Patterns" \[[26](https://arxiv.org/html/2607.01639#bib.bib26)\], it was found that congestion cost distributions in both urban and suburban areas exhibit a power-law form, implying that a small number of jam events account for disproportionately large costs \[[27](https://arxiv.org/html/2607.01639#bib.bib27),[28](https://arxiv.org/html/2607.01639#bib.bib28),[29](https://arxiv.org/html/2607.01639#bib.bib29),[30](https://arxiv.org/html/2607.01639#bib.bib30)\]. The same study further showed that the corresponding scaling exponents remain similar across days and across the same hours on different days within a city, forming distinctive jam-print patterns. TrafficSci revisits this problem by combining literature-guided hypothesis construction with empirical analysis on urban and suburban traffic data. As presented in Fig. [4](https://arxiv.org/html/2607.01639#S1.F4), the system retrieves prior studies on congestion cost patterns, proposes candidate hypotheses about distributional form and temporal variation, and validates them through targeted experiments. The recovered results confirm the power-law distribution of congestion costs and further reproduce the periodic intra-city consistency of the scaling behavior, indicating that recurrent congestion patterns form a stable spatial signature of urban traffic.

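
As an illustration of how a power-law exponent for congestion costs might be estimated and compared across days, the sketch below uses the standard continuous maximum-likelihood estimator for a power-law tail. The source does not state which fitting procedure TrafficSci or the reference study uses, so this estimator is a swapped-in technique; the function names, the cutoff `x_min`, and the synthetic sampler are assumptions made only for this example.

```python
import numpy as np


def powerlaw_mle_exponent(costs, x_min):
    """Continuous MLE of alpha for p(x) ~ x^(-alpha), x >= x_min.

    Returns (alpha, standard_error). The cutoff x_min is assumed known;
    choosing it from data is a separate step not shown here.
    """
    x = np.asarray(costs, dtype=float)
    tail = x[x >= x_min]
    if tail.size < 2:
        raise ValueError("need at least two observations above x_min")
    log_sum = np.sum(np.log(tail / x_min))
    if log_sum == 0.0:
        raise ValueError("all tail observations equal x_min")
    alpha = 1.0 + tail.size / log_sum
    stderr = (alpha - 1.0) / np.sqrt(tail.size)
    return float(alpha), float(stderr)


def sample_powerlaw(alpha, x_min, size, rng):
    """Draw synthetic costs by inverse-transform sampling (for testing only)."""
    u = rng.random(size)
    return x_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))


# Synthetic stand-in for congestion costs on two days of the same city.
rng = np.random.default_rng(0)
day_1 = sample_powerlaw(alpha=2.5, x_min=1.0, size=20000, rng=rng)
day_2 = sample_powerlaw(alpha=2.5, x_min=1.0, size=20000, rng=rng)
alpha_1, se_1 = powerlaw_mle_exponent(day_1, x_min=1.0)
alpha_2, se_2 = powerlaw_mle_exponent(day_2, x_min=1.0)
```

Comparing `alpha_1` and `alpha_2` relative to their standard errors is one simple way to check whether exponents stay similar across days. A full analysis would also test goodness of fit against alternative heavy-tailed distributions.
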
Figure 5: Autonomous discovery of the logarithmic relationship between ASC penetration and traffic benefits. (a) User-provided prompt specifying the target scientific question. (b) Literature retrieval process for identifying relevant prior studies and empirical evidence. (c) Construction of testable hypotheses from the retrieved literature. (d) Traffic simulation illustrating the experimental setup and system-level effects under different ASC penetration rates. (e) Relationship between ASC penetration rate and traffic benefits under peak and off-peak conditions, both exhibiting a logarithmic scaling pattern.

#### 1.2.3 Logarithmic relationship between adaptive signal control penetration rate and traffic management benefits

Adaptive signal control (ASC) is widely recognized as an effective strategy for reducing congestion and improving traffic efficiency \[[31](https://arxiv.org/html/2607.01639#bib.bib31),[32](https://arxiv.org/html/2607.01639#bib.bib32)\]. Yet an important deployment question remains: how do traffic management benefits change as ASC penetrates an urban network? This question matters not only for understanding the scaling behavior of intelligent traffic systems, but also for informing cost-effective deployment decisions \[[33](https://arxiv.org/html/2607.01639#bib.bib33)\]. In "Big-data Empowered Traffic Signal Control Could Reduce Urban Carbon Emission" \[[34](https://arxiv.org/html/2607.01639#bib.bib34)\], the reported results showed that the penetration–benefit relationship follows a logarithmic trend, with rapid gains at low penetration levels and progressively weaker marginal returns as deployment expands. TrafficSci examines this problem through simulation-based interventional validation, where the ASC penetration rate is actively controlled as the intervention variable. The system reproduces the experiments using the same CBEngine simulation platform and a consistent evaluation protocol.
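
A logarithmic penetration–benefit trend of this kind can be checked with an ordinary least-squares fit of benefit against the logarithm of the penetration rate. The sketch below is an assumption-laden illustration, not the evaluation protocol of the reference study: the model form `benefit = a + b * ln(p)`, the function names, and the synthetic numbers are all introduced here, and the fit requires strictly positive penetration rates.

```python
import numpy as np


def fit_log_response(penetration, benefit):
    """Fit benefit = a + b * ln(penetration) by least squares.

    Returns (a, b, r_squared). Penetration rates must be > 0.
    """
    p = np.asarray(penetration, dtype=float)
    y = np.asarray(benefit, dtype=float)
    if np.any(p <= 0):
        raise ValueError("penetration rates must be positive for a log fit")
    b, a = np.polyfit(np.log(p), y, 1)  # polyfit returns slope, then intercept
    fitted = a + b * np.log(p)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(a), float(b), float(1.0 - ss_res / ss_tot)


def marginal_gain(b, p):
    """Derivative of a + b * ln(p): the benefit gained per unit of penetration."""
    return b / p


# Synthetic intervention-response data (illustrative values only).
rates = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0])
benefits = 12.0 + 4.0 * np.log(rates)
a_hat, b_hat, r2 = fit_log_response(rates, benefits)
```

Because the marginal gain under this model is `b / p`, the same fitted slope implies a much larger return per unit of deployment at low penetration than at high penetration, which is the diminishing-return pattern described above.
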
As shown in Fig. [5](https://arxiv.org/html/2607.01639#S1.F5), TrafficSci retrieves relevant studies, formulates a diminishing-return hypothesis, and then conducts controlled simulation experiments by systematically varying the ASC penetration rate. The resulting changes in travel time, congestion level, traffic efficiency, and carbon-related indicators are recorded under both peak and off-peak conditions. These intervention-response curves consistently reproduce the logarithmic penetration–benefit pattern reported in the reference study.

Across the three benchmarks, TrafficSci recovered each published law without manual specification of the hypotheses or procedures: the inverse-square visitation scaling and the power-law congestion-cost distribution by observational validation, and the logarithmic relationship between ASC penetration and traffic benefits by controlled intervention.

### 1.3 Discovery of an intrinsic temporal memory scale in urban driving behavior

#### 1.3.1 Motivation and autonomous hypothesis generation

A fundamental question in traffic science is whether microscopic driving behavior contains stable temporal regularities that can be measured and generalized across urban environments \[[35](https://arxiv.org/html/2607.01639#bib.bib35),[36](https://arxiv.org/html/2607.01639#bib.bib36),[37](https://arxiv.org/html/2607.01639#bib.bib37)\]. Existing trajectory prediction and traffic simulation studies widely use historical motion states as model inputs, implicitly assuming that recent driving history provides useful information for future behavior. However, the temporal dependence itself is usually treated as a predefined modeling choice, such as a fixed observation window or a manually selected history length, rather than as an object of scientific discovery.
As a result, it remains unclear whether individual driving behavior possesses an intrinsic temporal memory scale, how long the influence of past states persists, and whether this temporal structure is consistent across different cities\. This question is important because the temporal memory of driving behavior reflects how drivers respond to recent motion states, surrounding constraints, and evolving traffic interactions\. If such memory has a stable statistical form, it may provide a microscopic behavioral law that complements macroscopic traffic regularities such as flow\-density relationships and mobility scaling laws\. To investigate this problem in an open\-ended manner, TrafficSci is provided with a high\-level scientific inquiry: whether individual driving behavior exhibits statistically stable temporal dependence across heterogeneous urban environments\. No candidate metric, predefined formula, or expected distribution is specified in advance\. Starting from this inquiry, the traffic evidence scoping module retrieves prior concepts related to temporal dependence, memory effects, correlation decay, behavioral persistence, and human mobility regularities\. These retrieved studies provide conceptual evidence that historical states may influence current behavior, but they do not directly prescribe a specific measurable law for urban driving trajectories\. Based on the retrieved evidence, the traffic law induction module further formulates a testable hypothesis: if urban driving behavior contains an intrinsic temporal structure, then the influence of historical driving states on current states should decay as the temporal lag increases, and the effective decay horizon can be quantified as a temporal memory scale\. This hypothesis transforms the original abstract question into an empirically testable proposition\. 
In this process, the proposed quantity $\tau$ is not manually imposed by researchers, but emerges from the autonomous workflow of literature-grounded reasoning, hypothesis generation, and experimental design.

Figure 6: Autonomous discovery of the intrinsic temporal memory scale in urban driving behavior. (a) User-provided prompt specifying the target scientific question. (b) Literature retrieval process for identifying relevant prior studies and empirical evidence. (c) Construction of testable hypotheses from the retrieved literature. (d) Pairwise Wasserstein distances between city-level $\tau$ distributions. Darker colors indicate larger distributional distances. (e) City-level distributions of $\tau$, showing similar temporal-memory patterns across heterogeneous urban environments. (f) System-generated conclusion. The consistently small Wasserstein distances support the cross-city stability of the discovered memory scale.

#### 1.3.2 Definition and estimation of temporal memory scale

To test the above hypothesis, the abstract concept of temporal dependence is first converted into a measurable quantity. TrafficSci therefore operationalizes the temporal memory of driving behavior from microscopic vehicle trajectories. For each vehicle at time $t$, a driving-state vector is constructed as

$$\mathbf{s}_{t} = \left[x_{t},\; y_{t},\; v_{t},\; \theta_{t}\right]^{\top}, \tag{1}$$

where $(x_{t}, y_{t})$ denotes the vehicle position in the ground-plane coordinate system, $v_{t}$ denotes the instantaneous speed, and $\theta_{t}$ denotes the heading angle. This representation jointly describes spatial displacement, motion intensity, and directional evolution, which are the basic elements of short-term driving behavior.
Since vehicle positions are naturally continuous over time, the position terms are used to describe the observable kinematic continuity of microscopic trajectories, while the speed and heading terms provide complementary information about motion intensity and directional evolution. Accordingly, the estimated $\tau$ is interpreted as a temporal dependence scale of driving-state evolution, rather than as a direct measure of cognitive driver memory.

Given the state sequence, TrafficSci estimates how strongly historical states remain statistically related to the current driving state over time. Specifically, for a temporal lag $\Delta$, the lag-dependent historical influence function is defined as

$$I(\Delta) = \frac{1}{4}\left(\rho_{x}(\Delta) + \rho_{y}(\Delta) + \rho_{v}(\Delta) + \rho_{\theta}(\Delta)\right), \tag{2}$$

where $\rho_{x}(\Delta)$, $\rho_{y}(\Delta)$, $\rho_{v}(\Delta)$, and $\rho_{\theta}(\Delta)$ denote the lag-$\Delta$ autocorrelation coefficients of the position ($x$, $y$), speed ($v$), and heading ($\theta$) components. Intuitively, $I(\Delta)$ measures the remaining statistical influence of historical driving states when the time interval from the current state increases. A larger value of $I(\Delta)$ indicates stronger dependence on the past, whereas a value close to zero suggests that the influence of the corresponding historical state has largely decayed. Based on this influence function, the temporal memory scale is defined as the earliest lag at which the historical influence becomes sufficiently weak:

$$\tau = \min\left\{\Delta > 0 \;\middle|\; |I(\Delta)| < \epsilon\right\}, \tag{3}$$

where $\epsilon = 0.05$ is used as a weak-correlation threshold to determine when the temporal dependence becomes negligible.
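
Equations (2) and (3) can be sketched in a few lines of Python for a single trajectory. This is a minimal illustration under stated assumptions rather than the procedure TrafficSci executes: the source does not specify the autocorrelation estimator, so the common biased sample estimator is assumed; the heading angle is treated as a plain scalar without angle wrapping; the sampling interval `dt`, the `max_lag` cap, and all function names are hypothetical.

```python
import numpy as np


def lag_autocorr(series, lag):
    """Biased sample autocorrelation of a 1-D series at a positive integer lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = float(np.dot(x, x))
    if denom == 0.0:
        return 0.0  # assumption: a constant component carries no correlation
    return float(np.dot(x[:-lag], x[lag:]) / denom)


def influence(states, lag):
    """I(lag): mean autocorrelation over the state columns [x, y, v, theta]."""
    states = np.asarray(states, dtype=float)
    return float(np.mean([lag_autocorr(states[:, k], lag)
                          for k in range(states.shape[1])]))


def memory_scale(states, dt, eps=0.05, max_lag=None):
    """tau in seconds: the earliest lag with |I(lag)| < eps, or None if not reached."""
    states = np.asarray(states, dtype=float)
    n = states.shape[0]
    max_lag = n - 1 if max_lag is None else min(max_lag, n - 1)
    for lag in range(1, max_lag + 1):
        if abs(influence(states, lag)) < eps:
            return lag * dt
    return None


# Two synthetic trajectories with columns [x, y, v, theta] (illustrative only).
rng = np.random.default_rng(0)
noise_states = rng.standard_normal((5000, 4))   # no temporal dependence
t = np.arange(200.0)
ramp_states = np.column_stack([t, 2.0 * t, t, t])  # strongly persistent states
tau_noise = memory_scale(noise_states, dt=0.1)
tau_ramp = memory_scale(ramp_states, dt=0.1)
```

The uncorrelated trajectory decays below the threshold at the first lag, while the persistent one retains a much longer dependence horizon. Applying `memory_scale` per vehicle and pooling the results by city would yield the kind of city-level $\tau$ distribution analyzed later.
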
Setting $\epsilon = 0.05$ is consistent with the interpretation that correlation magnitudes close to zero indicate negligible association \[[38](https://arxiv.org/html/2607.01639#bib.bib38)\], suggesting that the historical influence has largely decayed. This threshold serves as an operational criterion for identifying the decay point, and the same value is consistently applied across all cities and datasets to ensure comparable estimates of $\tau$. Under this definition, $\tau$ represents the effective temporal horizon over which past driving states still exert observable influence on current behavior. It is worth emphasizing that $\tau$ is not a manually selected observation length or a hyperparameter of a prediction model. Instead, it is estimated directly from the empirical temporal correlation structure of real-world trajectories.

#### 1.3.3 Cross-city validation of the discovered law

After estimating the temporal memory scale $\tau$, TrafficSci further examines whether the discovered quantity reflects a stable behavioral regularity rather than a city-specific artifact. To this end, we conduct cross-city validation on large-scale vehicle trajectories from six cities in Argoverse 2 \[[39](https://arxiv.org/html/2607.01639#bib.bib39)\] and two cities in nuScenes \[[40](https://arxiv.org/html/2607.01639#bib.bib40)\]. Spanning multiple U.S. cities and Singapore, the two datasets differ in collection protocol, temporal resolution, scenario duration, and sensor configuration, and together form a complementary, cross-source testbed for evaluating whether $\tau$ generalizes across heterogeneous urban environments rather than reflecting a single collection pipeline. Fig. [6](https://arxiv.org/html/2607.01639#S1.F6)e shows the empirical distributions of $\tau$ across different cities. Although the trajectories are collected from distinct urban environments, the distributions exhibit a highly consistent pattern.
Most samples are concentrated within a short-to-moderate temporal range, indicating that recent driving states dominate current behavioral decisions. Meanwhile, all cities show a visible long-tail structure, suggesting that a subset of driving behaviors retains longer temporal dependence. This common distributional shape implies that the temporal memory scale captures a shared microscopic regularity of urban driving behavior.

To quantitatively evaluate the similarity between cities, we compute the normalized first-order Wasserstein distance between pairwise empirical distributions of $\tau$. As shown in Fig. [6](https://arxiv.org/html/2607.01639#S1.F6)d, across eight cities and two trajectory datasets, the temporal memory scale ranges from $0.0$ to $4.0$ s, with pairwise normalized Wasserstein distances below $0.24$ and bootstrap $95\%$ confidence intervals remaining below $0.10$, indicating limited distributional discrepancy among different urban environments. This result suggests that the discovered memory scale is not only observable within individual cities, but also exhibits strong cross-city consistency.

#### 1.3.4 Scientific significance and potential applications

The temporal memory scale $\tau$ reframes the historical context of urban driving behavior from a fixed, manually chosen observation window into a measurable and reproducible quantity, revealing that the dependence of current behavior on the past has a stable temporal horizon that holds across cities. This carries three practical implications for urban traffic research. As a diagnostic, the empirical $\tau$ distribution can test whether a traffic simulator reproduces realistic microscopic temporal dynamics even when it already matches macroscopic statistics such as speed or density. As a transferable behavioral prior, the cross-city stability of $\tau$ can support simulator calibration, domain adaptation, and transfer of traffic models across urban environments.
As a design guideline, $\tau$ offers an empirical basis for choosing the history length, memory modules, or state-history features in trajectory prediction and learning-based driving policies, rather than relying on heuristic windows. More broadly, this case shows how TrafficSci can move from an open-ended scientific question to a data-derived candidate regularity that warrants further validation across broader urban traffic conditions.

## 2 Discussion

This work shows that elements of traffic-law discovery can be organized as a closed-loop, auditable workflow in urban transportation science. TrafficSci coordinates multiple LLM agents to perform literature retrieval, hypothesis construction, automated experimentation and hypothesis evolution. We evaluate TrafficSci through four case studies spanning population-level visitation scaling\[[3](https://arxiv.org/html/2607.01639#bib.bib3)\], network-level congestion\[[26](https://arxiv.org/html/2607.01639#bib.bib26)\], infrastructure-induced intervention effects\[[34](https://arxiv.org/html/2607.01639#bib.bib34)\] and trajectory-level driving memory. These studies reveal four traffic regularities across distinct scales, including one previously unreported law that captures an intrinsic temporal memory scale in driving behavior. These findings provide initial evidence that elements of traffic-law discovery can be systematized and automated in a reproducible manner.

From a methodological perspective, TrafficSci complements prevailing data-driven traffic modeling pipelines that prioritize prediction accuracy or control performance. Rather than treating scientific discovery as a secondary outcome of model fitting, TrafficSci explicitly treats transportation laws as first-class research objects.
In this sense, TrafficSci functions as an AI traffic scientist that assists human researchers in formulating, testing, and refining traffic laws, thereby providing actionable scientific support for virtual\-real parallel traffic management and control\[[9](https://arxiv.org/html/2607.01639#bib.bib9)\]\. Hypotheses are formulated in interpretable forms, linked to explicit empirical tests, and iteratively refined based on experimental feedback\. This separation between law discovery and model optimization is particularly relevant for urban transportation science, where explanatory regularities often underpin city\-level understanding, planning and policy analysis\. An important implication of this work is that traffic laws need not be restricted to closed\-form mechanistic equations\. In practice, traffic laws frequently manifest as statistical distributions, scaling relations or state\-transition patterns that summarize collective system behavior across conditions\. The evaluation cases illustrate this diversity by covering both network\-level congestion phenomena under infrastructure and demand constraints, and mobility behaviors spanning individual and population scales\. The ability of a single closed\-loop agentic workflow to operate across these levels supports a unified view of transportation law discovery as a structured process that extracts candidate regularities, translates them into testable hypotheses, and validates or revises them through data\-driven experiments\. Several limitations should be noted\. TrafficSci depends on the availability and quality of existing literature and datasets, which may bias hypothesis generation or constrain empirical validation\. In addition, the representativeness of available data limits the assessment of rare events or long\-term structural changes\. Although the closed\-loop workflow reduces manual effort, scientific oversight remains necessary to ensure robustness when scaling automated discovery\. 
Moreover, the cities and datasets examined here span a limited set of regions, so the global generalizability of the reported cross-city regularities, and their relevance across diverse urban contexts, remains to be established. Future work may extend the framework to richer experimental environments and explore more interactive modes of human–AI collaboration.

Overall, this study suggests that agentic scientific discovery offers a promising pathway toward more systematic, scalable and reproducible exploration of traffic laws, supporting the development of intelligent transportation systems and evidence-based urban planning.

## 3 Methods

The overall framework of the TrafficSci system is illustrated in Fig. [2](https://arxiv.org/html/2607.01639#S1.F2). TrafficSci is designed as a closed-loop pipeline that enables automated discovery of urban transportation science hypotheses through literature retrieval, hypothesis construction, and experimental validation. TrafficSci is agnostic to the underlying language model and can be instantiated with any mainstream LLM; all agents in this work are implemented using GPT-5.5.

The process begins with the traffic evidence scoping module, which systematically searches and filters traffic-related scientific literature across multiple topics. The retrieved studies are organized into a structured knowledge base, identifying key variables, empirical patterns, and commonly reported relationships, which provide direct knowledge support for hypothesis construction. Based on this literature-grounded knowledge base, the traffic law induction module formulates explicit and testable traffic hypotheses by abstracting relationships among transportation variables and observed phenomena. These hypotheses are expressed in a structured form, enabling direct translation into experimental procedures.

The observational–interventional validation module validates the generated hypotheses by translating them into executable experimental workflows under two complementary paradigms. In observational validation, hypotheses are tested through data preprocessing and statistical analysis on real-world traffic datasets, such as vehicle trajectories, traffic flow records, congestion measurements, and road network information. In interventional validation, hypotheses are examined in traffic simulation environments such as SUMO, where key traffic conditions or control variables can be actively manipulated. The resulting experimental evidence is fed back to the hypothesis construction process, allowing hypotheses to be refined, rejected, or generalized based on empirical validation, thereby forming a closed-loop discovery pipeline.

### 3.1 Traffic evidence scoping module

In this work, the traffic evidence scoping module is primarily composed of a literature retrieval agent. The agent utilizes an enhanced Lit-LATS framework, which integrates a query generation mechanism driven by a language model with a topic exploration strategy based on Monte Carlo Tree Search (MCTS)\[[41](https://arxiv.org/html/2607.01639#bib.bib41),[42](https://arxiv.org/html/2607.01639#bib.bib42),[43](https://arxiv.org/html/2607.01639#bib.bib43)\]. Lit-LATS is specifically designed to efficiently acquire structured knowledge relevant to traffic research topics from the open literature space.

Traffic laws often manifest as statistical distributions, behavioral patterns, or cost variation relationships\[[44](https://arxiv.org/html/2607.01639#bib.bib44),[26](https://arxiv.org/html/2607.01639#bib.bib26),[3](https://arxiv.org/html/2607.01639#bib.bib3)\]. As such, the associated literature tends to be scattered across various research areas, analysis scales, and application contexts.
Moreover, the same traffic phenomenon may be described using different terminology and expressions in different papers. Traditional keyword-based retrieval methods often fail to cover the diverse themes, scales, and heterogeneous expressions in such literature, leading to missed or fragmented clues about traffic laws. By incorporating a topic-exploration-based tree search mechanism, our traffic evidence scoping module dynamically expands the research topics related to traffic laws during the retrieval process. The system explores and selects among different topic branches, thereby broadening coverage of the literature space required for traffic law discovery. This process provides a richer and more coherent knowledge base for subsequent hypothesis generation.

As shown in Fig. [2](https://arxiv.org/html/2607.01639#S1.F2)a, the system starts by accepting a user-provided research topic, such as “urban congestion cost distribution law” or “universal law of urban visitation patterns”, and passes it to the topic generation module. Using predefined LLM prompts, the system automatically generates multiple topic keywords. These keywords are denoted as

$$P = \{p_1, p_2, \ldots, p_n\}. \tag{4}$$

For each of these keywords, the system expands relevant domain-specific search terms to form a keyword set $K_i = \{k_{i1}, k_{i2}, \ldots, k_{im}\}$. The combination of each original keyword and its expanded search terms generates the following query set:

$$C = \{(p_i, k_{ij}) \mid p_i \in P,\ k_{ij} \in K_i\}. \tag{5}$$

Each query combination is then used to retrieve related literature via the Semantic Scholar API, generating a candidate literature set.
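
As a minimal sketch of how the query set in Eq. (5) can be assembled, the Python snippet below pairs each topic keyword with its expanded search terms. The function name `build_query_set` and the example keywords are hypothetical and are not taken from the TrafficSci implementation; the retrieval call itself is deliberately omitted.

```python
def build_query_set(expansions: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Build C = {(p_i, k_ij)} from topic keywords p_i and expanded terms K_i."""
    seen: set[tuple[str, str]] = set()
    queries: list[tuple[str, str]] = []
    for topic, terms in expansions.items():
        for term in terms:
            pair = (topic, term)
            if pair not in seen:  # C is a set, so duplicate pairs are dropped
                seen.add(pair)
                queries.append(pair)
    return queries


# Hypothetical topic keywords and expansions, for illustration only.
expansions = {
    "congestion cost": ["heavy-tailed distribution", "scaling exponent"],
    "jam pattern": ["spatiotemporal cluster"],
}
queries = build_query_set(expansions)
```

Each pair in `queries` would then be turned into one literature search request.
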
The abstracts of these documents are processed by the language model to assess whether the document contains quantitative descriptions of the research phenomenon, experimentally verifiable conclusions, or mechanistic explanations. Based on the relevance and heuristic value of the documents, a score $Q$ is assigned to each keyword node, with a higher score indicating greater relevance to the research topic.

During the retrieval process, the system employs the Upper Confidence Bound for Trees (UCT) strategy from MCTS to determine whether a keyword node should continue expanding. For each node $n$, the system maintains its cumulative score $Q(n)$, which reflects the semantic relevance and heuristic value of the documents associated with that node. It also tracks the number of times the node has been visited $N(n)$ and the number of visits to its parent node $N(\mathrm{parent}(n))$. The node’s expansion value is determined using the following UCT formula:

$$\mathrm{UCT}(n) = Q(n) + c \cdot \sqrt{\frac{\ln N(\mathrm{parent}(n))}{N(n)}}, \tag{6}$$

where $Q(n)$ represents the cumulative score of the node, indicating its relevance to the research topic, $N(n)$ is the number of visits to the node, $N(\mathrm{parent}(n))$ is the number of visits to the parent node, and $c$ is the exploration coefficient, which balances exploration and exploitation.

Through this process of topic keyword generation, keyword expansion, semantic retrieval, and the UCT-based node selection strategy, the system identifies a set of high-value literature nodes. These nodes satisfy the following criteria: they exhibit high semantic relevance to the research topic, contain quantitative descriptions, verifiable conclusions, or mechanistic explanations in the abstracts, and have high UCT scores during the expansion process.
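
The selection rule in Eq. (6) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the `KeywordNode` class, the default `c = 1.4`, and the convention that unvisited nodes receive an infinite score are all assumptions introduced here. The score uses the cumulative $Q(n)$ exactly as written in Eq. (6).

```python
import math
from dataclasses import dataclass, field


@dataclass
class KeywordNode:
    keyword: str
    q: float = 0.0   # cumulative score Q(n)
    visits: int = 0  # visit count N(n)
    children: list["KeywordNode"] = field(default_factory=list)


def uct(node: KeywordNode, parent_visits: int, c: float = 1.4) -> float:
    """UCT(n) = Q(n) + c * sqrt(ln N(parent(n)) / N(n)), as in Eq. (6)."""
    if node.visits == 0:
        return math.inf  # assumption: always try unvisited nodes first
    return node.q + c * math.sqrt(math.log(parent_visits) / node.visits)


def select_child(parent: KeywordNode, c: float = 1.4) -> KeywordNode:
    """Pick the child keyword node with the highest UCT value."""
    return max(parent.children, key=lambda child: uct(child, parent.visits, c))


# Toy tree with made-up scores, for illustration only.
node_a = KeywordNode("jam pattern", q=3.0, visits=5)
node_b = KeywordNode("congestion cost", q=1.0, visits=1)
root = KeywordNode("urban congestion", visits=10, children=[node_a, node_b])
best = select_child(root)
```

In this toy tree the well-scored node wins despite its smaller exploration bonus; raising `c` shifts the balance toward rarely visited branches.
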

To clarify the scope of literature retrieval, the proposed module does not aim to exhaustively cover the entire literature universe, but instead focuses on collecting evidence that is semantically relevant to the input research topic and potentially useful for law discovery, including quantitative descriptions, experimentally verifiable conclusions, and mechanistic explanations. Such a design improves retrieval efficiency, but it may also introduce retrieval bias. In particular, the topic expansion process and LLM-guided scoring mechanism may favor certain branches that are more frequently discussed or more easily expressed in the existing literature, while under-exploring less common but potentially valuable directions. To mitigate this issue, the system initializes the search from multiple topic keywords, explores diverse branches through the MCTS-based topic expansion mechanism, and retains literature from different semantic paths rather than relying on a single dominant query formulation. Therefore, the retrieval module is intended to improve coverage and diversity of candidate evidence, but it does not eliminate retrieval bias completely. The resulting literature set should be regarded as a structured and topic-oriented evidence pool rather than an unbiased sample of the full scientific literature.

The literature identified in this way is then organized into a structured evidence set, which is represented as a JSON-based literature collection $\mathcal{L}$, where each entry contains the document title and the corresponding abstract. The system extracts relevant information such as document titles, abstracts, research subjects, key mechanisms, and verifiable metrics. This structured evidence set forms a comprehensive body of literature directly related to the research topic, providing a reliable foundation for the subsequent hypothesis generation and experimental validation modules.
This supports the automated literature–hypothesis–experiment research process.

A related methodological concern is whether the generated hypotheses reflect genuine evidence-guided discovery or merely the recall of patterns already encoded in the language model\[[45](https://arxiv.org/html/2607.01639#bib.bib45),[46](https://arxiv.org/html/2607.01639#bib.bib46),[47](https://arxiv.org/html/2607.01639#bib.bib47)\]. This concern is relevant to AI for science settings, where well-known empirical regularities, such as power-law-like relationships, may already appear in pretraining corpora. To reduce such effects, TrafficSci does not rely on the language model to directly output scientific laws from parametric memory. Instead, the model organizes, summarizes, and recombines evidence retrieved from the literature collection $\mathcal{L}$, so that hypotheses are constrained by reported mechanisms, variables, and quantitative observations. Although this design cannot fully eliminate prior bias, TrafficSci should be understood as an evidence-guided hypothesis generation framework rather than a strict tabula rasa discovery process. The subsequent observational–interventional validation module further subjects each hypothesis to empirical testing, distinguishing evidence-supported hypotheses from unsupported or merely memorized conjectures.

### 3.2 Traffic law induction module

The traffic law induction module aims to generate, screen, refine, and prioritize traffic science hypotheses under explicit theoretical and empirical constraints, transforming free-form language generation into a structured and auditable hypothesis-induction process. Its role is not to provide empirical validation, which is conducted by the observational–interventional validation module, but to prepare candidate hypotheses for validation through evidence anchoring, critic–judge refinement, competitive ranking, and validation-route tagging.
The detailed architecture of the module is illustrated in Fig. [2](https://arxiv.org/html/2607.01639#S1.F2)b.

Given the retrieved literature set $\mathcal{L}$ and the target research topic $T$, the hypothesis generation agent first extracts recurring traffic variables, reported mechanisms, and candidate relationships from the retrieved abstracts and metadata. The initial hypothesis generation process is expressed as

$$H_0 = \mathrm{HGA}(\mathcal{L}, T), \tag{7}$$

where $\mathrm{HGA}(\cdot)$ denotes the hypothesis generation agent and $H_0$ represents the initial set of candidate hypotheses. Each candidate hypothesis is required to specify the involved variables, the hypothesized relationship, the applicable traffic context, and the type of evidence needed for validation.

To constrain the refinement process, TrafficSci anchors each candidate hypothesis to a subset of retrieved evidence. For a hypothesis $H$, the system extracts a keyword set $W_H$, including variables, mechanisms, and traffic phenomena. For each document $d \in \mathcal{L}$, a corresponding keyword set $W_d$ is extracted from the title, abstract, and metadata. The relevance between $H$ and $d$ is measured by

$$S_{\mathrm{rel}}(H, d) = \frac{|W_H \cap W_d|}{|W_H \cup W_d|}. \tag{8}$$

The top-$k$ documents with the highest $S_{\mathrm{rel}}(H, d)$ values are retained as the hypothesis-specific evidence set $E_H$. Here $S_{\mathrm{rel}}(H, d)$ is used only for evidence anchoring, not for ranking candidate hypotheses. The resulting evidence set $E_H$ provides a traceable basis for checking whether the variables, mechanisms, and boundary conditions used in the hypothesis are supported by retrieved literature.

The refinement process is implemented as an evidence-grounded critic–judge loop between two specialized agents: a critic agent and a judge agent.
This loop is used as a pre-validation refinement step rather than as empirical validation of a hypothesis. The critic agent examines each candidate hypothesis $H_t$ against its hypothesis-specific evidence set $E_H$. It identifies unsupported variables, inconsistent traffic mechanisms, missing boundary conditions, and insufficient operationalization for empirical testing. To reduce subjective or unconstrained critique, substantive objections are required to refer to specific evidence in $E_H$, such as retrieved document identifiers, reported variables, or mechanism descriptions. When a hypothesis contradicts basic traffic principles or cannot be translated into measurable operational variables, the critic agent marks the issue as a fatal flaw; otherwise, it provides targeted revision suggestions.

The judge agent then assigns each candidate hypothesis $H_t$ a four-dimensional quality vector,

$$\mathbf{s}(H_t) = (s_{\mathrm{val}}, s_{\mathrm{nov}}, s_{\mathrm{sig}}, s_{\mathrm{spe}}), \tag{9}$$

where each component is scored on a discrete scale from 1 to 10. The four dimensions are defined as follows. $s_{\mathrm{val}}$ measures consistency with retrieved evidence and basic traffic constraints. $s_{\mathrm{nov}}$ measures conceptual novelty, namely whether the hypothesis goes beyond a trivial recombination or direct restatement of known variables. $s_{\mathrm{sig}}$ measures potential relevance for traffic theory, modeling, or management. $s_{\mathrm{spe}}$ measures whether the variables, boundary conditions, and validation requirements are concrete enough for empirical testing.

Based on the critic feedback and judge scores, the hypothesis generation agent iteratively revises the current hypothesis accordingly:

$$H_{t+1} = \mathrm{HGA}(H_t, C_t, \mathbf{s}(H_t)), \tag{10}$$

where $C_t$ denotes the evidence-grounded critique at iteration $t$. The revised hypothesis is then re-evaluated by the critic–judge loop. This iterative process continues until the hypothesis quality stabilizes, a maximum number of refinement rounds is reached, or the hypothesis satisfies the predefined acceptance criteria for empirical validation. In this process, the critic agent provides qualitative refinement signals, while the judge agent converts the refined hypothesis into four explicit scores.

Candidate hypotheses are selected through competitive elimination. In the implemented ranking procedure, each hypothesis is scored by the arithmetic mean of the four judge scores,

$$S_{\mathrm{comp}}(H) = \frac{1}{4}\left(s_{\mathrm{val}} + s_{\mathrm{nov}} + s_{\mathrm{sig}} + s_{\mathrm{spe}}\right). \tag{11}$$

This equal-weight score provides a transparent screening rule that requires a hypothesis to be plausible, non-trivial, relevant and specific before it is passed to empirical validation. We use fixed equal weights to keep the ranking rule inspectable across case studies. For auditability, the evidence-anchoring scores $S_{\mathrm{rel}}(H, d)$ are retained with the selected hypothesis to indicate which retrieved documents support the hypothesis-specific evidence set. These relevance scores trace the evidence base of a hypothesis, but they are not used as standalone indicators of scientific novelty and are not included in $S_{\mathrm{comp}}(H)$.
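
To make the separation between evidence anchoring (Eq. 8) and hypothesis ranking (Eq. 11) concrete, the sketch below computes both quantities independently. It is an illustrative Python example rather than the TrafficSci implementation: the function names, the document identifiers, the keyword sets, and the choice to return 0.0 for two empty keyword sets are assumptions made here.

```python
def jaccard(w_h: set[str], w_d: set[str]) -> float:
    """S_rel(H, d) = |W_H ∩ W_d| / |W_H ∪ W_d| (Eq. 8)."""
    union = w_h | w_d
    # Assumption: two empty keyword sets are treated as unrelated.
    return len(w_h & w_d) / len(union) if union else 0.0


def anchor_evidence(
    w_h: set[str], docs: dict[str, set[str]], k: int
) -> list[tuple[str, float]]:
    """Return the top-k (document id, S_rel) pairs forming E_H."""
    scored = [(doc_id, jaccard(w_h, w_d)) for doc_id, w_d in docs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)  # stable for ties
    return scored[:k]


def composite_score(s_val: int, s_nov: int, s_sig: int, s_spe: int) -> float:
    """S_comp(H): equal-weight mean of the four judge scores (Eq. 11)."""
    return (s_val + s_nov + s_sig + s_spe) / 4


# Hypothetical keyword sets and judge scores, for illustration only.
w_h = {"congestion", "cost", "power law"}
docs = {
    "d1": {"congestion", "cost", "delay"},
    "d2": {"visitation", "frequency"},
    "d3": {"congestion", "cost", "power law"},
}
evidence = anchor_evidence(w_h, docs, k=2)
score = composite_score(8, 6, 7, 9)
```

Note that `evidence` never enters `composite_score`: the relevance values are kept only as an audit trail, as described above.
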
For each candidate hypothesis, the evidence anchoring step retains the top-$k$ retrieved documents according to $S_{\mathrm{rel}}(H, d)$, where $k$ is fixed across case studies.

Before passing the selected hypothesis to the validation module, TrafficSci assigns a validation-route tag according to the structured content of the hypothesis. If a hypothesis describes a naturally observed statistical pattern, such as a distributional law, scaling relationship, temporal correlation, spatial heterogeneity, or cross-city consistency, it is assigned an observational validation tag. If a hypothesis concerns the effect of a controllable management action, policy variable, control strategy, or counterfactual deployment level, it is assigned an interventional simulation tag. When both descriptive regularity and controllable intervention are involved, it is assigned a combined validation tag. This tag does not itself validate the hypothesis; it only specifies the type of empirical workflow that the subsequent validation module should generate.

The selected hypothesis is passed to the observational–interventional validation module together with its evidence set, critic comments, judge scores, revision history, composite score, variables, boundary conditions, and validation-route tag. This handoff connects the traffic law induction module with the subsequent empirical validation module. The induction module determines whether a hypothesis is evidence-anchored, sufficiently specific, and worth testing; the validation module then executes the corresponding observational or simulation-based experiment to evaluate empirical support.

### 3.3 Observational–interventional validation module

The observational–interventional validation module consists of a procedure agent and an experiment agent.
This module validates the structured hypotheses generated by the traffic law induction module by converting them into executable experimental workflows, forming an automated loop from hypothesis construction to experimental validation and hypothesis updating. It autonomously selects experimental methods, generates experimental steps, calls data sources and external tools, and performs validation tasks, thereby reducing human involvement and improving the efficiency of traffic science research\[[48](https://arxiv.org/html/2607.01639#bib.bib48),[49](https://arxiv.org/html/2607.01639#bib.bib49)\].

This module follows two complementary validation paradigms: observational validation and interventional validation. Observational validation tests hypotheses using naturally collected real-world traffic data without actively changing the traffic system. It is suitable for examining empirical regularities such as distributional patterns, temporal correlations, spatial heterogeneity, and cross-city consistency. Interventional validation tests hypotheses by actively manipulating key variables in traffic simulation environments, making it suitable for evaluating control-related or policy-related effects that are difficult, costly, or unsafe to test directly in the real world.

To formalize this closed-loop process, the observational–interventional validation module is represented as an operator:

$$(H_{t+1},\, r_t) = \mathcal{A}(H_t), \tag{12}$$

where $H_t$ denotes the structured hypothesis at iteration $t$, $r_t$ is the structured validation result, including dataset sources, experimental methods, key metrics, visualizations, and validation conclusions, and $H_{t+1}$ is the updated hypothesis obtained by feeding the validation result $r_t$ back to the traffic law induction module for refinement, thereby closing the discovery loop.

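
The iteration implied by Eq. (12) can be sketched as a small driver loop. The Python example below is a schematic under stated assumptions: the `Hypothesis` and `ValidationResult` fields, the stopping rule (stop once a result is marked as supported or after `max_rounds`), and the `toy_operator` are all hypothetical stand-ins, since the paper specifies the operator $\mathcal{A}$ only abstractly.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Hypothesis:
    statement: str
    route: str  # "observational", "interventional" or "combined"


@dataclass(frozen=True)
class ValidationResult:
    supported: bool
    summary: str


# The operator A maps H_t to (H_{t+1}, r_t), as in Eq. (12).
Operator = Callable[[Hypothesis], tuple[Hypothesis, ValidationResult]]


def run_loop(
    h0: Hypothesis, operator: Operator, max_rounds: int = 5
) -> tuple[Hypothesis, list[ValidationResult]]:
    """Apply the operator repeatedly, keeping every r_t for auditing."""
    history: list[ValidationResult] = []
    h = h0
    for _ in range(max_rounds):
        h, r = operator(h)
        history.append(r)
        if r.supported:  # assumed stopping rule
            break
    return h, history


def toy_operator(h: Hypothesis) -> tuple[Hypothesis, ValidationResult]:
    """Stand-in for A: adds a boundary condition once, then accepts."""
    if "urban arterials" in h.statement:
        return h, ValidationResult(True, "supported on toy data")
    revised = Hypothesis(h.statement + " on urban arterials", h.route)
    return revised, ValidationResult(False, "needs a boundary condition")


h0 = Hypothesis("Congestion cost is heavy-tailed", "observational")
final, history = run_loop(h0, toy_operator)
```

Keeping the full `history` of results alongside the final hypothesis mirrors the auditability goal stated for the workflow.
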
As illustrated in Fig\.[2](https://arxiv.org/html/2607.01639#S1.F2)c, the procedure agent first receives the structured hypotheses, including variables, relationships, and applicable conditions\. It analyzes the validation requirements and determines whether each hypothesis should be tested through observational analysis, interventional simulation, or a combination of both\. For example, a hypothesis about a heavy\-tailed congestion cost distribution can be mapped to an observational workflow involving data selection, distribution fitting, and scaling exponent estimation, whereas a hypothesis about traffic control penetration can be mapped to an interventional workflow involving simulation scenario construction, controlled variable manipulation, and intervention analysis\. The experiment agent then converts the procedure plan into executable experimental steps, including data selection, preprocessing, method selection, metric design, tool invocation, and result organization\. For observational validation, relevant datasets are retrieved from internal repositories or external data sources, followed by targeted preprocessing such as missing\-value imputation, denoising, temporal aggregation, and variable extraction\. For interventional validation, simulation scenarios are constructed in traffic simulators such as SUMO, where traffic demand, control strategies, penetration rates, or other key variables can be actively manipulated\. The system then records changes in traffic indicators, such as travel time, congestion level, traffic efficiency, and emissions, to evaluate whether the hypothesis is supported\. The system supports various validation tools, including statistical analysis, time\-series analysis, regression analysis, distribution fitting, and simulation\-based evaluation\[[50](https://arxiv.org/html/2607.01639#bib.bib50)\]\. 
Since traffic hypotheses differ substantially in data form, spatio\-temporal scale, and evaluation criteria, a fixed experimental template is insufficient\. To address this challenge, the procedure agent dynamically decomposes each hypothesis into validation objectives, required variables, applicable datasets, and candidate methods, while the experiment agent executes the corresponding workflow with suitable tools and metrics\. To integrate heterogeneous validation tools and traffic\-specific environments, the model context protocol \(MCP\) is adopted as a standardized interface for tool integration, enabling access to external environments and services through a unified protocol\. Built upon MCP, a modular skills framework is developed to encapsulate reusable traffic validation workflows, such as distribution fitting, scaling\-law estimation, temporal correlation analysis, regression testing, scenario generation, and SUMO\-based intervention evaluation\. These skills can be composed into end\-to\-end executable pipelines for different types of traffic hypotheses\. Finally, the experiment agent generates a structured validation report, including dataset sources, preprocessing operations, experimental methods, parameter settings, quantitative metrics, visualizations, and validation conclusions\. These results are fed back to the traffic law induction module, allowing each hypothesis to be refined, rejected, or generalized based on empirical evidence, thereby closing the discovery loop\. This design enables TrafficSci to combine real\-world observational evidence with simulation\-based interventional evidence, providing a traffic\-specific experimental foundation for automated discovery and verification of traffic laws\. 
## Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 62271485 and 62303462, and in part by the Science and Technology Development Fund, Macao Special Administrative Region under Grants 0093/2023/R1A2, 0145/2023/R1A3 and 0157/2024/R1A2. During manuscript preparation, the authors used ChatGPT and Claude for language polishing. The authors reviewed and revised all outputs and take full responsibility for the final manuscript.

## Data availability

The mobility data used in the universal visitation law analysis across four cities are publicly available at https://github.com/leiii/VisitationLaw. The data used for the congestion-cost analysis based on jam-prints are publicly available at https://github.com/GuanwenZeng/Jam-prints. The data and simulation settings used for the adaptive-signal-control analysis are available within the original study cited in the Article and its Zenodo repository at https://doi.org/10.5281/zenodo.14591154. The trajectory data used for the temporal-memory analysis are publicly available from Argoverse 2 at https://www.argoverse.org/av2.html and nuScenes at https://www.nuscenes.org/nuscenes under their respective terms of use.

## Code availability

The source code implementing TrafficSci and reproducing the analyses in this study will be made publicly available upon acceptance of the manuscript.

## Author contributions

Y.Lv, X.D. and F.-Y.W. designed the research; X.D., Y.Liu, X.G. and Q.M. performed the research; Y.Liu, J.S. and Y.C. analyzed the data; X.D., Y.Liu, X.G., Q.M., J.S., Y.W., C.G., Y.T., Y.C., C.X., Y.Lv and F.-Y.W. wrote the paper.

## Competing interests

The authors declare no competing interests.

## References

- \[1\] Duan, J. *et al.* Spatiotemporal dynamics of traffic bottlenecks yields an early signal of heavy congestions. *Nature Communications* 14, 8002 (2023).
- \[2\] Saberi, M\. *et al\.* A simple contagion process describes spreading of traffic jams in urban networks\. *Nature Communications* 11, 1616 \(2020\)\.
- \[3\] Schläpfer, M\. *et al\.* The universal visitation law of human mobility\. *Nature* 593, 522–527 \(2021\)\.
- \[4\] Cabanas\-Tirapu, O\., Danús, L\., Moro, E\., Sales\-Pardo, M\. & Guimerà, R\. Human mobility is well described by closed\-form gravity\-like models learned automatically from data\. *Nature Communications* 16, 1336 \(2025\)\.
- \[5\] Vazifeh, M\. M\., Santi, P\., Resta, G\., Strogatz, S\. H\. & Ratti, C\. Addressing the minimum fleet problem in on\-demand urban mobility\. *Nature* 557, 534–538 \(2018\)\.
- \[6\] Hamedmoghadam, H\., Jalili, M\., Vu, H\. L\. & Stone, L\. Percolation of heterogeneous flows uncovers the bottlenecks of infrastructure networks\. *Nature Communications* 12, 1254 \(2021\)\.
- \[7\] Lv, Y\., Zhang, X\., Kang, W\. & Duan, Y\. Managing emergency traffic evacuation with a partially random destination allocation strategy: A computational\-experiment\-based optimization approach\. *IEEE Transactions on Intelligent Transportation Systems* 16, 2182–2191 \(2015\)\.
- \[8\] Olmos, L\. E\., Colak, S\., Shafiei, S\., Saberi, M\. & Gonzalez, M\. C\. Macroscopic dynamics and the collapse of urban traffic\. *Proceedings of the National Academy of Sciences of the United States of America* 115, 12654–12661 \(2018\)\.
- \[9\] Wang, F\.\-Y\. Parallel control and management for intelligent transportation systems: Concepts, architectures, and applications\. *IEEE Transactions on Intelligent Transportation Systems* 11, 630–638 \(2010\)\.
- \[10\] Lv, Y\., Duan, Y\., Kang, W\., Li, Z\. & Wang, F\.\-Y\. Traffic flow prediction with big data: A deep learning approach\. *IEEE Transactions on Intelligent Transportation Systems* 16, 865–873 \(2014\)\.
- \[11\] Udrescu, S\.\-M\. & Tegmark, M\. AI Feynman: A physics\-inspired method for symbolic regression\. *Science Advances* 6, eaay2631 \(2020\)\.
- \[12\] Wang, H\. *et al\.* Scientific discovery in the age of artificial intelligence\. *Nature* 620, 47–60 \(2023\)\.
- \[13\] Fu, X\., Li, C\., Quan, S\. J\., Yigitcanlar, T\. & Wasserman, D\. Large language models in urban planning\. *Nature Cities* 2, 585–592 \(2025\)\.
- \[14\] Zhang, Y\. *et al\.* Exploring the role of large language models in the scientific method: from hypothesis to discovery\. *npj Artificial Intelligence* 1, 14 \(2025\)\.
- \[15\] Chen, J\. *et al\.* Navigating phase diagram complexity to guide robotic inorganic materials synthesis\. *Nature Synthesis* 3, 606–614 \(2024\)\.
- \[16\] Lu, C\. *et al\.* Towards end\-to\-end automation of AI research\. *Nature* 651, 914–919 \(2026\)\.
- \[17\] Boiko, D\. A\., MacKnight, R\., Kline, B\. & Gomes, G\. Autonomous chemical research with large language models\. *Nature* 624, 570–578 \(2023\)\.
- \[18\] Szymanski, N\. J\. *et al\.* An autonomous laboratory for the accelerated synthesis of novel materials\. *Nature* 624, 86–91 \(2023\)\.
- \[19\] Gottweis, J\., Weng, W\.\-H\., Daryin, A\. *et al\.* Accelerating scientific discovery with Co\-Scientist\. *Nature* \(2026\)\.
- \[20\] Nie, T\., Sun, J\. & Ma, W\. Exploring the roles of large language models in reshaping transportation systems: A survey, framework, and roadmap\. *Artificial Intelligence for Transportation* 1, 100003 \(2025\)\.
- \[21\] Romera\-Paredes, B\. *et al\.* Mathematical discoveries from program search with large language models\. *Nature* 625, 468–475 \(2024\)\.
- \[22\] Tachet, R\. *et al\.* Scaling law of urban ride sharing\. *Scientific Reports* 7, 42868 \(2017\)\.
- \[23\] Song, C\., Qu, Z\., Blumm, N\. & Barabási, A\.\-L\. Limits of predictability in human mobility\. *Science* 327, 1018–1021 \(2010\)\.
- \[24\] Zhong, L\., Dong, L\., Wang, Q\. R\., Song, C\. & Gao, J\. Universal expansion of human mobility across urban scales\. *Nature Cities* 2, 603–607 \(2025\)\.
- \[25\] Çolak, S\., Lima, A\. & González, M\. C\. Understanding congested travel in urban areas\. *Nature Communications* 7, 10793 \(2016\)\.
- \[26\] Zeng, G\. *et al\.* Unveiling city jam\-prints of urban traffic based on jam patterns\. *Communications Physics* 8, 121 \(2025\)\.
- \[27\] Taillanter, E\. & Barthelemy, M\. Empirical evidence for a jamming transition in urban traffic\. *Journal of the Royal Society Interface* 18, 20210391 \(2021\)\.
- \[28\] Louf, R\. & Barthelemy, M\. How congestion shapes cities: from mobility patterns to scaling\. *Scientific Reports* 4, 5561 \(2014\)\.
- \[29\] Arora, N\. *et al\.* Urban congestion relief experiments through routing\-app interventions\. *Nature Cities* \(2026\)\. Advance online publication\.
- \[30\] Gao, J\., Barzel, B\. & Barabási, A\.\-L\. Universal resilience patterns in complex networks\. *Nature* 530, 307–312 \(2016\)\.
- \[31\] Li, M\., Pan, X\., Liu, C\. & Li, Z\. Federated deep reinforcement learning\-based urban traffic signal optimal control\. *Scientific Reports* 15, 11724 \(2025\)\.
- \[32\] Wang, K\., Shen, Z\., Lei, Z\., Liu, X\. & Zhang, T\. Towards multi\-agent reinforcement learning based traffic signal control through spatio\-temporal hypergraphs\. *IEEE Transactions on Mobile Computing* \(2025\)\.
- \[33\] Wu, C\., Kreidieh, A\. R\., Parvate, K\., Vinitsky, E\. & Bayen, A\. M\. Flow: A modular learning framework for mixed autonomy traffic\. *IEEE Transactions on Robotics* 38, 1270–1286 \(2021\)\.
- \[34\] Wu, K\. *et al\.* Big\-data empowered traffic signal control could reduce urban carbon emission\. *Nature Communications* 16, 2013 \(2025\)\.
- \[35\] Saifuzzaman, M\. & Zheng, Z\. Incorporating human\-factors in car\-following models: A review of recent developments and research needs\. *Transportation Research Part C: Emerging Technologies* 48, 379–403 \(2014\)\.
- \[36\] Huang, X\., Sun, J\. & Sun, J\. A car\-following model considering asymmetric driving behavior based on long short\-term memory neural networks\. *Transportation Research Part C: Emerging Technologies* 95, 346–362 \(2018\)\.
- \[37\] Ma, L\. & Qu, S\. A sequence to sequence learning based car\-following model for multi\-step predictions considering reaction delay\. *Transportation Research Part C: Emerging Technologies* 120, 102785 \(2020\)\.
- \[38\] Schober, P\., Boer, C\. & Schwarte, L\. A\. Correlation coefficients: appropriate use and interpretation\. *Anesthesia & Analgesia* 126, 1763–1768 \(2018\)\.
- \[39\] Wilson, B\. *et al\.* Argoverse 2: Next generation datasets for self\-driving perception and forecasting\. *Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks* \(2021\)\.
- \[40\] Caesar, H\. *et al\.* nuScenes: A multimodal dataset for autonomous driving\. *Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition* 11621–11631 \(2020\)\.
- \[41\] Zhou, A\., Yan, K\., Shlapentokh\-Rothman, M\., Wang, H\. & Wang, Y\.\-X\. Language Agent Tree Search unifies reasoning, acting, and planning in language models\. *Proceedings of the 41st International Conference on Machine Learning* 235, 56561–56584 \(2024\)\.
- \[42\] Curtarolo, S\. *et al\.* AFLOW: An automatic framework for high\-throughput materials discovery\. *Computational Materials Science* 58, 218–226 \(2012\)\.
- \[43\] Yao, S\. *et al\.* Tree of thoughts: Deliberate problem solving with large language models\. *Advances in Neural Information Processing Systems* 36, 11809–11822 \(2023\)\.
- \[44\] Zippelius, A\. & Huang, K\. Density\-wave fronts on the brink of wet granular condensation\. *Scientific Reports* 7, 3613 \(2017\)\.
- \[45\] Messeri, L\. & Crockett, M\. J\. Artificial intelligence and illusions of understanding in scientific research\. *Nature* 627, 49–58 \(2024\)\.
- \[46\] Marchetti, A\., Manzi, F\., Riva, G\., Gaggioli, A\. & Massaro, D\. Artificial intelligence and the illusion of understanding: A systematic review of theory of mind and large language models\. *Cyberpsychology, Behavior, and Social Networking* 28, 505–514 \(2025\)\.
- \[47\] Li, J\. *et al\.* An astronomical question answering dataset for evaluating large language models\. *Scientific Data* 12, 447 \(2025\)\.
- \[48\] Cai, T\., Wang, X\., Ma, T\., Chen, X\. & Zhou, D\. Large language models as tool makers\. *International Conference on Learning Representations* \(2024\)\.
- \[49\] Yang, J\. *et al\.* SWE\-agent: Agent\-computer interfaces enable automated software engineering\. *Advances in Neural Information Processing Systems* 37, 50528–50652 \(2024\)\.
- \[50\] Schick, T\. *et al\.* Toolformer: Language models can teach themselves to use tools\. *Advances in Neural Information Processing Systems* 36, 68539–68551 \(2023\)\.