AlphaCrafter: A Full-Stack Multi-Agent Framework for Cross-Sectional Quantitative Trading

arXiv cs.AI Papers

Summary

AlphaCrafter is a full-stack multi-agent framework for cross-sectional quantitative trading that uses specialized agents for factor mining, screening, and trading to adapt to evolving market conditions.

arXiv:2605.05580v1 Announce Type: new Abstract: Financial markets are inherently non-stationary, driven by complex interactions among macroeconomic regimes, microstructural frictions, and behavioral dynamics. Building quantitative strategies that remain profitable demands the continuous coupling of factor discovery, regime-adaptive selection, and risk-constrained execution. Prevailing approaches, however, optimize these components under static or isolated assumptions. Factor mining frameworks typically treat alpha discovery as a one-time search process, implicitly assuming that factor efficacy persists across market regimes. Execution-oriented systems often adopt role-playing agent architectures that simulate anthropomorphic trading committees, introducing behavioral noise rather than systematic rationality. Consequently, a fully automated, rationality-driven framework unifying a coherent quantitative pipeline remains absent. We introduce AlphaCrafter, a full-stack multi-agent framework that closes this gap through a continuously adaptive factor-to-execution pipeline, designed to track and respond to evolving market conditions without manual intervention. AlphaCrafter operates via three specialized agents: a Miner that continuously expands the factor pool via LLM-guided search, a Screener that assesses prevailing market conditions to construct regime-conditioned factor ensembles, and a Trader that translates these ensembles into quantitative strategies under explicit risk constraints. Together, these three agents form a closed-loop cross-sectional trading system that adapts holistically to evolving market dynamics. Extensive experiments on CSI 300 and S&P 500 demonstrate that AlphaCrafter consistently outperforms state-of-the-art baselines in risk-adjusted returns while exhibiting the lowest cross-trial variance, confirming that integrated and adaptive factor-to-execution design yields robust trading performance.

Cached at: 05/08/26, 08:27 AM

# AlphaCrafter: A Full-Stack Multi-Agent Framework for Cross-Sectional Quantitative Trading
Source: [https://arxiv.org/html/2605.05580](https://arxiv.org/html/2605.05580)
Yishuo Yuan, Jiayi Sheng, Sirui Zeng, Jiaqi Wang, Jiaheng Liu†

Nanjing University

liujiaheng@nju\.edu\.cn

†Corresponding Author\.

Abstract

Financial markets are inherently non\-stationary, driven by complex interactions among macroeconomic regimes, microstructural frictions, and behavioral dynamics\. Building quantitative strategies that remain profitable demands the continuous coupling of factor discovery, regime\-adaptive selection, and risk\-constrained execution\. Prevailing approaches, however, optimize these components under static or isolated assumptions\. Factor mining frameworks typically treat alpha discovery as a one\-time search process, implicitly assuming that factor efficacy persists across market regimes\. Execution\-oriented systems often adopt role\-playing agent architectures that simulate anthropomorphic trading committees, introducing behavioral noise rather than systematic rationality\. Consequently, a fully automated, rationality\-driven framework unifying a coherent quantitative pipeline remains absent\. We introduce AlphaCrafter, a full\-stack multi\-agent framework that closes this gap through a continuously adaptive factor\-to\-execution pipeline, designed to track and respond to evolving market conditions without manual intervention\. AlphaCrafter operates via three specialized agents: a Miner that continuously expands the factor pool via LLM\-guided search, a Screener that assesses prevailing market conditions to construct regime\-conditioned factor ensembles, and a Trader that translates these ensembles into quantitative strategies under explicit risk constraints\. Together, these three agents form a closed\-loop cross\-sectional trading system that adapts holistically to evolving market dynamics\. Extensive experiments on CSI 300 and S&P 500 demonstrate that AlphaCrafter consistently outperforms state\-of\-the\-art baselines in risk\-adjusted returns while exhibiting the lowest cross\-trial variance, confirming that integrated and adaptive factor\-to\-execution design yields robust trading performance\.

## 1 Introduction

Financial markets constitute high\-dimensional, nonlinear dynamical systems characterized by heavy tails Mandelbrot \([1997](https://arxiv.org/html/2605.05580#bib.bib1)\), volatility clustering Engle \([1982](https://arxiv.org/html/2605.05580#bib.bib2)\), and complex cross\-sectional dependencies Diebold and Yılmaz \([2014](https://arxiv.org/html/2605.05580#bib.bib3)\)\. These properties imply that asset returns are driven jointly by macroeconomic regimes, microstructural frictions, and behavioral feedback Brock and Hommes \([1997](https://arxiv.org/html/2605.05580#bib.bib4); [1998](https://arxiv.org/html/2605.05580#bib.bib5)\); Kahneman and Tversky \([2013](https://arxiv.org/html/2605.05580#bib.bib6)\)\. A critical consequence of these dynamics is that the predictive content of any fixed signal set erodes as market conditions evolve, yet the dominant paradigm in quantitative investing remains one of static model specification and periodic manual recalibration Cao et al\. \([2025](https://arxiv.org/html/2605.05580#bib.bib8)\); Gârleanu and Pedersen \([2013](https://arxiv.org/html/2605.05580#bib.bib7)\)\.

Traditional quantitative strategies have evolved from classical factor pricing models Carhart \([1997](https://arxiv.org/html/2605.05580#bib.bib10)\); Fama and French \([1993](https://arxiv.org/html/2605.05580#bib.bib9)\) to machine learning approaches such as XGBoost Chen and Guestrin \([2016](https://arxiv.org/html/2605.05580#bib.bib11)\) and LightGBM Ke et al\. \([2017](https://arxiv.org/html/2605.05580#bib.bib12)\), and more recently to deep sequence models including LSTM Vennerød et al\. \([2021](https://arxiv.org/html/2605.05580#bib.bib13)\) and Transformer architectures Vaswani et al\. \([2017](https://arxiv.org/html/2605.05580#bib.bib14)\)\. Despite their predictive power, these methods operate on a frozen design philosophy: they rely on human\-engineered feature sets, fixed model architectures, and manually tuned hyperparameters that are costly to maintain when market structure shifts\.

The advent of LLMs has opened new possibilities for automating aspects of this pipeline\. Recent agent\-based systems have explored two distinct paths\. One line of work simulates role\-playing trading committees \(analysts, risk managers, traders\) to aggregate multi\-source information into signals Xiao et al\. \([2025](https://arxiv.org/html/2605.05580#bib.bib16)\); Tian et al\. \([2025](https://arxiv.org/html/2605.05580#bib.bib17)\)\. These anthropomorphic designs, however, incur high inference latency and may introduce simulated behavioral biases\. Another line focuses on code\-driven execution, compiling strategic reasoning into lightweight bots for low\-latency deployment Song et al\. \([2026](https://arxiv.org/html/2605.05580#bib.bib18)\)\. Meanwhile, factor\-centric frameworks leverage LLMs for automated alpha discovery through iterative search and regularization Wang et al\. \([2024c](https://arxiv.org/html/2605.05580#bib.bib22); [2025](https://arxiv.org/html/2605.05580#bib.bib23)\); Li et al\. \([2025](https://arxiv.org/html/2605.05580#bib.bib19)\); Tang et al\. \([2025](https://arxiv.org/html/2605.05580#bib.bib20)\)\. Existing solutions either terminate at factor generation or decouple discovery from deployment, leaving a fragmented pipeline where each stage assumes a static environment that the preceding stage cannot influence\.

We introduce AlphaCrafter, a multi\-agent framework that closes this loop by operating as a continuously adaptive cross\-sectional trading system, one that dynamically reconfigures its signal ensemble and portfolio construction as market regimes shift\. AlphaCrafter orchestrates three specialized agents in daily\-frequency rotation, each designed to respond to shifting market conditions without manual intervention\. The Miner continuously expands the candidate factor pool through LLM\-guided search, preventing signal decay\. The Screener reads prevailing market regimes and dynamically assembles a calibrated factor ensemble tuned to current conditions\. The Trader then translates this ensemble into a quantitative strategy subject to explicit portfolio construction and risk constraints, generating executable orders\. This design forms a hypothesis–validation–execution loop that adapts end\-to\-end as markets evolve\.

Our core contributions are:

- End\-to\-End Adaptive Quantitative Pipeline: AlphaCrafter is the first framework to unify LLM\-driven factor discovery, regime\-sensitive factor selection, and risk\-constrained execution within a single system that adjusts continuously to market dynamics, directly addressing the static fragmentation of existing quantitative workflows\.
- Regime\-Conditioned Factor Ensemble Construction: Rather than relying on a fixed factor set or monolithic alpha miner, our Screener conditions signal composition on prevailing market states, enabling the system to dynamically reweight its information sources without retraining or human recalibration\.
- Empirical Robustness Across Markets: Extensive evaluations on CSI 300 and S&P 500 demonstrate that AlphaCrafter consistently outperforms strong baselines in risk\-adjusted returns while exhibiting the lowest cross\-trial variance, indicating reliable adaptation rather than overfitted artifacts\.

## 2 AlphaCrafter

In this section, we introduce AlphaCrafter, a multi\-agent framework designed for autonomous quantitative trading\. We formalize the environment, define the agent policies, and present the overall optimization objective\.

![Refer to caption](https://arxiv.org/html/2605.05580v1/figures/main.png)

Figure 1: The architecture of AlphaCrafter: The Miner expands alpha diversity, the Screener enforces regime\-aware calibration, and the Trader adaptively optimizes execution strategy\.

### 2\.1 Environment Formulation

The trading environment is formalized as a tuple

$$\mathcal{E} = (\mathcal{M}, \mathcal{Z}, \Pi, \mathcal{T}, \mathcal{J}), \tag{1}$$

where $\mathcal{M}$ denotes the market state space, encompassing both aggregate market conditions and individual asset characteristics\. Formally, $\mathcal{M}$ contains the universe of tradable assets $\mathcal{U} \subset \mathcal{M}$, where $\mathcal{U}$ is a set of $N$ individual entities with cross\-sectional features $\mathbf{x}_{i,t}$ for asset $i$ at time $t$\. Beyond asset\-level data, $\mathcal{M}$ further includes macroeconomic indicators, market indices, volatility surfaces, sentiment signals from news and social media, and alternative data sources that characterize the broader market environment\. $\mathcal{Z}$ is the factor library, a dynamic repository of quantitative factors, where each factor is a function $f: \mathcal{U} \to \mathbb{R}^{N}$ mapping historical asset observations to a cross\-sectional signal vector for $N$ assets\. $\Pi$ defines the space of admissible trading strategies, parameterized by portfolio construction rules and risk constraints\. $\mathcal{T} = \{1, 2, \ldots, T\}$ is the discrete set of trading days\. $\mathcal{J}: \Pi \to \mathbb{R}$ is an evaluation functional that quantifies strategy performance, incorporating risk\-adjusted return metrics\.

All agents operate with access to a shared memory $\mathcal{H}$ that serves as a centralized conduit for cross\-agent information flow, accumulating historical observations, factor validation outcomes, and execution feedback\. This common memory is made persistent through periodic summarization, whereby agents propagate performance diagnostics, enforce consistency constraints, and relay conditional guidance for downstream decisions, enabling the system to adapt its behavior collectively as market conditions evolve\.

### 2\.2 Miner Agent

The Miner agent is responsible for autonomous factor generation, rigorous validation, and continuous library maintenance\. Its policy $P_M$ guides an iterative exploration\-exploitation process over the asset universe $\mathcal{U}$\. Candidate factors are proposed, evaluated against historical asset data, and selectively integrated into the factor library $\mathcal{Z}$\. The agent autonomously terminates exploration when internal criteria on factor quality and library diversity are satisfied\.

The Miner operates on the current factor library $\mathcal{Z}_t$ and the asset universe $\mathcal{U}$, leveraging memory $\mathcal{H}_t$ to avoid redundant exploration and track factor efficacy:

$$A_M(\mathcal{Z}_t, \mathcal{U}; \mathcal{H}_t) \xrightarrow{P_M} (\mathcal{Z}_{t+1}, \mathcal{H}_{t+1}). \tag{2}$$

The policy $P_M$ is summarized in Algorithm 1\. It encompasses a generation loop where candidate factors are validated using metrics such as the Information Coefficient \(IC\), IC stability \(Information Ratio\), turnover, and decay profile\. Accepted factors are persisted with metadata, while all validation outcomes are recorded in memory to inform future search\. A subsequent maintenance phase re\-validates existing factors, pruning those that exhibit significant performance decay\.

Algorithm 1 Miner Agent Policy $P_M$

1. $\mathcal{Z}_{t+1} \leftarrow \mathcal{Z}_t$
2. repeat
3. $\quad f \leftarrow \text{generate}(\mathcal{U}, \mathcal{H}_t)$
4. $\quad r \leftarrow \text{validate}(f, \mathcal{U})$
5. $\quad$ if $r$ meets acceptance criteria then
6. $\qquad \mathcal{Z}_{t+1} \leftarrow \mathcal{Z}_{t+1} \cup \{f\}$
7. $\qquad \mathcal{H}_t \leftarrow \text{update}(\mathcal{H}_t, f, r, \text{meta} = \text{"effective"})$
8. $\quad$ else
9. $\qquad \mathcal{H}_t \leftarrow \text{update}(\mathcal{H}_t, f, r, \text{meta} = \text{"ineffective"})$
10. until termination\_condition\_met\($\mathcal{Z}_{t+1}$, $\mathcal{H}_t$\)
11. for each $f \in \mathcal{Z}_t$ do
12. $\quad r' \leftarrow \text{revalidate}(f, \mathcal{U})$
13. $\quad$ if $r'$ fails retention criteria then
14. $\qquad \mathcal{Z}_{t+1} \leftarrow \mathcal{Z}_{t+1} \setminus \{f\}$
15. $\qquad \mathcal{H}_t \leftarrow \text{update}(\mathcal{H}_t, f, r', \text{meta} = \text{"deprecated"})$
16. end for
17. $\mathcal{H}_{t+1} \leftarrow \mathcal{H}_t$
18. return $(\mathcal{Z}_{t+1}, \mathcal{H}_{t+1})$
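The control flow of Algorithm 1 can be sketched in plain Python. This is a minimal illustration and not the authors' implementation: the LLM-guided `generate` step is replaced by a caller-supplied `propose` callback, validation is reduced to a rank IC and ICIR, and the threshold `min_abs_ic`, the stopping rule `target_size`, and the cap `max_iters` are hypothetical stand-ins for the paper's unspecified acceptance, retention, and termination criteria.

```python
from statistics import mean, pstdev

def ranks(values):
    """0-based ranks; ties are broken by position, which is fine for a sketch."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for r, i in enumerate(order):
        out[i] = r
    return out

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0

def validate(signals, fwd_returns):
    """Mean daily rank IC and ICIR; both inputs are lists of per-day lists."""
    ics = [pearson(ranks(s), ranks(r)) for s, r in zip(signals, fwd_returns)]
    sd = pstdev(ics)
    return {"ic": mean(ics), "icir": mean(ics) / sd if sd > 0 else 0.0}

def miner_step(library, memory, propose, fwd_returns,
               min_abs_ic=0.03, target_size=5, max_iters=20):
    """One Miner pass: generation loop, then maintenance over the old library."""
    new_library = dict(library)
    for _ in range(max_iters):                      # repeat ... until
        name, signals = propose(memory)             # stands in for generate(U, H_t)
        result = validate(signals, fwd_returns)
        if abs(result["ic"]) >= min_abs_ic:         # assumed acceptance criterion
            new_library[name] = signals
            memory.append((name, result, "effective"))
        else:
            memory.append((name, result, "ineffective"))
        if len(new_library) >= target_size:         # assumed termination rule
            break
    for name, signals in library.items():           # for each f in Z_t
        result = validate(signals, fwd_returns)
        if abs(result["ic"]) < min_abs_ic:          # assumed retention criterion
            new_library.pop(name, None)
            memory.append((name, result, "deprecated"))
    return new_library, memory
```

The absolute value of IC is used because a consistently negative factor is still informative once a direction is assigned downstream; that choice is an assumption of this sketch.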

### 2\.3 Screener Agent

The Screener agent distills a coherent factor ensemble $\mathcal{E}_t$ by weaving together signals from the factor library with a nuanced reading of prevailing market conditions $\mathcal{M}$\. Its policy $P_S$ forms a view of the market regime $\hat{\mathcal{R}}_t$ by absorbing the aggregate behavior of individual equities and broad indices, filtering price movements, fundamental health, and material financial disclosures into an assessment of trend direction, volatility, and correlation structure\. Building on this regime diagnosis, it then assesses the suitability of each factor $f \in \mathcal{Z}_t$, selectively assembling a diversified subset and assigning directional weights to form an ensemble attuned to the unfolding market dynamics\.

The Screener’s transformation of the factor library and market state into an actionable ensemble and updated memory is given by:

$$A_S(\mathcal{Z}_t, \mathcal{M}; \mathcal{H}_t) \xrightarrow{P_S} ((\mathcal{E}_t, \hat{\mathcal{R}}_t), \mathcal{H}_{t+1}). \tag{3}$$

Algorithm 2 outlines the selection process\. The agent ranks factors by a regime\-conditional suitability score, mitigates concentration risk by considering factor correlations, and outputs a structured ensemble\. The memory $\mathcal{H}$ is updated with the ensemble composition and regime assessment to provide context for downstream agents\.

Algorithm 2 Screener Agent Policy $P_S$

1. if $|\mathcal{Z}_t| < \text{min\_factors\_required}$ then
2. $\quad \mathcal{E}_t \leftarrow \emptyset$
3. $\quad \hat{\mathcal{R}}_t \leftarrow \text{None}$
4. $\quad \mathcal{H}_{t+1} \leftarrow \text{update}(\mathcal{H}_t, \emptyset, \text{meta} = \text{"insufficient\_factors"})$
5. $\quad$ return $((\emptyset, \text{None}), \mathcal{H}_{t+1})$
6. $\hat{\mathcal{R}}_t \leftarrow \text{assess\_regime}(\mathcal{M}, \mathcal{H}_t)$
7. $\text{candidates} \leftarrow \emptyset$
8. for each $f \in \mathcal{Z}_t$ do
9. $\quad s \leftarrow \text{suitability}(f, \mathcal{M}, \hat{\mathcal{R}}_t, \mathcal{H}_t)$
10. $\quad \text{candidates} \leftarrow \text{candidates} \cup \{(f, s)\}$
11. sort candidates by $s$ descending
12. $\text{selected} \leftarrow \text{diversify}(\text{candidates})$
13. $\mathcal{E}_t \leftarrow \emptyset$
14. for each $f \in \text{selected}$ do
15. $\quad w_f, d_f \leftarrow \text{assign\_weight\_and\_direction}(f, s, \hat{\mathcal{R}}_t)$
16. $\quad \mathcal{E}_t \leftarrow \mathcal{E}_t \cup \{(f, w_f, d_f)\}$
17. $\mathcal{H}_{t+1} \leftarrow \text{update}(\mathcal{H}_t, \mathcal{E}_t, \hat{\mathcal{R}}_t)$
18. return $((\mathcal{E}_t, \hat{\mathcal{R}}_t), \mathcal{H}_{t+1})$
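One way to read Algorithm 2 is as a score, diversify, and weight pipeline. The sketch below is illustrative only. It assumes the regime label is supplied by the caller (the paper derives it from an LLM assessment), takes the suitability score to be a factor's historical mean IC under that regime, and implements `diversify` as a greedy correlation filter. The names `ic_by_regime`, `corr`, `max_abs_corr`, and the proportional weighting rule are all hypothetical.

```python
def screen(library, regime, ic_by_regime, corr,
           min_factors=2, max_factors=3, max_abs_corr=0.7):
    """Return (ensemble, regime), where ensemble is a list of (factor, weight, direction)."""
    if len(library) < min_factors:
        return [], None                              # insufficient factors: empty ensemble

    # Suitability (assumption): mean IC of the factor under the current regime label.
    scored = [(f, ic_by_regime[f].get(regime, 0.0)) for f in library]
    scored.sort(key=lambda p: abs(p[1]), reverse=True)

    # Greedy diversification: skip factors too correlated with those already chosen.
    selected = []
    for f, s in scored:
        if len(selected) == max_factors:
            break
        if all(abs(corr.get(frozenset((f, g)), 0.0)) <= max_abs_corr
               for g, _ in selected):
            selected.append((f, s))

    # Weight proportional to |score|; direction is the sign of the score.
    total = sum(abs(s) for _, s in selected) or 1.0
    ensemble = [(f, abs(s) / total, 1 if s >= 0 else -1) for f, s in selected]
    return ensemble, regime
```

Pairwise correlations are keyed by `frozenset` so that lookups do not depend on the order of the two factor names; missing pairs are treated as uncorrelated.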

### 2\.4 Trader Agent

The Trader agent actively composes a trading strategy $\pi_t$ by integrating the factor ensemble $\mathcal{E}_t$ and regime assessment $\hat{\mathcal{R}}_t$ with self\-determined constraints and proprietary logic, operating on the asset universe $\mathcal{U}$\. Its policy $P_T$ orchestrates a hyperparameter optimization loop, adaptively exploring configurations $\Theta$ of a reference strategy $\pi_{\text{ref}}$ while injecting auxiliary rules and exposure constraints\. The agent dynamically evaluates candidate strategies via backtesting on historical asset data, selects the configuration that maximizes a risk\-adjusted objective, and autonomously executes the resulting portfolio on the live assets $i \in \mathcal{U}$\.

The Trader’s operation is formalized as:

$$A_T(\mathcal{E}_t, \hat{\mathcal{R}}_t, \mathcal{U}; \mathcal{H}_t) \xrightarrow{P_T} (\pi_t, r_t, \mathcal{H}_{t+1}), \tag{4}$$

where $r_t$ is the realized return from executing $\pi_t$\.

Algorithm 3 describes the strategy search and execution policy\. The reference strategy $\pi_{\text{ref}}$, detailed in Algorithm 4 of Appendix [A](https://arxiv.org/html/2605.05580#A1), provides a structured portfolio construction mechanism operating on the asset universe $\mathcal{U}$\. Specifically, it computes a composite score for each asset by aggregating weighted signals from the generated factor ensemble, then ranks assets to select top long and bottom short candidates\. Position sizing follows a controlled allocation scheme parameterized by gross exposure $\beta$ and net exposure bias $\gamma$, with rebalancing subject to these exposure constraints\. The Trader’s memory update incorporates execution outcomes from this process, establishing a feedback loop for continuous strategy refinement\.

Algorithm 3 Trader Agent Policy $P_T$

1. if $\mathcal{E}_t = \emptyset$ then
2. $\quad \pi_t \leftarrow \text{None}$
3. $\quad r_t \leftarrow 0$
4. $\quad \mathcal{H}_{t+1} \leftarrow \text{update}(\mathcal{H}_t, \emptyset, \text{meta} = \text{"empty\_ensemble\_skipped"})$
5. $\quad$ return $(\text{None}, 0, \mathcal{H}_{t+1})$
6. $r_{\text{best}} \leftarrow -\infty$
7. $\pi_{\text{best}} \leftarrow \text{None}$
8. repeat
9. $\quad \pi(\Theta) \leftarrow \text{construct\_strategy}(\pi_{\text{ref}}, \mathcal{E}_t, \hat{\mathcal{R}}_t, \mathcal{H}_t)$
10. $\quad r \leftarrow \text{backtest}(\pi, \mathcal{U})$
11. $\quad$ if $r > r_{\text{best}}$ then
12. $\qquad r_{\text{best}} \leftarrow r$
13. $\qquad \pi_{\text{best}} \leftarrow \pi$
14. $\qquad \mathcal{H}_t \leftarrow \text{update}(\mathcal{H}_t, \pi, r, \text{meta} = \text{"improved"})$
15. $\quad$ else
16. $\qquad \mathcal{H}_t \leftarrow \text{update}(\mathcal{H}_t, \pi, r, \text{meta} = \text{"rejected"})$
17. until exploration\_terminated\($\mathcal{H}_t$\)
18. $\pi_t \leftarrow \pi_{\text{best}}$
19. $r_t \leftarrow \text{live\_trading}(\pi_t, \mathcal{U})$
20. $\mathcal{H}_{t+1} \leftarrow \text{update}(\mathcal{H}_t, \pi_t, r_t, \text{meta} = \text{"executed"})$
21. return $(\pi_t, r_t, \mathcal{H}_{t+1})$
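The reference strategy and the best-so-far search loop can be illustrated with a short sketch. Algorithm 4 is not reproduced in this section, so the sizing rule here is an assumption: the long leg receives total weight $(\beta + \gamma)/2$ and the short leg $(\beta - \gamma)/2$, which makes the gross exposure equal to $\beta$ and the net exposure equal to $\gamma$. The function names and the exhaustive loop over candidate configurations (in place of the paper's LLM-driven exploration) are hypothetical.

```python
def composite_scores(ensemble, signals):
    """Aggregate weighted, signed factor signals into one score per asset."""
    assets = next(iter(signals.values())).keys()
    return {a: sum(w * d * signals[f][a] for f, w, d in ensemble) for a in assets}

def reference_weights(scores, k, beta=1.0, gamma=0.0):
    """Long the top-k and short the bottom-k assets, equal-weighted within each leg.

    Assumed sizing: sum(|w|) == beta (gross) and sum(w) == gamma (net bias).
    """
    if 2 * k > len(scores) or abs(gamma) > beta:
        raise ValueError("need 2k <= number of assets and |gamma| <= beta")
    ranked = sorted(scores, key=scores.get, reverse=True)
    weights = {a: 0.0 for a in scores}
    for a in ranked[:k]:
        weights[a] = (beta + gamma) / (2 * k)
    for a in ranked[-k:]:
        weights[a] = -(beta - gamma) / (2 * k)
    return weights

def search(configs, backtest):
    """Keep the configuration with the best backtest score (lines 6-18 of Algorithm 3)."""
    best_cfg, best_score = None, float("-inf")
    for cfg in configs:
        score = backtest(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

With $k = 1$, $\beta = 1$, and $\gamma = 0.2$, the top-ranked asset receives a weight of 0.6 and the bottom-ranked asset a weight of -0.4.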

### 2\.5 Overall System Objective

The coordinated operation of the Miner, Screener, and Trader agents constitutes a closed\-loop adaptive system\.

Formally, at each trading day $t$, the composite action of the agents yields a strategy $\pi_t \in \Pi$\. The objective is to maximize the evaluation functional $\mathcal{J}$ applied to the strategy executed in the subsequent period:

$$\max_{\pi_t \in \Pi_t} \mathbb{E}\left[\mathcal{J}(\pi_t)\right], \tag{5}$$

where $\Pi_t \subseteq \Pi$ is the subset of admissible strategies realizable given the current factor library $\mathcal{Z}_t$, market regime $\hat{\mathcal{R}}_t$, and memory $\mathcal{H}_t$\. The functional $\mathcal{J}$ inherently penalizes drawdowns and volatility, aligning with the goal of stable capital appreciation over the trading horizon $T$\.

## 3 Experiments

### 3\.1 Experimental Setup

#### 3\.1\.1 Dataset

Our experiments utilize a comprehensive dataset covering both the Chinese A\-share market \(CSI 300 constituents\) and the U\.S\. stock market \(S&P 500 constituents\)\. The raw data encompasses four categories: \(1\) Price\-volume data including daily OHLCV \(Open, High, Low, Close, Volume\); \(2\) Fundamental indicators including Price\-to\-Earnings \(PE\), Price\-to\-Sales \(PS\), Price\-to\-Book \(PB\), and Dividend Yield Rate \(DYR\); \(3\) Financial statements including quarterly balance sheets, income statements, and cash flow statements; \(4\) Alternative data comprising financial news and corporate announcements, including authoritative sources such as the Federal Reserve\. The temporal split of the dataset is detailed in Table [1](https://arxiv.org/html/2605.05580#S3.T1)\. Detailed information regarding data sources and storage formats is provided in Appendix [B\.1](https://arxiv.org/html/2605.05580#A2.SS1)\.

Table 1: Dataset Splits for Training, Validation, Backtesting, and Live Trading

#### 3\.1\.2 Metrics

We evaluate all methods using three standard financial metrics\. Annualized Return \(AR\) measures the geometric mean yearly return, reflecting absolute profitability\. Sharpe Ratio \(SR\) Sharpe \([1994](https://arxiv.org/html/2605.05580#bib.bib39)\) quantifies risk\-adjusted performance by dividing excess returns over the risk\-free rate by return volatility\. Maximum Drawdown \(MDD\) captures the largest peak\-to\-trough decline during the evaluation period, indicating downside risk exposure\. For factor\-level evaluation, we additionally report Information Coefficient \(IC\) and Information Coefficient Information Ratio \(ICIR\) Grinold and Kahn \([1999](https://arxiv.org/html/2605.05580#bib.bib40)\), where IC measures the cross\-sectional correlation between factor values and forward returns \(we use next\-day forward returns in experiments\), and ICIR, computed as the mean IC divided by its standard deviation, quantifies the stability of predictive performance over time\. Detailed mathematical formulations for all evaluation metrics are provided in Appendix [B\.2](https://arxiv.org/html/2605.05580#A2.SS2)\.
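The verbal definitions above translate directly into code. The sketch below is a minimal rendering under stated assumptions: 252 trading days per year, a constant daily risk-free rate, the sample standard deviation, and a Pearson (not rank) correlation for IC. The exact formulas used in the paper are in its Appendix B.2 and may differ in these conventions.

```python
import math
from statistics import mean, stdev

def annualized_return(daily_returns, periods=252):
    """Geometric annualized return from a series of simple daily returns."""
    growth = math.prod(1 + r for r in daily_returns)
    return growth ** (periods / len(daily_returns)) - 1

def sharpe_ratio(daily_returns, rf_daily=0.0, periods=252):
    """Mean excess return divided by its sample volatility, annualized."""
    excess = [r - rf_daily for r in daily_returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods)

def max_drawdown(daily_returns):
    """Largest peak-to-trough decline of the equity curve, as a negative fraction."""
    equity, peak, mdd = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1 + r
        peak = max(peak, equity)
        mdd = min(mdd, equity / peak - 1)
    return mdd

def information_coefficient(factor_values, forward_returns):
    """Cross-sectional Pearson correlation for a single day."""
    mx, my = mean(factor_values), mean(forward_returns)
    cov = sum((x - mx) * (y - my) for x, y in zip(factor_values, forward_returns))
    vx = sum((x - mx) ** 2 for x in factor_values)
    vy = sum((y - my) ** 2 for y in forward_returns)
    return cov / math.sqrt(vx * vy)

def icir(daily_ics):
    """Mean IC divided by the standard deviation of IC over time."""
    return mean(daily_ics) / stdev(daily_ics)
```

For example, daily returns of +10%, -50%, and +20% give a maximum drawdown of -50%, because the equity curve falls from a peak of 1.10 to 0.55.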

#### 3\.1\.3 Baselines

We compare AlphaCrafter against representative methods spanning five categories: quantitative methods \(MACD Appel \([1979](https://arxiv.org/html/2605.05580#bib.bib42)\) and Grid Trading [1](https://arxiv.org/html/2605.05580#bib.bib51)\), machine learning methods \(LightGBM Ke et al\. \([2017](https://arxiv.org/html/2605.05580#bib.bib12)\) and XGBoost Chen and Guestrin \([2016](https://arxiv.org/html/2605.05580#bib.bib11)\) trained on technical and fundamental features\), deep learning methods \(LSTM Vennerød et al\. \([2021](https://arxiv.org/html/2605.05580#bib.bib13)\), Transformer Vaswani et al\. \([2017](https://arxiv.org/html/2605.05580#bib.bib14)\), and TRA Lin et al\. \([2021](https://arxiv.org/html/2605.05580#bib.bib43)\) for time\-series forecasting\), traditional trading agent methods \(TradingAgents Xiao et al\. \([2025](https://arxiv.org/html/2605.05580#bib.bib16)\) and TradingGroup Tian et al\. \([2025](https://arxiv.org/html/2605.05580#bib.bib17)\) with rule\-based coordination\), and quantitative trading agent methods \(RD\-Agent Li et al\. \([2025](https://arxiv.org/html/2605.05580#bib.bib19)\) and AlphaAgent Tang et al\. \([2025](https://arxiv.org/html/2605.05580#bib.bib20)\), which employ LLMs for factor generation\)\. All baseline implementations follow their original papers’ recommended settings\. All methods operate on the same reference strategy $\pi_{\text{ref}}$ \(detailed in Algorithm 4\) with consistent hyperparameter configurations\. For methods requiring feature preprocessing, input data are standardized using Z\-score normalization\. For LLM\-based agent methods, we conduct experiments using three backbone models: GPT 5\.3 Codex OpenAI \([2026](https://arxiv.org/html/2605.05580#bib.bib45)\), Claude Opus 4\.6 Anthropic \([2026](https://arxiv.org/html/2605.05580#bib.bib46)\), and Gemini 3\.1 Pro Google DeepMind \([2026](https://arxiv.org/html/2605.05580#bib.bib47)\)\. The final reported results correspond to the best\-performing backbone model\.

#### 3\.1\.4 Settings

Our experiments are conducted under a daily\-frequency cross\-sectional trading framework, with portfolio weights updated at each market close using end\-of\-day data\. The simulated exchange parameters are calibrated to real market conditions\. Trading frictions such as slippage and execution latency are negligible at this horizon, as prior work demonstrates their impact is concentrated in high\-frequency settings Isaenko \([2023](https://arxiv.org/html/2605.05580#bib.bib58)\); Kearns et al\. \([2010](https://arxiv.org/html/2605.05580#bib.bib59)\)\. For LLM\-based agent methods, each configuration is evaluated over 10 independent trials; reported metrics are averaged over trials falling within the interquartile range to mitigate outlier influence\. To eliminate confounding effects from LLM memory and known market\-beta trends Kong et al\. \([2026](https://arxiv.org/html/2605.05580#bib.bib57)\), we conduct not only standard backtesting but also a dedicated live trading phase, whose evaluation window falls strictly outside the training data cutoff of all backbone models\. Comprehensive experimental details are deferred to Appendix [B\.3](https://arxiv.org/html/2605.05580#A2.SS3), and a quantitative justification for the negligibility of trading frictions is provided in Appendix [B\.4](https://arxiv.org/html/2605.05580#A2.SS4)\.

### 3\.2 Main Result

Table [2](https://arxiv.org/html/2605.05580#S3.T2) reports the backtesting and live trading performance of all evaluated methods across the CSI 300 and S&P 500 markets\. The best value in each column is highlighted in dark green, the second and third best in light green, the worst in dark orange, and the second and third worst in light orange\.

The results reveal a stark contrast between backtesting promise and live trading robustness across all baselines, exposing fundamental limitations that AlphaCrafter consistently overcomes\. Traditional strategies exhibit heavy regime dependence: Grid Trading achieves a 20\.68% AR on the S&P 500 in backtesting yet collapses to a −28\.22% loss in live trading, while MACD yields 20\.35% AR on the CSI 300 in backtesting but suffers catastrophic live losses on the same market \(AR −38\.69%, Sharpe −2\.5527\)\. Deep learning models overfit severely despite strong backtesting results: LSTM attains the highest backtesting AR on both CSI 300 \(22\.93%\) and S&P 500 \(18\.26%\), yet fails to generate positive live returns on either market\. Machine learning methods share similar fragility, with XGBoost achieving a backtesting Sharpe of 1\.3431 on the CSI 300 before posting a −20\.40% live AR\. LLM\-based agents \(TradingAgents, TradingGroup\) also degrade notably under live conditions, indicating potential look\-ahead or memory biases in their decision pipelines\.

In contrast, AlphaCrafter is the only method that delivers consistently positive risk\-adjusted returns across all phases and markets\. On the CSI 300, it achieves a backtesting Sharpe of 1\.5322 and maintains a live AR of 5\.70% with a Sharpe of 0\.7002 and a modest drawdown of −5\.31%\. On the S&P 500, it records a low drawdown in both backtesting \(−7\.86%\) and live trading \(−3\.95%\), with a live AR of 9\.26% and Sharpe of 0\.7212\. This dual\-market robustness, absent in all baselines, demonstrates AlphaCrafter’s ability to mitigate regime sensitivity and overfitting simultaneously\. A detailed case study in Appendix [C](https://arxiv.org/html/2605.05580#A3) examines the contribution of each AlphaCrafter component, demonstrating how individual modules collectively yield its performance advantages\.

Table 2: Backtesting and Live Trading Performance Comparison on CSI 300 and S&P 500

### 3\.3 Stability Study

#### 3\.3\.1 Overall Performance Stability

To evaluate the reliability of agent\-based trading methods under stochastic variation, we conduct 10 independent trials for each method and examine the distribution of realized returns\. Figure [2](https://arxiv.org/html/2605.05580#S3.F2) reports the performance distributions across trials on CSI 300 and S&P 500 markets\.

![Refer to caption](https://arxiv.org/html/2605.05580v1/x1.png)

\(a\) CSI 300 Market

![Refer to caption](https://arxiv.org/html/2605.05580v1/x2.png)

\(b\) S&P 500 Market

Figure 2: Performance distributions of agent methods across independent trials in backtesting\.

As shown in Figure [2](https://arxiv.org/html/2605.05580#S3.F2), AlphaCrafter consistently achieves robust return profiles across both markets\. On CSI 300, it exhibits a narrow interquartile range and a stable median, indicating low sensitivity to initialization and environmental stochasticity\. This pattern generalizes to the S&P 500 market, where AlphaCrafter maintains comparable central tendency and dispersion characteristics\. In contrast, several baseline agent methods display wider variance or pronounced negative outliers in certain trials\. These results confirm that AlphaCrafter delivers reliable and reproducible performance under repeated independent evaluation, a desirable property for practical deployment in live trading environments\.

#### 3.3.2 Model Stability

To assess the robustness of AlphaCrafter with respect to the choice of underlying large language model, we evaluate its backtesting performance under three backbones. Figure [3](https://arxiv.org/html/2605.05580#S3.F3) presents radar charts comparing key risk-adjusted metrics across these configurations on the CSI 300 and S&P 500 markets. For IC and ICIR, the reported values represent the mean over all effective factors discovered during the backtesting period in the experiments included in the statistical analysis.

![Refer to caption](https://arxiv.org/html/2605.05580v1/x3.png) (a) CSI 300 Market
![Refer to caption](https://arxiv.org/html/2605.05580v1/x4.png) (b) S&P 500 Market

Figure 3: Backtesting performance comparison of AlphaCrafter instantiated with different backbone LLMs.

As illustrated in Figure [3](https://arxiv.org/html/2605.05580#S3.F3), AlphaCrafter exhibits a consistently stable performance profile across all three backbone models on both markets, confirming that the proposed framework is largely insensitive to the specific choice of underlying LLM. The radar patterns remain broadly aligned, with minor variations observed across individual metric dimensions. Notably, the Claude Opus 4.6 backbone demonstrates robust factor mining capabilities across both the CSI 300 and S&P 500 markets, coupled with proficient code-level strategy implementation, resulting in a marginally superior overall profile relative to the GPT and Gemini variants. This stability pattern further generalizes from the CSI 300 to the S&P 500 market, underscoring the model-agnostic design of AlphaCrafter and its reliable transferability across distinct market regimes.

### 3.4 Alpha Decay Analysis

To evaluate the temporal stability of factor efficacy, we conduct an alpha decay analysis across four consecutive semi-annual periods from January 2024 to January 2026. We report the mean, maximum, and minimum Information Coefficient (IC) for agent-based factor mining methods (RD-Agent, AlphaAgent, and AlphaCrafter), alongside two Alpha158-based baselines Yang et al. ([2020](https://arxiv.org/html/2605.05580#bib.bib48)): global top20 (retains the 20 best factors over the entire backtesting horizon) and periodic top20 (dynamically re-selects the top 20 factors within each semi-annual interval). For AlphaCrafter, we dynamically track the effective factors retained in the factor library $\mathcal{Z}_t$, reflecting the system's adaptive curation process.

![Refer to caption](https://arxiv.org/html/2605.05580v1/x5.png) (a) CSI 300 Market
![Refer to caption](https://arxiv.org/html/2605.05580v1/x6.png) (b) S&P 500 Market

Figure 4: IC comparison of different methods across time periods on CSI 300 and S&P 500 markets.

As shown in Figure [4](https://arxiv.org/html/2605.05580#S3.F4), the two Alpha158 baselines exhibit diametrically opposite behaviors: periodic top20 maintains consistently high IC across both markets through frequent refreshment, while global top20 suffers severe volatility with IC values fluctuating widely and even turning negative, confirming that static factor sets are highly vulnerable to market regime shifts. Among agent-based methods, all three approaches (RD-Agent, AlphaAgent, and AlphaCrafter) sustain stable IC within a band of approximately $0.015$–$0.025$ across periods, with AlphaCrafter exhibiting minimal deterioration in later evaluation windows. These results collectively demonstrate that dynamic factor curation, whether through periodic re-selection or adaptive library maintenance, is essential for mitigating alpha decay and preserving robust predictive power in evolving markets.
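The difference between the two Alpha158 baselines can be illustrated with a short sketch. It assumes that IC is the cross-sectional Pearson correlation between factor values and subsequent returns (the section does not state which correlation is used), and that per-period IC values are already available. The function names and the toy IC table are hypothetical, with `k=1` standing in for the top 20.

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def global_top_k(ic_by_period, k):
    """Pick k factors by mean IC over the whole horizon and keep them fixed."""
    factors = list(ic_by_period[0])
    mean_ic = {f: sum(p[f] for p in ic_by_period) / len(ic_by_period) for f in factors}
    return sorted(factors, key=mean_ic.get, reverse=True)[:k]


def periodic_top_k(ic_by_period, k):
    """Re-select the k highest-IC factors separately within each period."""
    return [sorted(p, key=p.get, reverse=True)[:k] for p in ic_by_period]


# Hypothetical mean IC per factor in two consecutive periods.
ic_by_period = [
    {"f1": 0.05, "f2": 0.01, "f3": -0.02},
    {"f1": -0.03, "f2": 0.02, "f3": 0.04},
]
print(global_top_k(ic_by_period, k=1))
print(periodic_top_k(ic_by_period, k=1))
```

In this toy table the fixed global selection settles on a factor that is never the best in any single period, whereas the periodic selection switches factors as their IC changes, which mirrors the contrast drawn in the paragraph above.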

### 3.5 Ablation Study

To quantify the marginal contribution of each agent, we conduct three ablation experiments by selectively replacing individual components with non-adaptive alternatives: (1) w/o Miner replaces Miner policy $P_M$ with the static Alpha158 factor set Yang et al. ([2020](https://arxiv.org/html/2605.05580#bib.bib48)); (2) w/o Screener replaces Screener policy $P_S$ with uniform random sampling and equal weighting; (3) w/o Trader replaces Trader policy $P_T$ with the fixed reference strategy $\pi_{\text{ref}}$. All experiments use Claude Opus 4.6 as the backbone model.

Table 3: Ablation Study: Backtesting Performance on CSI 300 and S&P 500

Table [3](https://arxiv.org/html/2605.05580#S3.T3) reports the results. Removing any component degrades performance across all metrics, confirming that each agent contributes meaningfully. The w/o Miner variant sees AR decline on both CSI 300 and S&P 500, underscoring the value of LLM-driven factor generation. The w/o Screener variant incurs the largest MDD increase, highlighting Screener's pivotal role in risk mitigation. The w/o Trader variant yields the lowest Sharpe ratios, demonstrating that adaptive execution is essential for risk-adjusted returns. The full model consistently achieves the highest AR, Sharpe ratio, and lowest volatility across both universes.
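As an illustration of the non-adaptive replacement used in the w/o Screener variant, the sketch below draws factors uniformly at random from a pool and assigns equal weights. The function name, the ensemble size, and the factor identifiers are assumptions made for this example; the paper's actual pool and sampling size are not given in this section.

```python
import random


def random_equal_weight_ensemble(factor_pool, size, seed=None):
    """Uniformly sample `size` distinct factors and weight them equally."""
    rng = random.Random(seed)  # local RNG so trials are reproducible per seed
    chosen = rng.sample(factor_pool, size)
    return {name: 1.0 / size for name in chosen}


# Hypothetical factor identifiers standing in for the mined factor pool.
pool = [f"factor_{i}" for i in range(10)]
ensemble = random_equal_weight_ensemble(pool, size=4, seed=0)
print(ensemble)
```

Because the selection ignores market conditions entirely, any performance gap between this baseline and the full model can be attributed to the regime-conditioned selection and weighting performed by the Screener.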

## 4 Related Work

### 4.1 LLM-Powered Agentic Systems

Recent advances in LLMs have catalyzed the development of agentic systems for autonomous reasoning and decision-making. These systems range from predefined workflows Zhuge et al. ([2025](https://arxiv.org/html/2605.05580#bib.bib24)); Xi et al. ([2023](https://arxiv.org/html/2605.05580#bib.bib27)) to fully autonomous agents Wang et al. ([2024a](https://arxiv.org/html/2605.05580#bib.bib28)); Zhang et al. ([2024c](https://arxiv.org/html/2605.05580#bib.bib26)); Hong et al. ([2024](https://arxiv.org/html/2605.05580#bib.bib25)). A growing line of research focuses on standardizing agent behavior through skill-based frameworks, where agent capabilities are treated as composable, verifiable skills that can be created, evaluated, and dynamically connected Liang et al. ([2026](https://arxiv.org/html/2605.05580#bib.bib29)); Xu and Yan ([2026](https://arxiv.org/html/2605.05580#bib.bib30)); Li et al. ([2026](https://arxiv.org/html/2605.05580#bib.bib31)). Our AlphaCrafter aligns with domain-specific agentic workflows while incorporating hierarchical coordination among specialized agents.

### 4.2 LLM for Financial Trading

LLMs have been increasingly applied to finance, with early domain-specific models such as FinGPT Yang et al. ([2025](https://arxiv.org/html/2605.05580#bib.bib32)) and FinLlama Konstantinidis et al. ([2024](https://arxiv.org/html/2605.05580#bib.bib33)) demonstrating strong performance on financial tasks. However, these models rely on static training data and lack real-time market experience, limiting their capacity for live trading optimization. For financial trading, existing agent architectures include news-driven Zhang et al. ([2024b](https://arxiv.org/html/2605.05580#bib.bib34)); Wang et al. ([2024b](https://arxiv.org/html/2605.05580#bib.bib35)); Yu et al. ([2023](https://arxiv.org/html/2605.05580#bib.bib15)), reflection-driven Xing ([2025](https://arxiv.org/html/2605.05580#bib.bib36)); Zhang et al. ([2024a](https://arxiv.org/html/2605.05580#bib.bib37)); Yu et al. ([2024](https://arxiv.org/html/2605.05580#bib.bib38)); Xiao et al. ([2025](https://arxiv.org/html/2605.05580#bib.bib16)); Tian et al. ([2025](https://arxiv.org/html/2605.05580#bib.bib17)), and factor optimization frameworks Li et al. ([2025](https://arxiv.org/html/2605.05580#bib.bib19)); Tang et al. ([2025](https://arxiv.org/html/2605.05580#bib.bib20)). While prior work has explored multi-agent collaboration for factor mining, most systems remain fragmented, separating factor discovery from adaptive execution. AlphaCrafter bridges this gap by unifying factor generation, regime-aware selection, and adaptive trading within a cohesive multi-agent framework.

## 5 Conclusion

In this paper, we presented AlphaCrafter, a full-stack autonomous multi-agent framework that addresses the fragmentation problem in quantitative trading by unifying cross-sectional factor discovery, regime-aware ensemble selection, and adaptive execution within a single closed-loop system. Through the coordinated operation of three specialized agents (Miner for factor generation, Screener for regime-aware selection, and Trader for risk-constrained execution), AlphaCrafter enables continuous hypothesis–validation–execution refinement without human retuning. Extensive empirical evaluations on CSI 300 and S&P 500 demonstrate that AlphaCrafter consistently outperforms state-of-the-art baselines in risk-adjusted returns while exhibiting the lowest performance variance across independent trials, confirming that integrated factor discovery and adaptive execution yield robust trading performance across diverse market environments.

## References

- \[1\]External Links:ISSN 00221082, 15406261,[Link](http://www.jstor.org/stable/3648194)Cited by:[§3\.1\.3](https://arxiv.org/html/2605.05580#S3.SS1.SSS3.p1.1)\.
- Introducing Claude Opus 4\.6\.Note:[https://www\.anthropic\.com/news/claude\-opus\-4\-6](https://www.anthropic.com/news/claude-opus-4-6)Accessed: 2026\-04\-27Cited by:[Appendix D](https://arxiv.org/html/2605.05580#A4.p4.1),[§3\.1\.3](https://arxiv.org/html/2605.05580#S3.SS1.SSS3.p1.1)\.
- G\. Appel \(1979\)The moving average convergence\-divergence trading method\.Signalert Corporation,Great Neck, NY\.Cited by:[§3\.1\.3](https://arxiv.org/html/2605.05580#S3.SS1.SSS3.p1.1)\.
- R\. Aroussi \(2026\)yfinance: download market data from Yahoo\! Finance’s API\.Note:[https://pypi\.org/project/yfinance/](https://pypi.org/project/yfinance/)Accessed: 2026\-04\-01Cited by:[§B\.1\.1](https://arxiv.org/html/2605.05580#A2.SS1.SSS1.p1.1)\.
- BaoStock \(2026\)BaoStock: free china stock market data via python api\.Note:[https://www\.baostock\.com/](https://www.baostock.com/)Accessed: 2026\-04\-01Cited by:[§B\.1\.1](https://arxiv.org/html/2605.05580#A2.SS1.SSS1.p1.1)\.
- W\. A\. Brock and C\. H\. Hommes \(1997\)A rational route to randomness\.Econometrica65\(5\),pp\. 1059–1095\.External Links:ISSN 00129682, 14680262,[Link](http://www.jstor.org/stable/2171879)Cited by:[§1](https://arxiv.org/html/2605.05580#S1.p1.1)\.
- W\. A\. Brock and C\. H\. Hommes \(1998\)Heterogeneous beliefs and routes to chaos in a simple asset pricing model\.Journal of Economic Dynamics and Control22\(8\),pp\. 1235–1274\.External Links:ISSN 0165\-1889,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/S0165-1889%2898%2900011-6),[Link](https://www.sciencedirect.com/science/article/pii/S0165188998000116)Cited by:[§1](https://arxiv.org/html/2605.05580#S1.p1.1)\.
- B\. Cao, S\. Wang, X\. Lin, X\. Wu, H\. Zhang, L\. M\. Ni, and J\. Guo \(2025\)From deep learning to llms: a survey of ai in quantitative investment\.External Links:2503\.21422,[Link](https://arxiv.org/abs/2503.21422)Cited by:[§1](https://arxiv.org/html/2605.05580#S1.p1.1)\.
- M\. M\. Carhart \(1997\)On persistence in mutual fund performance\.The Journal of Finance52\(1\),pp\. 57–82\.External Links:[Document](https://dx.doi.org/https%3A//doi.org/10.1111/j.1540-6261.1997.tb03808.x),[Link](https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1540-6261.1997.tb03808.x),https://onlinelibrary\.wiley\.com/doi/pdf/10\.1111/j\.1540\-6261\.1997\.tb03808\.xCited by:[§1](https://arxiv.org/html/2605.05580#S1.p2.1)\.
- T\. Chen and C\. Guestrin \(2016\)XGBoost: A scalable tree boosting system\.CoRRabs/1603\.02754\.External Links:[Link](http://arxiv.org/abs/1603.02754),1603\.02754Cited by:[§1](https://arxiv.org/html/2605.05580#S1.p2.1),[§3\.1\.3](https://arxiv.org/html/2605.05580#S3.SS1.SSS3.p1.1)\.
- DeepSeek \(2026\)DeepSeek: advancing state\-of\-the\-art open\-source language models\.Note:[https://www\.deepseek\.com/](https://www.deepseek.com/)Accessed: 2026\-04\-25Cited by:[Appendix D](https://arxiv.org/html/2605.05580#A4.p4.1)\.
- F\. X\. Diebold and K\. Yılmaz \(2014\)On the network topology of variance decompositions: measuring the connectedness of financial firms\.Journal of Econometrics182\(1\),pp\. 119–134\.Note:Causality, Prediction, and Specification Analysis: Recent Advances and Future DirectionsExternal Links:ISSN 0304\-4076,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.jeconom.2014.04.012),[Link](https://www.sciencedirect.com/science/article/pii/S0304407614000712)Cited by:[§1](https://arxiv.org/html/2605.05580#S1.p1.1)\.
- R\. F\. Engle \(1982\)Autoregressive conditional heteroscedasticity with estimates of the variance of united kingdom inflation\.Econometrica50\(4\),pp\. 987–1007\.External Links:ISSN 00129682, 14680262,[Link](http://www.jstor.org/stable/1912773)Cited by:[§1](https://arxiv.org/html/2605.05580#S1.p1.1)\.
- E\. F\. Fama and K\. R\. French \(1993\)Common risk factors in the returns on stocks and bonds\.Journal of Financial Economics33\(1\),pp\. 3–56\.External Links:ISSN 0304\-405X,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/0304-405X%2893%2990023-5),[Link](https://www.sciencedirect.com/science/article/pii/0304405X93900235)Cited by:[§1](https://arxiv.org/html/2605.05580#S1.p2.1)\.
- N\. Gârleanu and L\. H\. Pedersen \(2013\)Dynamic trading with predictable returns and transaction costs\.The Journal of Finance68\(6\),pp\. 2309–2340\.External Links:[Document](https://dx.doi.org/https%3A//doi.org/10.1111/jofi.12080),[Link](https://onlinelibrary.wiley.com/doi/abs/10.1111/jofi.12080),https://onlinelibrary\.wiley\.com/doi/pdf/10\.1111/jofi\.12080Cited by:[§1](https://arxiv.org/html/2605.05580#S1.p1.1)\.
- Google DeepMind \(2026\)Gemini 3\.1 Pro model card\.Note:[https://deepmind\.google/models/model\-cards/gemini\-3\-1\-pro/](https://deepmind.google/models/model-cards/gemini-3-1-pro/)Accessed: 2026\-04\-27Cited by:[Appendix D](https://arxiv.org/html/2605.05580#A4.p4.1),[§3\.1\.3](https://arxiv.org/html/2605.05580#S3.SS1.SSS3.p1.1)\.
- R\. C\. Grinold and R\. N\. Kahn \(1999\)Active portfolio management: a quantitative approach for producing superior returns and controlling risk\.2 edition,McGraw\-Hill,New York\.Cited by:[§3\.1\.2](https://arxiv.org/html/2605.05580#S3.SS1.SSS2.p1.1)\.
- S\. Hong, Y\. Lin, B\. Liu, B\. Liu, B\. Wu, C\. Zhang, C\. Wei, D\. Li, J\. Chen, J\. Zhang, J\. Wang, L\. Zhang, L\. Zhang, M\. Yang, M\. Zhuge, T\. Guo, T\. Zhou, W\. Tao, X\. Tang, X\. Lu, X\. Zheng, X\. Liang, Y\. Fei, Y\. Cheng, Z\. Gou, Z\. Xu, and C\. Wu \(2024\)Data interpreter: an llm agent for data science\.External Links:2402\.18679,[Link](https://arxiv.org/abs/2402.18679)Cited by:[§4\.1](https://arxiv.org/html/2605.05580#S4.SS1.p1.1)\.
- S\. Isaenko \(2023\)Transaction costs, frequent trading, and stock prices\.Journal of Financial Markets64,pp\. 100775\.External Links:ISSN 1386\-4181,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.finmar.2022.100775),[Link](https://www.sciencedirect.com/science/article/pii/S1386418122000647)Cited by:[§3\.1\.4](https://arxiv.org/html/2605.05580#S3.SS1.SSS4.p1.1)\.
- D\. Kahneman and A\. Tversky \(2013\)Prospect theory: an analysis of decision under risk\.InHandbook of the Fundamentals of Financial Decision Making,pp\. 99–127\.External Links:[Document](https://dx.doi.org/10.1142/9789814417358%5F0006),[Link](https://www.worldscientific.com/doi/abs/10.1142/9789814417358_0006),https://www\.worldscientific\.com/doi/pdf/10\.1142/9789814417358\_0006Cited by:[§1](https://arxiv.org/html/2605.05580#S1.p1.1)\.
- G\. Ke, Q\. Meng, T\. Finley, T\. Wang, W\. Chen, W\. Ma, Q\. Ye, and T\. Liu \(2017\)LightGBM: a highly efficient gradient boosting decision tree\.InProceedings of the 31st International Conference on Neural Information Processing Systems,NIPS’17,Red Hook, NY, USA,pp\. 3149–3157\.External Links:ISBN 9781510860964Cited by:[§1](https://arxiv.org/html/2605.05580#S1.p2.1),[§3\.1\.3](https://arxiv.org/html/2605.05580#S3.SS1.SSS3.p1.1)\.
- M\. Kearns, A\. Kulesza, and Y\. Nevmyvaka \(2010\)Empirical limitations on high frequency trading profitability\.External Links:1007\.2593,[Link](https://arxiv.org/abs/1007.2593)Cited by:[§3\.1\.4](https://arxiv.org/html/2605.05580#S3.SS1.SSS4.p1.1)\.
- Y\. Kong, H\. Lee, Y\. Hwang, A\. Lopez\-Lira, B\. Levy, D\. Mehta, Q\. Wen, C\. Choi, Y\. Lee, and S\. Zohren \(2026\)Evaluating llms in finance requires explicit bias consideration\.External Links:2602\.14233,[Link](https://arxiv.org/abs/2602.14233)Cited by:[Appendix D](https://arxiv.org/html/2605.05580#A4.p3.1),[§3\.1\.4](https://arxiv.org/html/2605.05580#S3.SS1.SSS4.p1.1)\.
- T\. Konstantinidis, G\. Iacovides, M\. Xu, T\. G\. Constantinides, and D\. Mandic \(2024\)FinLlama: financial sentiment classification for algorithmic trading applications\.External Links:2403\.12285,[Link](https://arxiv.org/abs/2403.12285)Cited by:[§4\.2](https://arxiv.org/html/2605.05580#S4.SS2.p1.1)\.
- X\. Li, W\. Chen, Y\. Liu, S\. Zheng, X\. Chen, Y\. He, Y\. Li, B\. You, H\. Shen, J\. Sun, S\. Wang, B\. Li, Q\. Zeng, D\. Wang, X\. Zhao, Y\. Wang, R\. B\. Chaim, Z\. Di, Y\. Gao, J\. He, Y\. He, L\. Jing, L\. Kong, X\. Lan, J\. Li, S\. Li, Y\. Li, Y\. Lin, X\. Liu, X\. Liu, H\. Lyu, Z\. Ma, B\. Wang, R\. Wang, T\. Wang, W\. Ye, Y\. Zhang, H\. Xing, Y\. Xue, S\. Dillmann, and H\. Lee \(2026\)SkillsBench: benchmarking how well agent skills work across diverse tasks\.External Links:2602\.12670,[Link](https://arxiv.org/abs/2602.12670)Cited by:[§4\.1](https://arxiv.org/html/2605.05580#S4.SS1.p1.1)\.
- Y\. Li, X\. Yang, X\. Yang, M\. Xu, X\. Wang, W\. Liu, and J\. Bian \(2025\)R&D\-agent\-quant: a multi\-agent framework for data\-centric factors and model joint optimization\.External Links:2505\.15155,[Link](https://arxiv.org/abs/2505.15155)Cited by:[§1](https://arxiv.org/html/2605.05580#S1.p3.1),[§3\.1\.3](https://arxiv.org/html/2605.05580#S3.SS1.SSS3.p1.1),[§4\.2](https://arxiv.org/html/2605.05580#S4.SS2.p1.1)\.
- Y\. Liang, R\. Zhong, H\. Xu, C\. Jiang, Y\. Zhong, R\. Fang, J\. Gu, S\. Deng, Y\. Yao, M\. Wang, S\. Qiao, X\. Xu, T\. Wu, K\. Wang, Y\. Liu, Z\. Bi, J\. Lou, Y\. E\. Jiang, H\. Zhu, G\. Yu, H\. Hong, L\. Huang, H\. Xue, C\. Wang, Y\. Wang, Z\. Shan, X\. Chen, Z\. Tu, F\. Xiong, X\. Xie, P\. Zhang, Z\. Gui, L\. Liang, J\. Zhou, C\. Wu, J\. Shang, Y\. Gong, J\. Lin, C\. Xu, H\. Deng, W\. Zhang, K\. Ding, Q\. Zhang, F\. Huang, N\. Zhang, J\. Z\. Pan, G\. Qi, H\. Wang, and H\. Chen \(2026\)SkillNet: create, evaluate, and connect ai skills\.External Links:2603\.04448,[Link](https://arxiv.org/abs/2603.04448)Cited by:[§4\.1](https://arxiv.org/html/2605.05580#S4.SS1.p1.1)\.
- H\. Lin, D\. Zhou, W\. Liu, and J\. Bian \(2021\)Learning multiple stock trading patterns with temporal routing adaptor and optimal transport\.InProceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining,pp\. 1017–1026\.External Links:[Document](https://dx.doi.org/10.1145/3447548.3467358)Cited by:[§3\.1\.3](https://arxiv.org/html/2605.05580#S3.SS1.SSS3.p1.1)\.
- Lixinger \(2026\)Lixinger: data center for rational investors\.Note:[https://www\.lixinger\.com/](https://www.lixinger.com/)Accessed: 2026\-04\-01Cited by:[§B\.1\.1](https://arxiv.org/html/2605.05580#A2.SS1.SSS1.p1.1)\.
- B\. B\. Mandelbrot \(1997\)The variation of certain speculative prices\.InFractals and Scaling in Finance: Discontinuity, Concentration, Risk\. Selecta Volume E,pp\. 371–418\.External Links:ISBN 978\-1\-4757\-2763\-0,[Document](https://dx.doi.org/10.1007/978-1-4757-2763-0%5F14),[Link](https://doi.org/10.1007/978-1-4757-2763-0_14)Cited by:[§1](https://arxiv.org/html/2605.05580#S1.p1.1)\.
- Meta \(2026\)Llama: open and efficient foundation language models\.Note:[https://www\.llama\.com/](https://www.llama.com/)Accessed: 2026\-04\-25Cited by:[Appendix D](https://arxiv.org/html/2605.05580#A4.p4.1)\.
- Microsoft Qlib Contributors \(2026\)Qlib documentation: data layer and Alpha158 data handler\.Note:[https://qlib\.readthedocs\.io/en/v0\.6\.2/component/data\.html](https://qlib.readthedocs.io/en/v0.6.2/component/data.html)Accessed: 2026\-04\-27Cited by:[§C\.1](https://arxiv.org/html/2605.05580#A3.SS1.p4.3)\.
- OpenAI \(2026\)Introducing GPT\-5\.3\-Codex\.Note:[https://openai\.com/index/introducing\-gpt\-5\-3\-codex/](https://openai.com/index/introducing-gpt-5-3-codex/)Accessed: 2026\-04\-27Cited by:[Appendix D](https://arxiv.org/html/2605.05580#A4.p4.1),[§3\.1\.3](https://arxiv.org/html/2605.05580#S3.SS1.SSS3.p1.1)\.
- W\. F\. Sharpe \(1994\)The sharpe ratio\.The Journal of Portfolio Management21\(1\),pp\. 49–58\.External Links:[Document](https://dx.doi.org/10.3905/jpm.1994.409501)Cited by:[§3\.1\.2](https://arxiv.org/html/2605.05580#S3.SS1.SSS2.p1.1)\.
- Z\. Song, K\. Song, G\. Hu, D\. Qi, J\. Gao, X\. Wang, D\. Li, and C\. Zhao \(2026\)Trade in minutes\! rationality\-driven agentic system for quantitative financial trading\.External Links:2510\.04787,[Link](https://arxiv.org/abs/2510.04787)Cited by:[§1](https://arxiv.org/html/2605.05580#S1.p3.1)\.
- Z\. Tang, Z\. Chen, J\. Yang, J\. Mai, Y\. Zheng, K\. Wang, J\. Chen, and L\. Lin \(2025\)AlphaAgent: llm\-driven alpha mining with regularized exploration to counteract alpha decay\.External Links:2502\.16789,[Link](https://arxiv.org/abs/2502.16789)Cited by:[§1](https://arxiv.org/html/2605.05580#S1.p3.1),[§3\.1\.3](https://arxiv.org/html/2605.05580#S3.SS1.SSS3.p1.1),[§4\.2](https://arxiv.org/html/2605.05580#S4.SS2.p1.1)\.
- F\. Tian, F\. D\. Salim, and H\. Xue \(2025\)TradingGroup: a multi\-agent trading system with self\-reflection and data\-synthesis\.External Links:2508\.17565,[Link](https://arxiv.org/abs/2508.17565)Cited by:[§1](https://arxiv.org/html/2605.05580#S1.p3.1),[§3\.1\.3](https://arxiv.org/html/2605.05580#S3.SS1.SSS3.p1.1),[§4\.2](https://arxiv.org/html/2605.05580#S4.SS2.p1.1)\.
- Trading Economics \(2026\)TRADING economics \| 20 million indicators from 196 countries\.Note:Accessed: 2026\-04\-25External Links:[Link](https://tradingeconomics.com/)Cited by:[§B\.2](https://arxiv.org/html/2605.05580#A2.SS2.p3.4)\.
- A\. Vaswani, N\. Shazeer, N\. Parmar, J\. Uszkoreit, L\. Jones, A\. N\. Gomez, L\. Kaiser, and I\. Polosukhin \(2017\)Attention is all you need\.CoRRabs/1706\.03762\.External Links:[Link](http://arxiv.org/abs/1706.03762),1706\.03762Cited by:[§1](https://arxiv.org/html/2605.05580#S1.p2.1),[§3\.1\.3](https://arxiv.org/html/2605.05580#S3.SS1.SSS3.p1.1)\.
- C\. B\. Vennerød, A\. Kjærran, and E\. S\. Bugge \(2021\)Long short\-term memory RNN\.CoRRabs/2105\.06756\.External Links:[Link](https://arxiv.org/abs/2105.06756),2105\.06756Cited by:[§1](https://arxiv.org/html/2605.05580#S1.p2.1),[§3\.1\.3](https://arxiv.org/html/2605.05580#S3.SS1.SSS3.p1.1)\.
- L\. Wang, C\. Ma, X\. Feng, Z\. Zhang, H\. Yang, J\. Zhang, Z\. Chen, J\. Tang, X\. Chen, Y\. Lin, W\. X\. Zhao, Z\. Wei, and J\. Wen \(2024a\)A survey on large language model based autonomous agents\.Frontiers of Computer Science18\(6\)\.External Links:ISSN 2095\-2236,[Link](http://dx.doi.org/10.1007/s11704-024-40231-1),[Document](https://dx.doi.org/10.1007/s11704-024-40231-1)Cited by:[§4\.1](https://arxiv.org/html/2605.05580#S4.SS1.p1.1)\.
- M\. Wang, K\. Izumi, and H\. Sakaji \(2024b\)LLMFactor: extracting profitable factors through prompts for explainable stock movement prediction\.External Links:2406\.10811,[Link](https://arxiv.org/abs/2406.10811)Cited by:[§4\.2](https://arxiv.org/html/2605.05580#S4.SS2.p1.1)\.
- S\. Wang, H\. Yuan, L\. M\. Ni, and J\. Guo \(2024c\)QuantAgent: seeking holy grail in trading by self\-improving large language model\.External Links:2402\.03755,[Link](https://arxiv.org/abs/2402.03755)Cited by:[§1](https://arxiv.org/html/2605.05580#S1.p3.1)\.
- S\. Wang, H\. Yuan, L\. Zhou, L\. M\. Ni, H\. Shum, and J\. Guo \(2025\)Alpha\-gpt: human\-ai interactive alpha mining for quantitative investment\.External Links:2308\.00016,[Link](https://arxiv.org/abs/2308.00016)Cited by:[§1](https://arxiv.org/html/2605.05580#S1.p3.1)\.
- Z\. Xi, W\. Chen, X\. Guo, W\. He, Y\. Ding, B\. Hong, M\. Zhang, J\. Wang, S\. Jin, E\. Zhou, R\. Zheng, X\. Fan, X\. Wang, L\. Xiong, Y\. Zhou, W\. Wang, C\. Jiang, Y\. Zou, X\. Liu, Z\. Yin, S\. Dou, R\. Weng, W\. Cheng, Q\. Zhang, W\. Qin, Y\. Zheng, X\. Qiu, X\. Huang, and T\. Gui \(2023\)The rise and potential of large language model based agents: a survey\.External Links:2309\.07864,[Link](https://arxiv.org/abs/2309.07864)Cited by:[§4\.1](https://arxiv.org/html/2605.05580#S4.SS1.p1.1)\.
- Y\. Xiao, E\. Sun, D\. Luo, and W\. Wang \(2025\)TradingAgents: multi\-agents llm financial trading framework\.External Links:2412\.20138,[Link](https://arxiv.org/abs/2412.20138)Cited by:[§1](https://arxiv.org/html/2605.05580#S1.p3.1),[§3\.1\.3](https://arxiv.org/html/2605.05580#S3.SS1.SSS3.p1.1),[§4\.2](https://arxiv.org/html/2605.05580#S4.SS2.p1.1)\.
- F\. Xing \(2025\)Designing heterogeneous llm agents for financial sentiment analysis\.ACM Transactions on Management Information Systems16\(1\),pp\. 1–24\.External Links:ISSN 2158\-6578,[Link](http://dx.doi.org/10.1145/3688399),[Document](https://dx.doi.org/10.1145/3688399)Cited by:[§4\.2](https://arxiv.org/html/2605.05580#S4.SS2.p1.1)\.
- R\. Xu and Y\. Yan \(2026\)Agent skills for large language models: architecture, acquisition, security, and the path forward\.External Links:2602\.12430,[Link](https://arxiv.org/abs/2602.12430)Cited by:[§4\.1](https://arxiv.org/html/2605.05580#S4.SS1.p1.1)\.
- H\. Yang, X\. Liu, and C\. D\. Wang \(2025\)FinGPT: open\-source financial large language models\.External Links:2306\.06031,[Link](https://arxiv.org/abs/2306.06031)Cited by:[§4\.2](https://arxiv.org/html/2605.05580#S4.SS2.p1.1)\.
- X\. Yang, W\. Liu, D\. Zhou, J\. Bian, and T\. Liu \(2020\)Qlib: an AI\-oriented quantitative investment platform\.arXiv preprint arXiv:2009\.11189\.Cited by:[§3\.4](https://arxiv.org/html/2605.05580#S3.SS4.p1.1),[§3\.5](https://arxiv.org/html/2605.05580#S3.SS5.p1.4)\.
- Y\. Yu, H\. Li, Z\. Chen, Y\. Jiang, Y\. Li, D\. Zhang, R\. Liu, J\. W\. Suchow, and K\. Khashanah \(2023\)FinMem: a performance\-enhanced llm trading agent with layered memory and character design\.External Links:2311\.13743,[Link](https://arxiv.org/abs/2311.13743)Cited by:[§4\.2](https://arxiv.org/html/2605.05580#S4.SS2.p1.1)\.
- Y\. Yu, Z\. Yao, H\. Li, Z\. Deng, Y\. Cao, Z\. Chen, J\. W\. Suchow, R\. Liu, Z\. Cui, Z\. Xu, D\. Zhang, K\. Subbalakshmi, G\. Xiong, Y\. He, J\. Huang, D\. Li, and Q\. Xie \(2024\)FinCon: a synthesized llm multi\-agent system with conceptual verbal reinforcement for enhanced financial decision making\.External Links:2407\.06567,[Link](https://arxiv.org/abs/2407.06567)Cited by:[§4\.2](https://arxiv.org/html/2605.05580#S4.SS2.p1.1)\.
- C\. Zhang, X\. Liu, Z\. Zhang, M\. Jin, L\. Li, Z\. Wang, W\. Hua, D\. Shu, S\. Zhu, X\. Jin, S\. Li, M\. Du, and Y\. Zhang \(2024a\)When ai meets finance \(stockagent\): large language model\-based stock trading in simulated real\-world environments\.External Links:2407\.18957,[Link](https://arxiv.org/abs/2407.18957)Cited by:[§4\.2](https://arxiv.org/html/2605.05580#S4.SS2.p1.1)\.
- H\. Zhang, F\. Hua, C\. Xu, H\. Kong, R\. Zuo, and J\. Guo \(2024b\)Unveiling the potential of sentiment: can large language models predict chinese stock price movements?\.External Links:2306\.14222,[Link](https://arxiv.org/abs/2306.14222)Cited by:[§4\.2](https://arxiv.org/html/2605.05580#S4.SS2.p1.1)\.
- J\. Zhang, C\. Zhao, Y\. Zhao, Z\. Yu, M\. He, and J\. Fan \(2024c\)MobileExperts: a dynamic tool\-enabled agent team in mobile devices\.External Links:2407\.03913,[Link](https://arxiv.org/abs/2407.03913)Cited by:[§4\.1](https://arxiv.org/html/2605.05580#S4.SS1.p1.1)\.
- M\. Zhuge, H\. Liu, F\. Faccio, D\. R\. Ashley, R\. Csordás, A\. Gopalakrishnan, A\. Hamdi, H\. A\. A\. K\. Hammoud, V\. Herrmann, K\. Irie, L\. Kirsch, B\. Li, G\. Li, S\. Liu, J\. Mai, P\. Piękos, A\. A\. Ramesh, I\. Schlag, W\. Shi, A\. Stanić, W\. Wang, Y\. Wang, M\. Xu, D\. Fan, B\. Ghanem, and J\. Schmidhuber \(2025\)Mindstorms in natural language\-based societies of mind\.Computational Visual Media11\(1\),pp\. 29–81\.External Links:ISSN 2096\-0433,[Link](http://dx.doi.org/10.26599/CVM.2025.9450460),[Document](https://dx.doi.org/10.26599/cvm.2025.9450460)Cited by:[§4\.1](https://arxiv.org/html/2605.05580#S4.SS1.p1.1)\.

## Appendix A Reference Strategy

Algorithm 4 defines the fixed reference strategy $\pi_{\text{ref}}$, which serves as the non-adaptive baseline for the Trader agent. Given a set of factor signals from the ensemble $\mathcal{E}_t$, it computes a composite score $\phi_{i,t}$ for each asset as a weighted combination of factor values. Assets are then ranked by this score, with the top $N_{\text{long}}$ selected for long positions and the bottom $N_{\text{short}}$ for short positions. Position sizes are determined by a gross exposure parameter $\beta$ and a net exposure bias $\gamma$, which controls the long–short balance (e.g., $\gamma=1$ yields a long-only portfolio). The strategy submits orders to adjust holdings from the previous period to the computed target weights.

Algorithm 4 Reference Strategy $\pi_{\text{ref}}$

Hyperparameters: $\Theta=(N_{\text{long}},N_{\text{short}},\beta,\gamma)$

- $N_{\text{long}},N_{\text{short}}$: number of long/short positions
- $\beta$: gross exposure, $\gamma\in[-1,1]$: net exposure bias ($\gamma=1$ for long-only market)

1. **for each** asset $i\in\mathcal{U}$ **do**
2. $\phi_{i,t}\leftarrow\sum_{j=1}^{|\mathcal{E}_{t}|}w_{j}\cdot d_{j}\cdot f_{j}(\mathbf{x}_{i,t})$
3. **end for**
4. sort $\mathcal{U}$ by $\phi_{i,t}$ descending
5. $\mathcal{I}_{\text{long}}\leftarrow$ first $N_{\text{long}}$ assets, $\mathcal{I}_{\text{short}}\leftarrow$ last $N_{\text{short}}$ assets
6. **for each** $i\in\{i\mid h_{i,t-1}\neq 0\}$ **do** // liquidate positions no longer in the list
7. **if** $i\notin\mathcal{I}_{\text{long}}\cup\mathcal{I}_{\text{short}}$ **then**
8. $\text{submit\_order}(i,-h_{i,t-1})$
9. **end if**
10. **end for**
11. $V_{\text{long}}\leftarrow\beta\cdot\text{NAV}_{t}\cdot(1+\gamma)/2$
12. $V_{\text{short}}\leftarrow\beta\cdot\text{NAV}_{t}\cdot(1-\gamma)/2$
13. **for each** $i\in\mathcal{I}_{\text{long}}$ **do**
14. $h_{i,t}^{\text{target}}\leftarrow V_{\text{long}}/(N_{\text{long}}\cdot P_{i,t})$
15. **end for**
16. **for each** $i\in\mathcal{I}_{\text{short}}$ **do**
17. $h_{i,t}^{\text{target}}\leftarrow -V_{\text{short}}/(N_{\text{short}}\cdot P_{i,t})$
18. **end for**
19. **for each** $i\in\mathcal{I}_{\text{long}}\cup\mathcal{I}_{\text{short}}$ **do**
20. $\text{submit\_order}(i,h_{i,t}^{\text{target}}-h_{i,t-1})$
21. **end for**
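The steps of Algorithm 4 can be sketched in Python as follows. This is an illustrative translation rather than the authors' implementation: orders are returned as a list instead of being submitted to a broker, holdings are expressed in shares, $w_j$ is read as the ensemble weight and $d_j$ as a direction sign (this appendix does not define them explicitly), and the function names and example inputs are hypothetical.

```python
def composite_score(weights, directions, factor_values):
    """Step 2: phi = sum_j w_j * d_j * f_j(x) for a single asset."""
    return sum(w * d * f for w, d, f in zip(weights, directions, factor_values))


def reference_strategy(scores, prices, holdings, nav, n_long, n_short, beta, gamma):
    """Return (asset, share_delta) orders following Algorithm 4.

    scores:   {asset: composite score phi_{i,t}}
    prices:   {asset: price P_{i,t}}
    holdings: {asset: shares held at t-1}; missing assets count as zero
    Assumes n_long + n_short <= number of assets, so the lists do not overlap.
    """
    # Steps 4-5: rank by score, take the head as longs and the tail as shorts.
    ranked = sorted(scores, key=scores.get, reverse=True)
    longs = ranked[:n_long]
    shorts = ranked[len(ranked) - n_short:]  # empty list when n_short == 0
    selected = set(longs) | set(shorts)

    orders = []
    # Steps 6-10: liquidate positions that are no longer in either list.
    for asset, shares in holdings.items():
        if shares != 0 and asset not in selected:
            orders.append((asset, -shares))

    # Steps 11-12: split gross exposure between the long and short books.
    v_long = beta * nav * (1 + gamma) / 2
    v_short = beta * nav * (1 - gamma) / 2

    # Steps 13-18: equal-value target holdings, converted to shares.
    targets = {a: v_long / (n_long * prices[a]) for a in longs}
    targets.update({a: -v_short / (n_short * prices[a]) for a in shorts})

    # Steps 19-21: trade the difference between target and current holdings.
    for asset, target in targets.items():
        orders.append((asset, target - holdings.get(asset, 0.0)))
    return orders


# Hypothetical four-asset universe with one legacy position in "C".
scores = {"A": 0.9, "B": 0.5, "C": 0.1, "D": -0.4}
prices = {"A": 10.0, "B": 20.0, "C": 25.0, "D": 50.0}
orders = reference_strategy(scores, prices, {"C": 100}, nav=100_000.0,
                            n_long=1, n_short=1, beta=1.0, gamma=0.0)
print(orders)
```

With $\gamma=0$ the gross exposure is split evenly between the two books, while $\gamma=1$ sends all of it to the long side, matching the long-only case mentioned in the hyperparameter description.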

## Appendix B Experimental Details

### B.1 Dataset Details

#### B.1.1 Data Sources

The daily OHLCV data for CSI 300 constituents is collected from Baostock BaoStock \[[2026](https://arxiv.org/html/2605.05580#bib.bib54)\]. For the S&P 500 constituents, daily price-volume data is obtained from Yahoo Finance Aroussi \[[2026](https://arxiv.org/html/2605.05580#bib.bib52)\]. Fundamental indicators (PE, PS, PB, DYR), financial statement data (quarterly balance sheets, income statements, cash flow statements), and alternative data (financial news and corporate announcements) are sourced from Lixinger Lixinger \[[2026](https://arxiv.org/html/2605.05580#bib.bib55)\].

#### B.1.2 Data Storage Format

Data are stored in the following formats:

- Daily OHLCV data: Stored in CSV format, with each row corresponding to a trading day and columns representing Open, High, Low, Close, and Volume.
- Fundamental indicators: Stored in CSV format, with each row corresponding to a trading day per asset and columns representing PE, PS, PB, and DYR.
- Financial statements: Stored in JSON format, organized hierarchically by asset ticker and reporting quarter. Each JSON object contains standardized fields for balance sheet, income statement, and cash flow statement items.
- Alternative data (news and announcements): Stored in JSON format, where each entry contains the publication timestamp, asset ticker, headline, full content, and sentiment metadata (when available).

### B.2 Metrics Details

We provide the mathematical formulations for the evaluation metrics.

Annualized Return (AR). Given a sequence of daily portfolio values $V_0, V_1, \dots, V_T$ over an evaluation period of $T$ trading days, the total return is $R_{\text{total}} = \frac{V_T - V_0}{V_0}$. The Annualized Return is computed as:

$$\text{AR} = \left(1 + R_{\text{total}}\right)^{\frac{D}{T}} - 1 \tag{6}$$

where $D$ denotes the number of trading days per calendar year, with $D = 243$ for the CSI 300 (Chinese A-share market) and $D = 252$ for the S&P 500 (U.S. equity market).

Sharpe Ratio (SR). Let $r_t = \frac{V_t - V_{t-1}}{V_{t-1}}$ denote the daily portfolio return on day $t$. The Sharpe Ratio is defined as:

$$\text{SR} = \frac{\sqrt{D} \cdot \left(\frac{1}{T}\sum_{t=1}^{T} r_t - r_f\right)}{\sqrt{\frac{1}{T-1}\sum_{t=1}^{T}(r_t - \bar{r})^2}} \tag{7}$$

where $D$ is the number of trading days per year, $\bar{r} = \frac{1}{T}\sum_{t=1}^{T} r_t$ is the mean daily return, and $r_f$ is the daily risk-free rate. Following conventional practice, we set the annualized risk-free rate to 1.25% for the Chinese market (CSI 300) and 3.81% for the U.S. market (S&P 500), based on the 2-year government bond yields. Source: Trading Economics (Trading Economics \[[2026](https://arxiv.org/html/2605.05580#bib.bib53)\]).

Maximum Drawdown (MDD). Let $V_t$ denote the portfolio value at time $t$. The Maximum Drawdown is the largest peak-to-trough decline over the evaluation period:

$$\text{MDD} = \min_{t \in [0,T]} \left(\frac{V_t - \max_{s \in [0,t]} V_s}{\max_{s \in [0,t]} V_s}\right) \tag{8}$$

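Equations (6) to (8) can be computed from a series of daily portfolio values as sketched below. The function names are illustrative, and converting the annualized risk-free rate to a daily rate by dividing by $D$ is an assumption; the text does not state which conversion is used.

```python
import numpy as np


def annualized_return(values, days_per_year):
    """Eq. (6): (1 + R_total)^(D/T) - 1, with T = number of daily steps."""
    values = np.asarray(values, dtype=float)
    n_days = len(values) - 1
    total_return = values[-1] / values[0] - 1.0
    return float((1.0 + total_return) ** (days_per_year / n_days) - 1.0)


def sharpe_ratio(values, days_per_year, annual_risk_free):
    """Eq. (7): annualized excess mean return over sample standard deviation."""
    values = np.asarray(values, dtype=float)
    daily_returns = values[1:] / values[:-1] - 1.0
    daily_rf = annual_risk_free / days_per_year  # assumed simple conversion
    excess_mean = daily_returns.mean() - daily_rf
    return float(np.sqrt(days_per_year) * excess_mean / daily_returns.std(ddof=1))


def max_drawdown(values):
    """Eq. (8): most negative relative drop from the running peak (<= 0)."""
    values = np.asarray(values, dtype=float)
    running_peak = np.maximum.accumulate(values)
    return float(np.min(values / running_peak - 1.0))
```

Note that under Eq. (8) the drawdown is reported as a non-positive number, so a value closer to zero is better.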
Information Coefficient (IC). For a given factor and a set of $N$ assets at time $t$, let $\mathbf{f}_t \in \mathbb{R}^N$ denote the vector of factor values and $\mathbf{r}_{t+1} \in \mathbb{R}^N$ denote the vector of next-day forward returns. The IC at time $t$ is the Pearson correlation coefficient:

$$\text{IC}_t = \frac{\text{Cov}(\mathbf{f}_t, \mathbf{r}_{t+1})}{\sigma_{\mathbf{f}_t}\,\sigma_{\mathbf{r}_{t+1}}} \tag{9}$$

Information Coefficient Information Ratio (ICIR). Given a time series of IC values $\text{IC}_1, \dots, \text{IC}_T$, the ICIR is defined as:

$$\text{ICIR} = \frac{\overline{\text{IC}}}{\sigma_{\text{IC}}} \tag{10}$$

where $\overline{\text{IC}} = \frac{1}{T}\sum_{t=1}^{T}\text{IC}_t$ is the mean IC, and $\sigma_{\text{IC}} = \sqrt{\frac{1}{T-1}\sum_{t=1}^{T}(\text{IC}_t - \overline{\text{IC}})^2}$ is the standard deviation of IC.
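A minimal sketch of Equations (9) and (10), assuming factor values and forward returns are already aligned per asset. The function names are illustrative and not taken from the paper's code.

```python
import numpy as np


def information_coefficient(factor_values, forward_returns):
    """Eq. (9): cross-sectional Pearson correlation at one time step."""
    f = np.asarray(factor_values, dtype=float)
    r = np.asarray(forward_returns, dtype=float)
    return float(np.corrcoef(f, r)[0, 1])


def icir(ic_series):
    """Eq. (10): mean IC divided by its sample standard deviation."""
    ic = np.asarray(ic_series, dtype=float)
    return float(ic.mean() / ic.std(ddof=1))
```

A factor whose values rank assets exactly in line with their next-day returns gives an IC of 1; the ICIR then rewards factors whose IC is both high and stable over time.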

### B.3 Simulation Settings

To ensure realistic backtesting and live trading evaluation, we implement market\-specific exchange simulators that adhere to the distinct trading regulations of the Chinese A\-share market \(CSI 300\) and the U\.S\. equity market \(S&P 500\)\. Both simulators handle order matching, account management, and data persistence, with key differences summarized below\.

CSI 300 Exchange (A-share Market). The A-share market simulator enforces the T+1 settlement rule, which prohibits selling shares on the same day they are purchased. Additionally, short selling is generally not permitted for retail-oriented backtesting environments. Consequently, the exchange restricts all position changes to long-only operations. The commission rate is set to $0.02\%$ (2 basis points) per transaction, reflecting typical brokerage fees in the Chinese market.

S&P 500 Exchange (U.S. Market). The U.S. market simulator operates under T+0 settlement for intraday trading (with daily rebalancing treated as instantaneous) and fully supports short selling. For short positions, an initial margin requirement of $20\%$ of the position value is enforced, and a maintenance margin of $80\%$ ensures that equity remains sufficient to cover potential adverse price movements. The commission rate is set to $0.01\%$ (1 basis point) per transaction, aligning with prevailing low-cost brokerage structures in U.S. markets.

Integration with Reference Strategy. The market-specific constraints are naturally integrated with the reference strategy $\pi_{\text{ref}}$ (Algorithm 4). The hyperparameter $\gamma \in [-1, 1]$ controls the net exposure bias. For the CSI 300, where short selling is prohibited, we set $\gamma = 1$, effectively allocating all risk capital to long positions. For the S&P 500, which accommodates long-short portfolios, we set $\gamma = 0.5$, maintaining a net long bias while allowing a portion of capital to be deployed in short positions. Across both markets, the gross exposure parameter is uniformly set to $\beta = 0.8$, ensuring that the strategy utilizes $80\%$ of the net asset value for position sizing, leaving a $20\%$ cash buffer for risk management and margin requirements.
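As a worked instance of lines 11 and 12 of Algorithm 4 under these settings (plain arithmetic on the stated values of $\beta$ and $\gamma$, not an additional experimental result):

$$
\begin{aligned}
\text{S\&P 500 } (\beta = 0.8,\ \gamma = 0.5):\quad & V_{\text{long}} = 0.8 \cdot \text{NAV}_t \cdot \tfrac{1 + 0.5}{2} = 0.6\,\text{NAV}_t, & V_{\text{short}} &= 0.8 \cdot \text{NAV}_t \cdot \tfrac{1 - 0.5}{2} = 0.2\,\text{NAV}_t, \\
\text{CSI 300 } (\beta = 0.8,\ \gamma = 1):\quad & V_{\text{long}} = 0.8 \cdot \text{NAV}_t \cdot \tfrac{1 + 1}{2} = 0.8\,\text{NAV}_t, & V_{\text{short}} &= 0.
\end{aligned}
$$

In both cases $V_{\text{long}} + V_{\text{short}} = \beta \cdot \text{NAV}_t = 0.8\,\text{NAV}_t$, while the targeted net exposure $V_{\text{long}} - V_{\text{short}} = \beta\gamma \cdot \text{NAV}_t$ equals $0.4\,\text{NAV}_t$ for the S&P 500 and $0.8\,\text{NAV}_t$ for the CSI 300.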

### B.4 Quantitative Justification of Negligible Trading Frictions

We provide a quantitative, order\-of\-magnitude analysis to assess whether turnover and slippage introduce material frictional costs under our daily\-frequency rebalancing framework\. The argument proceeds by bounding both the daily turnover ratio and the aggregate slippage cost relative to the portfolio’s net asset value \(NAV\), drawing on the hyperparameter settings and empirical transaction statistics observed across experimental tracks\.

#### B.4.1 Turnover Analysis

Under the reference strategy $\pi_{\text{ref}}$ (Algorithm 4), the gross exposure parameter is set to $\beta = 0.8$, meaning that $80\%$ of NAV is deployed for position-taking at each rebalancing, with the remaining $20\%$ held as a cash buffer. The net exposure bias $\gamma$ is configured as $\gamma = 1$ for the CSI 300 market (long-only) and $\gamma = 0.5$ for the S&P 500 market (long-short). These hyperparameters impose a structural constraint on the gross position rate, which remains stable around the prescribed exposure level throughout the rebalancing process. Consequently, the maximum capital reshuffled between adjacent rebalancing dates is bounded by the total value of positions that can be entered or exited, which cannot exceed a small multiple of the targeted gross exposure.

Empirically, across all backtesting tracks, the daily turnover rate $\tau_t$, defined as the total absolute executed order value normalized by the prior day's NAV, remains well-controlled. A conservative upper bound is given by

$$\tau_t \leq 1.0, \tag{11}$$

representing the extreme scenario in which the portfolio undergoes a complete liquidation and reinvestment of the entire deployed capital within a single day. In practice, daily turnover is substantially lower than this bound. A material fraction of adjustments partially offset one another (for example, reducing an existing position while increasing another within the same leg), so the net capital reshuffled is considerably less than the gross sum of individual position changes. Moreover, the strategy rebalances at a daily frequency but selects positions based on slowly varying factor signals, a design that naturally suppresses excessive turnover. Whether this level of turnover introduces statistically significant frictional costs is assessed jointly with slippage in the following subsection.

#### B.4.2 Slippage Analysis

Slippage captures the deviation between the decision-time signal price (the end-of-day close) and the execution price achieved near the market close. For both CSI 300 and S&P 500 constituents, stock prices typically range from 10 to 100 in local currency units (RMB for CSI 300, USD for S&P 500), and the minimum tick size is uniformly $\delta = 0.1$. Given that the traded universe consists of large-cap, highly liquid equities and position adjustments are granularly distributed, the price impact of individual trades is expected to be limited under normal market depth conditions. The per-trade execution price deviation can be bounded by one tick, i.e., $|\Delta P_i| \leq 0.1$, which translates into a relative slippage bounded by

$$\frac{|\Delta P_i|}{P_i} \leq \frac{0.1}{10} = 1.0\% \quad \text{(worst-case)}, \tag{12}$$

with a typical value on the order of $0.1/50 = 0.2\%$, where the reference price of 50 serves as a representative midpoint of the observed price range and may vary with the actual execution price level.

To assess the aggregate impact, let $V_{\text{total}} = \tau_t \cdot \text{NAV}$ denote the total daily traded value and let $N$ denote the number of intraday position adjustments. Under the assumption that individual execution errors are zero-mean and independent across trades, the aggregate daily slippage cost $S_t$ satisfies

$$\mathbb{E}[S_t] = 0, \quad \text{Var}(S_t) = \sum_{i=1}^{N}\left(\frac{|\Delta P_i|}{P_i} \cdot v_i\right)^2 \leq N \cdot \left(0.2\% \cdot \frac{V_{\text{total}}}{N}\right)^2 = \frac{(0.2\%)^2 \cdot V_{\text{total}}^2}{N}, \tag{13}$$

where $v_i$ denotes the traded value of adjustment $i$. The inequality substitutes the typical relative slippage of $0.2\%$ for each trade and takes the trades to be of roughly equal size, $v_i = V_{\text{total}}/N$. Expressing the standard deviation as a fraction of NAV yields

$$\frac{\sqrt{\text{Var}(S_t)}}{\text{NAV}} \leq \frac{0.2\% \cdot \tau_t}{\sqrt{N}}. \tag{14}$$

Given the stable gross position rate and the granular nature of position adjustments, the effective number of independent trades $N$ remains sufficiently large to suppress the aggregate slippage volatility. Even under the conservative parameterization $\tau_t = 1.0$ and a modest $N$, the resulting bound is an order of magnitude smaller than the typical daily return volatility of a diversified equity portfolio and is therefore unlikely to materially affect performance metrics such as the Sharpe ratio or maximum drawdown.
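To make the order of magnitude in Equation (14) concrete, the sketch below evaluates the bound for given inputs. The function names and the example value $N = 25$ are illustrative assumptions, not figures reported in the experiments.

```python
import math


def relative_slippage(tick_size: float, price: float) -> float:
    """One-tick price deviation expressed as a fraction of the price."""
    return tick_size / price


def slippage_std_bound(turnover: float, n_trades: int,
                       rel_slippage: float = 0.002) -> float:
    """Eq. (14): bound on daily slippage std as a fraction of NAV."""
    return rel_slippage * turnover / math.sqrt(n_trades)


# Hypothetical example: full turnover spread over 25 equal-sized trades.
example_bound = slippage_std_bound(turnover=1.0, n_trades=25)
```

With these inputs the bound is $0.002 / 5 = 0.04\%$ of NAV per day, and it shrinks further as turnover falls or as the number of independent trades grows.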

### B.5 Compute Resources

All experiments reported in this paper are training\-free\. The proposed AlphaCrafter framework does not involve any model fine\-tuning, gradient\-based optimization, or local model hosting\. All LLM interactions—including factor generation by the Miner, regime assessment by the Screener, and hyperparameter optimization by the Trader—are conducted exclusively through official API calls to the respective model providers\. Consequently, the computational requirements for reproducing our results are minimal at the client side: experiments can be run on a standard CPU machine with a stable internet connection, as the heavy computation is offloaded to the cloud\-based inference endpoints of the API providers\. The primary resource cost is the API usage fee, which varies by provider and query volume\. We did not track precise per\-experiment execution time, as it depends primarily on API latency and rate limits rather than local compute capacity\.

## Appendix C Case Study

### C.1 Factor Semantic Diversity and Novelty Analysis

To rigorously assess whether the factors discovered by AlphaCrafter constitute genuinely novel alpha signals rather than syntactic recombinations of classical factors, we conduct a semantic diversity analysis grounded in operator-tree topology.

Every factor expression is parsed into a normalized abstract syntax tree (AST), where leaf nodes represent raw data (e.g., close, volume) and internal nodes denote operators (e.g., rank, ts_corr). Let $\delta(f_i, f_j) \in [0, 1]$ denote the normalized tree edit distance (NTED) between two factors, computed by dividing the raw edit distance by the sum of the two tree sizes. This metric captures structural divergence at the algorithmic level, independent of parameter values such as rolling-window lengths.

We define two complementary indices. For a factor library $\mathcal{A}^{(k)}$ produced in trial $k$, let $n_k = |\mathcal{A}^{(k)}|$ be the number of factors in that trial, and let $a_i^{(k)} \in \mathcal{A}^{(k)}$ denote the $i$-th factor expression. The internal semantic diversity is

$$\Phi_{\text{intra}}^{(k)} = \frac{2}{n_k(n_k - 1)} \sum_{1 \leq i < j \leq n_k} \delta\big(a_i^{(k)},\; a_j^{(k)}\big), \tag{15}$$

which computes the arithmetic mean of the NTED over all unordered pairs of distinct factors within the same trial. The normalization factor $2/[n_k(n_k - 1)]$ equals the reciprocal of the number of such pairs, $\binom{n_k}{2}$. A high $\Phi_{\text{intra}}$ indicates that the Miner explores a broad logical spectrum rather than resampling a narrow set of templates.

Let $\mathcal{B}$ denote the Alpha158 (Microsoft Qlib Contributors \[[2026](https://arxiv.org/html/2605.05580#bib.bib49)\]) classical reference library, with $b_j \in \mathcal{B}$ indexing its constituent factors. The external semantic novelty against $\mathcal{B}$ is

$$\Phi_{\text{inter}}^{(k)} = \frac{1}{n_k} \sum_{i=1}^{n_k} \min_{b_j \in \mathcal{B}} \; \delta\big(a_i^{(k)},\; b_j\big), \tag{16}$$

where the inner minimization identifies, for each generated factor $a_i^{(k)}$, its nearest structural neighbour in the classical library. The outer sum averages these nearest-neighbour distances across all $n_k$ factors in the trial. A high $\Phi_{\text{inter}}$ implies systematic departure from established factor structures.

For $K$ independent trials, we report the trial-averaged metrics

$$\bar{\Phi}_{\text{intra}} = \frac{1}{K}\sum_{k=1}^{K} \Phi_{\text{intra}}^{(k)}, \qquad \bar{\Phi}_{\text{inter}} = \frac{1}{K}\sum_{k=1}^{K} \Phi_{\text{inter}}^{(k)}. \tag{17}$$

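Equations (15) to (17) reduce to simple averages once a pairwise distance $\delta$ is available. The sketch below takes the distance as a callable; the `toy_distance` shown is only a set-overlap stand-in over flattened node labels, used so that the example runs, and is not the normalized tree edit distance used in the paper. All names are illustrative.

```python
from itertools import combinations
from statistics import mean


def intra_diversity(factors, dist):
    """Eq. (15): mean distance over all unordered pairs (needs >= 2 factors)."""
    return mean(dist(a, b) for a, b in combinations(factors, 2))


def inter_novelty(factors, reference, dist):
    """Eq. (16): mean distance from each factor to its nearest reference factor."""
    return mean(min(dist(a, b) for b in reference) for a in factors)


def trial_average(per_trial_values):
    """Eq. (17): arithmetic mean of a metric across independent trials."""
    return mean(per_trial_values)


def toy_distance(a, b):
    """Stand-in distance in [0, 1]: differing labels over total label count.

    Assumption for illustration only; a real NTED operates on the AST itself.
    """
    return len(set(a) ^ set(b)) / (len(a) + len(b))


# Hypothetical factors written as flattened operator/leaf labels.
generated = [("rank", "close"), ("rank", "volume"), ("ts_corr", "close", "volume")]
classical = [("rank", "close")]
phi_intra = intra_diversity(generated, toy_distance)
phi_inter = inter_novelty(generated, classical, toy_distance)
```

Passing the distance as a parameter keeps the two indices independent of how the AST comparison is implemented.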
![Refer to caption](https://arxiv.org/html/2605.05580v1/x7.png)
(a) CSI 300 Market

![Refer to caption](https://arxiv.org/html/2605.05580v1/x8.png)
(b) S&P 500 Market

Figure 5: Semantic diversity and novelty metrics across three LLM backbones.

Figure [5](https://arxiv.org/html/2605.05580#A3.F5) reports $\bar{\Phi}_{\text{intra}}$ and $\bar{\Phi}_{\text{inter}}$ across both markets. Two principal findings emerge. GPT 5.3 Codex excels at structural novelty. It consistently achieves the highest $\bar{\Phi}_{\text{inter}}$, indicating that it generates factors whose operator-tree configurations lie furthest from classical Alpha158 formulations. Its $\bar{\Phi}_{\text{intra}}$ also leads, suggesting that its exploratory mechanism spans diverse logical structures rather than refining a narrow motif. Claude Opus 4.6 and Gemini 3.1 Pro exhibit moderate but stable novelty scores.

Critically, these semantic properties exhibit a nuanced relationship with downstream trading performance. Despite GPT 5.3 Codex's superiority in generating structurally novel factors, its realized risk-adjusted returns (reported in Section [3.3.2](https://arxiv.org/html/2605.05580#S3.SS3.SSS2)) do not surpass those of Claude Opus 4.6. This divergence carries an important implication: greater factor novelty does not monotonically translate into superior investment outcomes. The Screener's role in filtering and ensemble construction under market-regime awareness appears to act as a moderating mechanism, selecting factors that are not merely original but also suited to the prevailing market regime.

### C.2 Regime Coherence Analysis

To evaluate the fidelity of AlphaCrafter’s market perception, we analyze the alignment between the Screener agent’s semantic regime assessments and empirically measured market conditions\. We conduct this case study using a representative trial of AlphaCrafter powered by Claude Opus 4\.6, selected as the median\-performing run to avoid cherry\-picking\. For each trading cycle, the Screener produces qualitative judgments across three distinct market dimensions: trend direction, volatility regime, and correlation structure\.

#### C.2.1 Semantic-to-Numerical Mapping

We map each semantic label to a discrete numerical value in the set $\{0, 0.25, 0.5, 0.75, 1\}$ based on its ordinal intensity. For instance, trend labels are mapped as: “strong downtrend” $\to 0$, “downtrend” $\to 0.25$, “range-bound” $\to 0.5$, “uptrend” $\to 0.75$, “strong uptrend” $\to 1$. Volatility and correlation labels follow analogous mappings from “low” to “high” and from “low dispersion” to “index-led”, respectively.

#### C.2.2 Market Proxy Computation

For each cycle $c$, we compute quantitative market proxies using trailing windows ending at timestamp $\tau_c$, with $L = 20$ days for volatility and correlation, and $L = 60$ days for trend. Historical data preceding the evaluation period is used to pre-warm all metrics.

Trend proxy $M_c^{\text{(trend)}}$: the normalized cumulative return of the market index over the past 60 trading days, mapped to $[0, 1]$ via a logistic transformation:

$$M_c^{\text{(trend)}} = \frac{1}{1 + e^{-r_c^{(60)}/\sigma_0}}, \tag{18}$$

where $r_c^{(60)}$ is the 60-day log return, and $\sigma_0 = \sigma_{\text{ann}} \times \sqrt{60/D}$ with $\sigma_{\text{ann}} = 0.2$ as the long-term annualized volatility estimate.

Volatility proxy $M_c^{\text{(vol)}}$: the realized volatility of the index over the past 20 trading days, min-max normalized to $[0, 1]$ using the empirical 5th and 95th percentiles from the full sample:

$$M_c^{\text{(vol)}} = \min\left(\max\left(\frac{\sigma_c^{(20)} - q_{0.05}}{q_{0.95} - q_{0.05}}, 0\right), 1\right), \tag{19}$$

where $\sigma_c^{(20)} = \sqrt{\frac{D}{19}\sum_{i=1}^{20}(r_{t-i+1} - \bar{r})^2}$ is the annualized realized volatility.

Correlation proxy $M_c^{\text{(corr)}}$: the average absolute pairwise correlation of daily returns among index constituents:

$$M_c^{\text{(corr)}} = \frac{2}{K(K-1)}\sum_{i=1}^{K-1}\sum_{j=i+1}^{K} |\rho_{ij,c}^{(20)}|, \tag{20}$$

where $\rho_{ij,c}^{(20)}$ is the 20-day Pearson correlation between stocks $i$ and $j$, and $K$ is the number of constituents with sufficient trading history.
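The three proxies in Equations (18) to (20) can be sketched with NumPy as below. Function names and input layouts (a 1-D array of index closes or returns, and a days-by-constituents return matrix) are assumptions made for illustration; the percentile thresholds are passed in rather than estimated here.

```python
import numpy as np


def trend_proxy(index_close, days_per_year, sigma_ann=0.2, window=60):
    """Eq. (18): logistic transform of the trailing log return."""
    close = np.asarray(index_close, dtype=float)
    log_return = np.log(close[-1] / close[-1 - window])
    sigma0 = sigma_ann * np.sqrt(window / days_per_year)
    return float(1.0 / (1.0 + np.exp(-log_return / sigma0)))


def realized_volatility(index_returns, days_per_year, window=20):
    """Annualized sample standard deviation of the last `window` returns."""
    recent = np.asarray(index_returns, dtype=float)[-window:]
    return float(np.sqrt(days_per_year) * recent.std(ddof=1))


def volatility_proxy(sigma, q05, q95):
    """Eq. (19): percentile-based min-max scaling, clipped to [0, 1]."""
    return float(np.clip((sigma - q05) / (q95 - q05), 0.0, 1.0))


def correlation_proxy(constituent_returns, window=20):
    """Eq. (20): mean absolute pairwise correlation over the trailing window."""
    recent = np.asarray(constituent_returns, dtype=float)[-window:]
    corr = np.corrcoef(recent, rowvar=False)      # columns are constituents
    upper = np.triu_indices_from(corr, k=1)       # each unordered pair once
    return float(np.abs(corr[upper]).mean())
```

A flat index over the trailing 60 days maps to a trend proxy of exactly 0.5, which coincides with the “range-bound” value in the semantic mapping.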

#### C.2.3 Results

We evaluate the coherence between Screener's semantic assessments and market-derived proxies over 50 consecutive trading cycles. For each dimension, we construct a similarity matrix where cell $(i, j)$ represents the similarity between the semantic assessment at cycle $i$ and the market proxy at cycle $j$, defined as $1 - |\text{semantic}_i - \text{market}_j|$. The raw similarity values are then linearly normalized to $[0, 1]$ across the entire matrix, with $1$ indicating perfect alignment (identical values) and $0$ indicating maximal dissimilarity.
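The similarity matrix described above can be sketched as follows. Interpreting "linearly normalized" as min-max scaling over the whole matrix is an assumption, as is the fallback used when every cell has the same value; the label dictionary reproduces the trend mapping from Section C.2.1, and the function name is illustrative.

```python
import numpy as np

# Trend-label mapping as given in Section C.2.1.
TREND_SCORE = {
    "strong downtrend": 0.0,
    "downtrend": 0.25,
    "range-bound": 0.5,
    "uptrend": 0.75,
    "strong uptrend": 1.0,
}


def similarity_matrix(semantic_scores, market_proxies):
    """Cell (i, j) = 1 - |semantic_i - market_j|, min-max scaled to [0, 1]."""
    semantic = np.asarray(semantic_scores, dtype=float)[:, None]
    market = np.asarray(market_proxies, dtype=float)[None, :]
    raw = 1.0 - np.abs(semantic - market)
    low, high = raw.min(), raw.max()
    if high == low:
        # Assumed convention: a constant matrix is reported as all ones.
        return np.ones_like(raw)
    return (raw - low) / (high - low)


labels = ["uptrend", "downtrend"]
scores = [TREND_SCORE[label] for label in labels]
sim = similarity_matrix(scores, market_proxies=[0.75, 0.25])
```

In this two-cycle example the diagonal cells equal 1 and the off-diagonal cells equal 0, which is the diagonal structure the heatmaps are inspected for.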

Figure[6](https://arxiv.org/html/2605.05580#A3.F6)and Figure[7](https://arxiv.org/html/2605.05580#A3.F7)present the resulting heatmaps for the CSI 300 and S&P 500 markets, respectively\. Below are several observations\.

First, along the diagonal where semantic assessments align with contemporaneous market proxies, we observe consistently high similarity values \(deep blue cells\)\. This indicates that Screener’s regime diagnosis accurately reflects prevailing market conditions for trend and volatility dimensions across most cycles\. The off\-diagonal regions exhibit lower similarity, confirming that the agent’s assessments are primarily responsive to current rather than future or past market states\.

Second, the correlation dimension exhibits markedly different behavior\. Unlike trend and volatility, which show clear diagonal structures, the correlation heatmaps appear uniformly deep blue across nearly all cycle pairs\. This pattern arises because cross\-sectional correlations among index constituents remained remarkably stable throughout the evaluation window, with limited temporal variation\. Consequently, the market correlation proxy takes nearly constant values across cycles, and Screener’s semantic assessments align uniformly with this stable proxy, yielding uniformly high similarity independent of cycle alignment\.

![Refer to caption](https://arxiv.org/html/2605.05580v1/figures/regime_heatmaps_a.png)

Figure 6: Regime coherence heatmaps for CSI 300 market.

![Refer to caption](https://arxiv.org/html/2605.05580v1/figures/regime_heatmaps_us.png)

Figure 7: Regime coherence heatmaps for S&P 500 market.

In summary, Screener demonstrates reliable regime perception across all three dimensions, with semantic assessments closely tracking contemporaneous market conditions. The strong diagonal alignment observed in the heatmaps confirms that the agent's qualitative judgments consistently reflect actual market states.

### C.3 Risk Management and Position Exposure Analysis

To examine whether AlphaCrafter dynamically adjusts its market exposure in response to prevailing risk conditions, we conduct a case study using a representative trial of Claude Opus 4.6, selected as the median-performing run to avoid cherry-picking. We focus on the U.S. equity market, where both long and short positions are permissible, in contrast to the long-only constraint of the Chinese A-share market. We construct two rolling-window metrics at a 10-day horizon. The first, market volatility, is defined as the range amplitude over the window:

$$V_t = \frac{\max_{\tau \in [t-9,t]} H_\tau - \min_{\tau \in [t-9,t]} L_\tau}{O_{t-9}}, \tag{21}$$

where $H_\tau$, $L_\tau$, and $O_\tau$ denote the daily high, low, and open prices of the market index, respectively. The second, average net position exposure, is the 10-day rolling mean of the account's net position rate:

$$E_t = \frac{1}{10}\sum_{\tau=t-9}^{t} r_\tau, \tag{22}$$

where $r_\tau$ is the net position rate (long market value minus short market value, divided by total assets) on day $\tau$. Both series are sampled at non-overlapping 10-day intervals to avoid serial dependence, and their relationship is assessed via linear regression.
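A minimal sketch of Equations (21) and (22) followed by the non-overlapping sampling and least-squares fit. The function names are illustrative, and sampling at the last day of each consecutive 10-day block is an assumption about how the non-overlapping intervals are chosen.

```python
import numpy as np


def range_volatility(high, low, open_, window=10):
    """Eq. (21): (window high - window low) / open on the window's first day."""
    high, low, open_ = (np.asarray(x, dtype=float) for x in (high, low, open_))
    out = np.full(len(high), np.nan)
    for t in range(window - 1, len(high)):
        start = t - window + 1
        out[t] = (high[start:t + 1].max() - low[start:t + 1].min()) / open_[start]
    return out


def rolling_mean_exposure(net_position_rate, window=10):
    """Eq. (22): trailing mean of the daily net position rate."""
    rate = np.asarray(net_position_rate, dtype=float)
    out = np.full(len(rate), np.nan)
    for t in range(window - 1, len(rate)):
        out[t] = rate[t - window + 1:t + 1].mean()
    return out


def sample_and_regress(volatility, exposure, window=10):
    """Sample every `window` days, then fit exposure = slope * vol + intercept."""
    idx = np.arange(window - 1, len(volatility), window)
    x, y = volatility[idx], exposure[idx]
    slope, intercept = np.polyfit(x, y, 1)
    pearson_r = np.corrcoef(x, y)[0, 1]
    return float(slope), float(intercept), float(pearson_r)
```

A negative `slope` together with a negative `pearson_r` corresponds to the inverse volatility-exposure relationship examined in this case study.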

![Refer to caption](https://arxiv.org/html/2605.05580v1/x9.png)
(a) Time series of market volatility and average net position exposure.

![Refer to caption](https://arxiv.org/html/2605.05580v1/x10.png)
(b) Scatter plot with ordinary least-squares regression line.

Figure 8: Relationship between market volatility and net position exposure for a representative Claude Opus 4.6 trial on the S&P 500 market.

Figure [8](https://arxiv.org/html/2605.05580#A3.F8) presents the results. Panel (a) displays the co-evolution of $V_t$ and $E_t$ over the backtest horizon. Visual inspection reveals a clear inverse relationship: during episodes of elevated market volatility, the net position rate tends to decline, whereas calmer regimes coincide with higher exposure levels. Panel (b) corroborates this observation quantitatively. The scatter plot, together with the ordinary least-squares regression line, yields a negative slope, and the Pearson correlation coefficient $r$ confirms a statistically significant inverse association.

This pattern reflects a coherent risk\-management behaviour embedded in the AlphaCrafter pipeline\. When market conditions become turbulent, the Screener’s regime\-aware filtering mechanism deprioritizes high\-uncertainty factors, leading the Trader to reduce net exposure — either by decreasing gross positions or by balancing long and short legs\. Conversely, during low\-volatility regimes, the system deploys capital more aggressively\. Importantly, this adaptive de\-risking emerges without explicit volatility\-targeting rules; rather, it is a consequence of the factor selection and portfolio construction layers interacting with the market environment\.

These findings provide evidence that AlphaCrafter possesses a degree of intrinsic risk awareness: its position\-taking behaviour is not static but responds in a disciplined, counter\-cyclical manner to shifting market stress\. This property is particularly valuable in long–short equity markets, where unmanaged gross exposure can amplify drawdowns during volatility spikes\. The observed negative correlation between net exposure and market turmoil thus supports the claim that the full Miner–Screener–Trader loop yields not only predictive alpha signals but also prudent downside control\.

## Appendix D Limitations

While AlphaCrafter demonstrates robust and state\-of\-the\-art performance across multiple markets, it is crucial to contextualize its achievements within the inherent limitations of its design and the experimental setup\. Furthermore, these limitations illuminate several promising avenues for future research\. This section provides a critical analysis of the framework’s current constraints and explores the potential for its evolution into more comprehensive and realistic trading systems\.

Daily Trading Frequency and Transaction Cost Abstraction. Our experimental framework is confined to a daily trading frequency, and we adopt the simplifying assumption that transaction costs, market impact, and slippage are negligible at this horizon. A quantitative analysis of turnover and slippage, detailed in Appendix [B.4](https://arxiv.org/html/2605.05580#A2.SS4), indicates that the aggregate frictional drag is statistically insignificant under our backtesting parameters. Nevertheless, this assessment is necessarily conducted within the controlled environment of our simulator and cannot fully replicate the complexities of live markets. Factors such as the price impact of trading illiquid index constituents, time-varying bid-ask spreads, intraday order book dynamics, and episodic liquidity droughts may introduce non-trivial frictions in real-world execution. The abstracted transaction cost model further assumes fixed commission rates and symmetric, zero-mean slippage, which may not hold during periods of market stress or for larger order sizes. Consequently, although our evidence supports treating frictions as negligible for the purpose of comparative strategy evaluation, the framework's performance under a more realistic simulator with variable transaction cost models and intraday execution constraints remains an open question.

Incomplete Mitigation of LLM Data Leakage and Look-Ahead Biases. Although we employ a dedicated live trading period with a cutoff date strictly posterior to the training data of all backbone LLMs, this design choice addresses only one dimension of potential data leakage. The LLMs may still harbor parametric knowledge of regime-dependent correlation structures, volatility clustering patterns, or tail-dependence characteristics that, while learned from historical data, are not formally available to the agent during its forward-looking reasoning process. This creates a subtle look-ahead bias: the model can implicitly condition on distributional properties of the full sample, even when the agent's observable window is strictly truncated. Our experimental protocol does not include a systematic method to decompose performance gains attributable to the model's internet-scale financial pre-training from those generated through the agent's reasoning and validation loops (Kong et al. \[[2026](https://arxiv.org/html/2605.05580#bib.bib57)\]).

Evaluated Backbone Models and Generalizability. Our stability study validates AlphaCrafter's performance across three state-of-the-art LLMs: GPT 5.3 Codex (OpenAI \[[2026](https://arxiv.org/html/2605.05580#bib.bib45)\]), Claude Opus 4.6 (Anthropic \[[2026](https://arxiv.org/html/2605.05580#bib.bib46)\]), and Gemini 3.1 Pro (Google DeepMind \[[2026](https://arxiv.org/html/2605.05580#bib.bib47)\]). While results indicate low sensitivity to the underlying model, this conclusion is drawn from a set of architectures that may share convergent training methodologies, datasets, and alignment techniques. The generalizability of our findings to smaller, open-source models (e.g., Llama series (Meta \[[2026](https://arxiv.org/html/2605.05580#bib.bib60)\]), DeepSeek series (DeepSeek \[[2026](https://arxiv.org/html/2605.05580#bib.bib61)\])) or models with distinct pre-training distributions remains unverified. A significant performance cliff when transitioning to more accessible or specialized models would limit the framework's democratization and reproducibility.

Scope of the Asset Universe and Extension to Alternative Markets. The experimental universe encompasses the CSI 300 and S&P 500 indices over a long period, a setting that already provides meaningful coverage of distinct market structures and macro-financial conditions. Extending this scope to additional asset classes would further enrich the assessment of the framework's generality. In particular, factor definitions, regime dynamics, and execution constraints differ fundamentally in futures, options, and cryptocurrency markets. Exploring the framework's feasibility and adaptation requirements in these alternative markets, especially under the high-leverage and contract-specific microstructure that characterize derivatives trading, constitutes a natural and valuable direction for future work.

## Appendix E Impact Statement

This paper presents AlphaCrafter, a multi-agent framework for autonomous quantitative trading. We recognize that automated trading systems carry dual-use potential and warrant careful consideration of their societal implications.

Positive Impacts. By automating the full pipeline from factor discovery to execution, AlphaCrafter may lower the technical barrier to systematic investing, enabling smaller research teams and academic institutions to develop disciplined, data-driven strategies without the extensive infrastructure typically reserved for large financial firms. The framework's emphasis on robust, regime-aware adaptation could also encourage more resilient portfolio management practices, potentially reducing pro-cyclical herding behavior that amplifies market volatility.

Negative Impacts and Mitigations. Autonomous trading agents, if widely deployed without oversight, could contribute to market instability through correlated strategy crowding or unintended feedback loops during stress events. The framework's reliance on commercial LLMs raises concerns about the concentration of financial decision-making power in a small number of technology providers. Furthermore, the opacity of LLM reasoning may complicate regulatory auditing and accountability. We encourage future deployments to incorporate circuit breakers, position limits, and mandatory human oversight for live trading, and we advocate for transparency in reporting the provenance and validation of LLM-generated alpha factors.

## Appendix F Prompt Design

For reproducibility and to facilitate future extensions, we document below the complete prompt designs used for each agent in the AlphaCrafter framework. All prompts are presented verbatim as deployed in our experiments.

### Universal Instruction for A-Share Market

This is an autonomous quantitative trading system composed of three specialized agents working in coordination. The system operates in the Chinese A-share market for a one-year trading period. Historical data from early 2016 to present is available for analysis, factor development, and strategy validation. The goal is to achieve stable returns while managing risk effectively.

You are NOT a conversational AI. You do NOT chat with users. You do NOT provide explanations, ask clarifying questions, or engage in any form of dialogue.

Your sole function is to operate as an automated workflow executor within a multi-agent quantitative trading system.

[Universe]

- Trading universe: CSI 300 index constituent stocks
- The watchlist contains only CSI 300 stocks. Stocks are tradable, while indices are for observation only and cannot be traded.
- Initially, the account starts with a cash balance of 10,000,000 CNY and no stock holdings.

[Rules]

1. T+1 Settlement:
   - Shares bought today are available for sale tomorrow
   - $\text{available\_quantity} = \text{shares bought before today}$
2. Order Execution:
   - Unfilled orders remain PENDING (auto-EXPIRED after 7 trading days)
   - Orders auto-removed after 14 trading days
3. Fees:
   - Commission rate: 0.02% ($\text{executed\_amount} \times 0.0002$)
4. Timing:
   - Trading day starts at 09:30 and ends at 15:00 (lunch break from 11:30 to 13:00)
   - Trading frequency is limited to once per trading day
   - Trading executes daily at 14:30 (market close at 15:00)
5. Constraints:
   - Quantity must be a multiple of 100 (board lot)

[Workspace]

- Working directory: `workspace/`. Use relative paths directly; do NOT prefix paths with `workspace/`.
- Directory structure:
  - `strategy.py`: Main strategy file for implementing quantitative trading logic
  - `factors/`: Factor library directory. Each factor is stored as a separate JSON file containing comprehensive factor details. Files follow the naming convention `{factor_id}.json`.
  - `scripts/`: Directory for Python scripts for data processing, factor analysis, or other purposes
- All function tools are executed under `workspace/`
- The workspace is UTF-8 encoded by default
- The version of Python is 3.10
- You will get the tool call response at the next conversation after invoking tools
- Do not call too many tools in a single response
- End the current workflow turn when there are no tool calls
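The A-share [Rules] above reduce to a few arithmetic checks. The following is a minimal sketch of those checks, not the simulator used in the experiments. The `Lot` record and all function names are hypothetical, and the sketch ignores shares that have already been sold.

```python
from dataclasses import dataclass
from datetime import date

COMMISSION_RATE = 0.0002  # 0.02% of the executed amount, per the [Rules] block
BOARD_LOT = 100           # quantities must be multiples of 100


@dataclass
class Lot:
    """One purchase of a single stock (hypothetical record type)."""
    buy_date: date
    quantity: int


def available_quantity(lots: list[Lot], today: date) -> int:
    """T+1 settlement: only shares bought before today can be sold today."""
    return sum(lot.quantity for lot in lots if lot.buy_date < today)


def round_to_board_lot(quantity: int) -> int:
    """Round a desired quantity down to the nearest multiple of 100."""
    return (quantity // BOARD_LOT) * BOARD_LOT


def commission(executed_amount: float) -> float:
    """Commission charged on the executed amount."""
    return executed_amount * COMMISSION_RATE


def max_buy_quantity(cash: float, price: float) -> int:
    """Largest board-lot quantity that cash can cover, commission included."""
    raw = int(cash / (price * (1 + COMMISSION_RATE)))
    return round_to_board_lot(raw)
```

For example, with 300 shares bought on one day and 200 bought the next, only the first 300 are sellable on the second day.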

### Universal Instruction for US Market

This is an autonomous quantitative trading system composed of three specialized agents working in coordination. The system operates in the US stock market for a one-year trading period. Historical data from early 2016 to present is available for analysis, factor development, and strategy validation. The goal is to achieve stable returns while managing risk effectively.

You are NOT a conversational AI. You do NOT chat with users. You do NOT provide explanations, ask clarifying questions, or engage in any form of dialogue.

Your sole function is to operate as an automated workflow executor within a multi-agent quantitative trading system.

[Universe]

- Trading universe: S&P 500 index constituent stocks
- The watchlist contains only S&P 500 stocks. Stocks are tradable, while indices are for observation only and cannot be traded.
- S&P 500 component changes are reflected over time; historical backtests use the component set as of each period.
- Initially, the account starts with a cash balance of 10,000,000 USD and no stock holdings.

[Rules]

1. T+0 Settlement:
   - Shares bought today can be sold on the same day
   - No lock-up period for trading
2. Order Execution:
   - Unfilled orders remain PENDING (auto-EXPIRED after 7 trading days)
   - Orders auto-removed after 14 trading days
3. Fees:
   - Commission rate: 0.01% ($\text{executed\_amount} \times 0.0001$)
4. Margin:
   - Short margin requirement: 20% of position value (initial margin required to open a short position)
   - Maintenance margin: 80% of equity (minimum equity percentage required to maintain positions)
   - Margin calls are triggered when equity falls below maintenance margin
5. Timing:
   - Trading day starts at 09:30 and ends at 16:00 (Eastern Time)
   - Trading frequency is limited to once per trading day
   - Trading executes daily at 15:30 ET (market close at 16:00 ET)
6. Constraints:
   - Quantity can be any positive integer
   - Fractional shares are not supported; integer shares only

[Workspace]

- Working directory: `workspace/`. Use relative paths directly; do NOT prefix paths with `workspace/`.
- Directory structure:
  - `strategy.py`: Main strategy file for implementing quantitative trading logic
  - `factors/`: Factor library directory. Each factor is stored as a separate JSON file containing comprehensive factor details. Files follow the naming convention `{factor_id}.json`.
  - `scripts/`: Directory for Python scripts for data processing, factor analysis, or other purposes
- All function tools are executed under `workspace/`
- The workspace is UTF-8 encoded by default
- The version of Python is 3.10
- You will get the tool call response at the next conversation after invoking tools
- Do not call too many tools in a single response
- End the current workflow turn when there are no tool calls. Make summary according to the output format at the last response without invoking any tool calls
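The order-lifecycle and short-margin rules in the US prompt can be illustrated with a short sketch. It rests on assumptions the prompt leaves open: an order's age is counted in trading days since submission, the thresholds are inclusive (expired at age 7, removed at age 14), and `REMOVED` is a hypothetical status label. The maintenance-margin rule is not modeled because the prompt does not specify its exact formula.

```python
from enum import Enum

COMMISSION_RATE = 0.0001     # 0.01% of the executed amount
SHORT_INITIAL_MARGIN = 0.20  # 20% of position value to open a short
EXPIRE_AFTER_DAYS = 7        # assumption: inclusive threshold
REMOVE_AFTER_DAYS = 14       # assumption: inclusive threshold


class OrderStatus(Enum):
    PENDING = "PENDING"
    EXPIRED = "EXPIRED"
    REMOVED = "REMOVED"  # hypothetical label for orders dropped from the book


def unfilled_order_status(age_in_trading_days: int) -> OrderStatus:
    """Status of an unfilled order given its age in trading days."""
    if age_in_trading_days >= REMOVE_AFTER_DAYS:
        return OrderStatus.REMOVED
    if age_in_trading_days >= EXPIRE_AFTER_DAYS:
        return OrderStatus.EXPIRED
    return OrderStatus.PENDING


def short_initial_margin(quantity: int, price: float) -> float:
    """Initial margin needed to open a short of `quantity` whole shares."""
    if quantity <= 0:
        raise ValueError("quantity must be a positive integer")
    return quantity * price * SHORT_INITIAL_MARGIN


def max_short_quantity(free_cash: float, price: float) -> int:
    """Largest whole-share short whose margin plus commission fits in free_cash."""
    per_share = price * (SHORT_INITIAL_MARGIN + COMMISSION_RATE)
    return int(free_cash // per_share)
```

Keeping the thresholds as named constants makes it easy to switch to exclusive bounds if the simulator counts days differently.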

### Miner Agent Instruction

You are a factor miner agent.

[Role]

- Your task is to discover and validate new factor ideas that can be used for portfolio construction.

[Workflow]

1. Factor Exploration:
   - Generate research scripts to explore candidate factors
   - Factors can include momentum, value, quality, volatility, liquidity, or combinations thereof
   - Utilize techniques: linear combinations, conditional logic, ratio transformations, or other interpretable methods
   - Encourage exploring novel factors, but avoid overly complex constructions that are difficult to interpret or maintain
2. Factor Validation:
   - Execute scripts to compute factor values and performance metrics
   - Evaluate effectiveness using:
     - Information Coefficient (IC): correlation between factor values and forward returns
     - IC stability: consistency of predictive power over time (ICIR, IC hit ratio)
     - Turnover: frequency of factor signal changes
     - Factor coverage: percentage of tradable stocks with valid values
     - Decay analysis: how predictive power degrades over different holding periods
   - Validation must be performed across multiple market regimes to assess robustness
   - Track validation date to monitor factor timeliness and performance drift
3. Factor Persistence:
   - Save validated factor definitions and results in `factors/{factor_id}.json`
   - Include validation timestamp to track factor aging and recency
4. Continuous Re-validation:
   - Currently effective factors must be re-validated periodically (e.g., every 3 months) as market conditions evolve
   - Track factor performance drift over time
   - Update persistence records with new validation results and dates
   - Flag factors that show significant decay for review

[Output]

- After each research cycle, provide a summary covering:
  - Explored Factors: What factor ideas were explored, including motivation and construction approach
  - Validation Results: Key metrics for each explored factor, noting which met or failed criteria, including validation date
  - Persistence Actions: What factors were persisted with their assigned status
  - Current Effective Factors: Which factors are currently effective based on the latest validation, with details on their performance and recency
  - Plans: Planned exploration directions based on findings

[Note]

- When encountering bugs (e.g., version issues, nonexistent methods), attempt to use alternative equivalent approaches rather than stubbornly persisting with the problematic method.
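The validation metrics named in the Miner prompt (IC, ICIR, IC hit ratio) can be computed as in the sketch below. The prompt does not fix the exact definitions, so the sketch makes two assumptions: the IC is a Spearman rank correlation, and the hit ratio is the share of periods whose IC has the same sign as the mean IC. All function names are hypothetical, and only the standard library is used.

```python
from statistics import mean, stdev


def _ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def _pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def rank_ic(factor_values, forward_returns):
    """Cross-sectional rank IC for one period (assumed Spearman)."""
    return _pearson(_ranks(factor_values), _ranks(forward_returns))


def ic_summary(period_ics):
    """Mean IC, ICIR (mean / sample std), and sign-consistency hit ratio."""
    ic_mean = mean(period_ics)
    ic_std = stdev(period_ics) if len(period_ics) > 1 else 0.0
    hits = sum(1 for ic in period_ics if ic * ic_mean > 0)
    return {
        "ic_mean": ic_mean,
        "icir": ic_mean / ic_std if ic_std > 0 else float("nan"),
        "ic_hit_ratio": hits / len(period_ics),
    }
```

Computing `rank_ic` per trading day and passing the resulting series to `ic_summary` yields the stability figures the prompt asks the Miner to report.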

### Screener Agent Instruction

You are a factor screener agent.

[Role]

- Based on current market microstructure and regime, select effective cross-sectional factors, assign weights or priority levels, and output a factor ensemble for downstream portfolio construction. Additionally, identify gaps in the current factor library and suggest mining directions.

[Workflow]

1. Factor Availability Check:
   - Query persistence store for currently active factors
   - Filter for cross-sectional factors that are valid for the current trading universe
   - Identify factor categories: Value, Momentum, Quality, Growth, Low-Risk, Sentiment, Liquidity
2. Market Regime & Risk Assessment:
   - Overall trend: Bull market (trend up), Bear market (trend down), Sideways/Range-bound
   - Trend strength: Use MA slope, ADX, or consecutive direction days
   - Risk level: Low, Medium, High (based on realized volatility, max drawdown, tail events)
   - Volatility regime: High/Low volatility favors different factors (e.g., Low-Vol factor in high vol)
   - Liquidity condition: Tight liquidity may penalize turnover-heavy factors
   - Correlation regime: When stocks move together, dispersion-based factors lose power
   - Sector rotation pace: Fast rotation favors short-term momentum or mean-reversion
   - Breadth: Narrow breadth favors cap-weighted or quality; wide breadth favors equal-weight factors
   - Trend/Mean-Reversion tendency: Trending markets favor momentum factors; mean-reverting markets favor reversal or contrarian factors
   - Sentiment regime: Extreme optimism/pessimism may amplify factor performance or cause crowded trades
3. Factor Selection & Weighting:
   - For each factor category, assess current market suitability
   - Select top-K factors based on suitability score and recent IC/Sharpe
   - Avoid highly correlated factors to maintain diversification
   - Assign explicit weights or priority tiers (e.g., Primary / Secondary / Tertiary)
   - Prefer factors with stable historical performance under current regime
4. Factor-Level Risk Constraints:
   - Flag factors with excessive turnover relative to expected holding horizon
   - Identify factor crowding (high correlation among selected factors)
   - Flag factors with known execution issues (slippage, illiquidity sensitivity)
5. Factor Ensemble Specification:
   - Output a structured factor set with the following for each factor:
     - Factor ID / name
     - Assigned weight
     - Direction (long/short or long-only)
     - Optional: transformation hint (e.g., rank, z-score, winsorize)
6. Feedback Integration:
   - Incorporate recent factor performance feedback when available
   - Adjust weights downward for factors with persistent underperformance
7. Mining Suggestions:
   - Downgrade or drop factors with execution or stability issues
   - Based on regime assessment and factor gaps, propose specific mining directions: e.g., "current low-vol environment lacks a quality-volatility interaction factor", "sector rotation is fast, consider short-term mean-reversion with volume confirmation", "crowding in momentum suggests exploring orthogonal residuals"

[Output]

- After each cycle, provide a concise summary covering:
  - Market Assessment: Current market assessment, including overall trend (Bull/Bear/Sideways), trend strength, and risk level (Low/Medium/High)
  - Available Factors: List active cross-sectional factors by category
  - Selected Factors: Which factors selected, with suitability score and brief rationale
  - Factor Ensemble: List of factors with weights, direction, and optional hints
  - Risk Notes: Any factor crowding, high turnover warnings, or regime-specific risks
  - Mining Suggestions: Recommended factor exploration directions based on regime gaps or performance shortfalls

[Note]

- If there are not enough available validated factors in the factor library, you should skip this cycle with a skipping message (i.e., do not invoke any tool calls, just output the skipping message as your final response).
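Step 3 of the Screener workflow (top-K selection while avoiding highly correlated factors, then explicit weights) can be sketched as a greedy filter. This is one plausible reading, not the procedure the agent necessarily follows. The correlation cap of 0.7, the score-proportional weighting, and all names are assumptions, and suitability scores are assumed to be non-negative.

```python
def select_factors(scores, correlations, top_k=3, max_abs_corr=0.7):
    """Greedily pick up to top_k factors by score, skipping any factor whose
    absolute correlation with an already selected factor exceeds the cap.

    scores: {factor_id: suitability score}
    correlations: {frozenset({id_a, id_b}): correlation}; missing pairs count as 0.
    """
    selected = []
    for factor_id in sorted(scores, key=scores.get, reverse=True):
        if len(selected) == top_k:
            break
        too_close = any(
            abs(correlations.get(frozenset((factor_id, other)), 0.0)) > max_abs_corr
            for other in selected
        )
        if not too_close:
            selected.append(factor_id)
    return selected


def normalize_weights(scores, selected):
    """Score-proportional weights summing to 1; equal weights as a fallback."""
    if not selected:
        return {}
    total = sum(scores[f] for f in selected)
    if total <= 0:
        return {f: 1.0 / len(selected) for f in selected}
    return {f: scores[f] / total for f in selected}
```

In the usage below (with hypothetical factor IDs), the second momentum factor is skipped because it is too correlated with the first:

```python
scores = {"mom_20d": 0.9, "mom_60d": 0.8, "value_bp": 0.6, "low_vol": 0.5}
correlations = {frozenset(("mom_20d", "mom_60d")): 0.85}
picked = select_factors(scores, correlations)
weights = normalize_weights(scores, picked)
```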

### Trader Agent Instruction

You are a quantitative trading agent.

[Role]

- Your task is to update the quantitative trading strategy based on factor ensembles provided.

[Workflow]

1. Strategy Configuration:
   - Receive factor ensemble from Screener Agent
   - Strategy framework is fixed: cross-sectional factor-based selection with rebalancing
   - Typical pattern: Cross-sectional ranking with periodic rebalancing
     - Long leg: select top $N$ stocks by composite factor score
     - Short leg (if allowed): select bottom $M$ stocks for short positions
   - Portfolio type determined by BOTH factor ensemble specification AND market trend regime:
     - Bull market (strong uptrend): Long-only (disable short leg regardless of factor spec)
     - Bear market (strong downtrend): Long-short or market-neutral with short bias (disable pure long-only)
     - Sideways/Choppy (range-bound): Long-short or market-neutral (balanced)
   - Dynamic adjustments based on market risk:
     - Position sizing: scale total exposure up/down based on volatility regime and drawdown risk
     - Position concentration: adjust number of selected stocks based on breadth and dispersion
     - Weighting scheme: equal-weight, cap-weight, or score-weight based on regime
     - Rebalancing frequency: maintain default cadence but can skip or delay under extreme conditions
   - Maintain strategy parameters (e.g., $N$, $M$, position scaling factor, weighting scheme) as tunable hyperparameters
2. Strategy Validation:
   - Utilize backtesting tools to validate hyperparameter configurations
   - Evaluate metrics: Sharpe ratio, max drawdown, turnover, transaction cost impact
   - Compare hyperparameter variants (e.g., different $N$/$M$ values, weighting schemes) under current regime
   - Ensure strategy aligns with factor intent and market context
3. Live Trading (Optional):
   - Call step tool to execute daily-frequency trading based on strategy configuration
   - For each cycle:
     - Retrieve current factor exposures for universe stocks
     - Compute composite score using factor ensemble
     - Rank stocks by composite score
     - Apply position sizing and concentration rules to determine target portfolio
     - Generate orders: buy underweighted positions, sell overweighted positions
4. Performance Review & Feedback:
   - Analyze results from backtest and live trading
   - Assess whether risk adjustments achieved intended protection
   - Provide execution feedback:
     - Factor performance: which selected factors contributed positively/negatively
     - Implementation costs: slippage, turnover impact
     - Regime alignment: whether market context matched Screener's assessment

[Output]

- After each trading cycle, provide a summary covering:
  - Strategy Configuration: Current hyperparameter settings ($N$, $M$, weighting scheme, position scaling factor, rebalancing cadence)
  - Risk Adjustment: What dynamic adjustments were applied based on market risk assessment
  - Validation Outcomes: Backtest results for current hyperparameter configuration under recent regime
  - Execution Results: Live trading outcomes for the cycle (PnL, turnover, slippage)
  - Factor Performance: How individual factors in the ensemble performed in real market
  - Observations: Regime alignment, anomalies, execution issues
  - Feedback to Screener: Which factors underperformed, any regime mismatch detected
  - Plans: Hyperparameter adjustments for next cycle (e.g., change $N$/$M$, adjust scaling, modify rebalancing)

[Note]

1. If no factor ensemble is received from Screener Agent in the current cycle, you should skip this round with a skipping message (i.e., do not invoke any tool calls, just output the skipping message as your final response). Once you receive a factor ensemble, you should write your strategy in the `strategy.py` file. Never write a strategy that is too complex.
2. You should always use backtesting tool for validation, but do not rely on backtest results. Overfitting to backtest results will lead to poor live performance. But for badly performing strategy in backtesting, you should update the strategy immediately.
3. Call the step tool only once per trading cycle. Do not call it multiple times within the same cycle.
4. If no orders are executed during backtesting or live trading, you must systematically relax the strategy's constraints until trades are generated. After each relaxation step, re-run the backtest to verify that trades are now being executed.
5. When encountering bugs (e.g., version issues, nonexistent methods), attempt to use alternative equivalent approaches rather than stubbornly persisting with the problematic method.
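The per-cycle steps in the Trader workflow (composite score, ranking, top-$N$ and bottom-$M$ selection, order generation) can be sketched as follows. This is an illustrative reading, not the contents of the deployed `strategy.py`. The z-score standardization, the data layout, and all function names are assumptions.

```python
from statistics import mean, pstdev


def zscore(values):
    """Cross-sectional z-score of {stock: raw factor value}."""
    vals = list(values.values())
    mu, sigma = mean(vals), pstdev(vals)
    if sigma == 0:
        return {stock: 0.0 for stock in values}
    return {stock: (v - mu) / sigma for stock, v in values.items()}


def composite_scores(exposures, weights):
    """Weighted sum of standardized factors.

    exposures: {factor_id: {stock: value}}; weights: {factor_id: weight}.
    """
    scores = {}
    for factor_id, weight in weights.items():
        for stock, z in zscore(exposures[factor_id]).items():
            scores[stock] = scores.get(stock, 0.0) + weight * z
    return scores


def select_legs(scores, n_long, m_short=0):
    """Top n_long stocks for the long leg and bottom m_short for the short leg.

    Setting m_short=0 gives a long-only portfolio.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    longs = ranked[:n_long]
    shorts = [s for s in ranked[::-1][:m_short] if s not in longs]
    return longs, shorts


def generate_orders(target, current):
    """Signed share deltas: positive means buy, negative means sell."""
    orders = {}
    for stock in sorted(set(target) | set(current)):
        delta = target.get(stock, 0) - current.get(stock, 0)
        if delta != 0:
            orders[stock] = delta
    return orders
```

Exposing `n_long` and `m_short` as arguments mirrors the prompt's treatment of $N$ and $M$ as tunable hyperparameters.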
