AI-Trader: Benchmarking Autonomous Agents in Real-Time Financial Markets

Papers with Code Trending Papers

Summary

This paper introduces AI-Trader, the first fully automated live benchmark for evaluating LLMs in financial decision-making across US stocks, A-shares, and cryptocurrencies. It highlights that general intelligence does not guarantee trading success and emphasizes the importance of risk control in autonomous agents.

Large Language Models (LLMs) have demonstrated remarkable potential as autonomous agents, approaching human-expert performance through advanced reasoning and tool orchestration. However, decision-making in fully dynamic and live environments remains highly challenging, requiring real-time information integration and adaptive responses. While existing efforts have explored live evaluation mechanisms in structured tasks, a critical gap remains in systematic benchmarking for real-world applications, particularly in finance where stringent requirements exist for live strategic responsiveness. To address this gap, we introduce AI-Trader, the first fully-automated, live, and data-uncontaminated evaluation benchmark for LLM agents in financial decision-making. AI-Trader spans three major financial markets: U.S. stocks, A-shares, and cryptocurrencies, with multiple trading granularities to simulate live financial environments. Our benchmark implements a revolutionary fully autonomous minimal information paradigm where agents receive only essential context and must independently search, verify, and synthesize live market information without human intervention. We evaluate six mainstream LLMs across three markets and multiple trading frequencies. Our analysis reveals striking findings: general intelligence does not automatically translate to effective trading capability, with most agents exhibiting poor returns and weak risk management. We demonstrate that risk control capability determines cross-market robustness, and that AI trading strategies achieve excess returns more readily in highly liquid markets than policy-driven environments. These findings expose critical limitations in current autonomous agents and provide clear directions for future improvements. The code and evaluation data are open-sourced to foster community research: https://github.com/HKUDS/AI-Trader.
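The abstract refers to returns, excess returns, and risk management without naming the metrics the benchmark reports. As a rough illustration only, the sketch below computes a few quantities commonly used for those ideas (cumulative return, maximum drawdown, an annualized Sharpe ratio, and excess return over a reference series). The function names, the 252-period annualization, and the zero risk-free rate are assumptions made for this example, not details taken from the paper or its repository.

```python
from math import sqrt
from statistics import mean, stdev


def cumulative_return(values):
    """Total return over a portfolio value series (0.10 means +10%)."""
    return values[-1] / values[0] - 1.0


def max_drawdown(values):
    """Largest peak-to-trough decline, as a positive fraction of the peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)                    # highest value seen so far
        worst = max(worst, (peak - v) / peak)  # decline from that peak
    return worst


def sharpe_ratio(values, periods_per_year=252, risk_free_per_period=0.0):
    """Annualized Sharpe ratio from per-period simple returns.

    periods_per_year=252 assumes daily data (an assumption of this sketch).
    Needs at least three values so the sample standard deviation is defined.
    """
    returns = [b / a - 1.0 for a, b in zip(values, values[1:])]
    excess = [r - risk_free_per_period for r in returns]
    spread = stdev(excess)
    if spread == 0:
        return 0.0  # no variation in returns: ratio is undefined, report 0
    return mean(excess) / spread * sqrt(periods_per_year)


def excess_return(agent_values, reference_values):
    """Agent cumulative return minus that of a reference series."""
    return cumulative_return(agent_values) - cumulative_return(reference_values)


# Hypothetical value series, made up for illustration.
agent = [100.0, 110.0, 99.0, 120.0]
reference = [100.0, 105.0, 102.0, 108.0]

print("cumulative return:", cumulative_return(agent))
print("max drawdown:", max_drawdown(agent))
print("sharpe ratio:", sharpe_ratio(agent))
print("excess return:", excess_return(agent, reference))
```

Reading drawdown alongside return is one way to see the abstract's point about risk control: two agents can end with similar returns while one of them suffered a much deeper loss along the way.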

Cached at: 05/08/26, 08:40 AM

Source: https://huggingface.co/papers/2512.10971

Abstract

AI-Trader presents the first fully automated live benchmark for evaluating large language models in financial decision-making across multiple markets with autonomous information processing.

Large Language Models (LLMs) have demonstrated remarkable potential as autonomous agents, approaching human-expert performance through advanced reasoning and tool orchestration. However, decision-making in fully dynamic and live environments remains highly challenging, requiring real-time information integration and adaptive responses. While existing efforts have explored live evaluation mechanisms in structured tasks, a critical gap remains in systematic benchmarking for real-world applications, particularly in finance where stringent requirements exist for live strategic responsiveness. To address this gap, we introduce AI-Trader, the first fully-automated, live, and data-uncontaminated evaluation benchmark for LLM agents in financial decision-making. AI-Trader spans three major financial markets: U.S. stocks, A-shares, and cryptocurrencies, with multiple trading granularities to simulate live financial environments. Our benchmark implements a revolutionary fully autonomous minimal information paradigm where agents receive only essential context and must independently search, verify, and synthesize live market information without human intervention. We evaluate six mainstream LLMs across three markets and multiple trading frequencies. Our analysis reveals striking findings: general intelligence does not automatically translate to effective trading capability, with most agents exhibiting poor returns and weak risk management. We demonstrate that risk control capability determines cross-market robustness, and that AI trading strategies achieve excess returns more readily in highly liquid markets than policy-driven environments. These findings expose critical limitations in current autonomous agents and provide clear directions for future improvements. The code and evaluation data are open-sourced to foster community research: https://github.com/HKUDS/AI-Trader.

Get this paper in your agent:

hf papers read 2512.10971

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Datasets citing this paper: 1

T1anyu/AI-Trader, updated Dec 19, 2025 • 228

Similar Articles

TradingAgents: Multi-Agents LLM Financial Trading Framework

Papers with Code Trending

This paper introduces TradingAgents, a multi-agent LLM framework that simulates real-world trading firms to improve stock trading performance. It utilizes specialized agents for analysis and risk management, demonstrating superior results in cumulative returns and Sharpe ratio compared to baselines.

QuantAgent: Price-Driven Multi-Agent LLMs for High-Frequency Trading

Papers with Code Trending

QuantAgent is a multi-agent LLM framework designed specifically for high-frequency trading, using four specialized agents (Indicator, Pattern, Trend, Risk) to make rapid, risk-aware decisions based on short-horizon signals. In zero-shot evaluations across ten financial instruments including Bitcoin and Nasdaq futures, it outperforms existing neural and rule-based baselines in predictive accuracy and cumulative return.

Agentic Trading: When LLM Agents Meet Financial Markets

arXiv cs.AI

This paper presents a systematic survey and evidence map of 77 studies on LLM-based trading agents, finding that architectural experimentation is expanding rapidly but evaluation protocols, execution semantics, and reproducibility remain critical bottlenecks.

HKUDS/AI-Trader

GitHub Trending (daily)

AI-Trader is an open-source agent-native trading platform from HKUDS that allows AI agents to autonomously register, publish signals, and execute trades across stocks, crypto, forex and other markets.