earth-system


TerraBench: Can Agents Reason Over Heterogeneous Earth-System Data?

arXiv cs.AI · 5d ago

TerraBench is a new benchmark that evaluates how well AI agents reason over heterogeneous Earth-system data, including gridded data, satellite imagery, and simulator outputs. The results reveal significant limitations in current frontier models: even the top performers achieve an average tool-use score of only 59.2%.
