environmental-analysis

#environmental-analysis

GeoNatureAgent Benchmark: Benchmarking LLM Agents for Environmental Geospatial Analysis Across Frontier and Open-Weight Foundation Models

arXiv cs.AI ↗ · 5d ago Cached

The paper introduces GeoNatureAgent Benchmark, the first benchmark for evaluating LLM agents on environmental geospatial analysis tasks via structured tool calls. It evaluates seven models on 93 tasks across 18 categories and finds Claude Sonnet 4 achieves highest accuracy at 60.8%, while open-weight models like DeepSeek V3.2 offer strong cost-performance tradeoffs.

0 favorites 0 likes

environmental-analysis

GeoNatureAgent Benchmark: Benchmarking LLM Agents for Environmental Geospatial Analysis Across Frontier and Open-Weight Foundation Models

Submit Feedback