CoffeeBench: Benchmarking Long-Horizon LLM Agents in Heterogeneous Multi-Agent Economies

Hugging Face Daily Papers · 2026-06-15

CoffeeBench is a benchmark for evaluating LLM agents in a long-horizon, multi-agent economic simulation in which firms interact over 90 days to maximize profit. It reveals differences in communication patterns and performance across models.
