CoffeeBench: Benchmarking Long-Horizon LLM Agents in Heterogeneous Multi-Agent Economies

Hugging Face Daily Papers · 2026-06-15

CoffeeBench is a benchmark for evaluating LLM agents in a long-horizon, multi-agent economic simulation in which firms interact over 90 days to maximize profits. It reveals differences in communication patterns and performance across models.
