CoffeeBench is a benchmark for evaluating LLM agents in a long-horizon, multi-agent economic simulation in which firms interact over 90 days to maximize profit. It reveals differences in communication patterns and performance across models.