UnpredictaBench is a benchmark for evaluating how well large language models can sample from target distributions, including both statistical and natural-language random processes. Experiments show that current models struggle to capture the true underlying distributions: no model exceeds 40% on the KS@100 metric.
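This summary does not define KS@100. The sketch assumes, from the name alone, that it builds on a one-sample Kolmogorov-Smirnov (KS) test over 100 model samples. Under that assumption, the check compares the empirical CDF of the samples with the target CDF and takes the largest gap, D = sup_x |F_n(x) - F(x)|. The helper names (`ks_statistic`, `ks_critical_value`, `passes_ks`) and the pass/fail rule at alpha = 0.05 are hypothetical and are not the benchmark's actual scoring. The target is a continuous Uniform(0, 1) distribution, chosen only for illustration.

```python
import math
from typing import Callable, Sequence


def ks_statistic(samples: Sequence[float], cdf: Callable[[float], float]) -> float:
    """One-sample Kolmogorov-Smirnov statistic D = sup_x |F_n(x) - F(x)|.

    Assumes `cdf` is a continuous target CDF; for discrete targets this
    formula is only an approximation.
    """
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x, so check both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_critical_value(n: int, alpha: float = 0.05) -> float:
    """Asymptotic critical value c(alpha) / sqrt(n), with c(alpha) = sqrt(-ln(alpha / 2) / 2)."""
    return math.sqrt(-math.log(alpha / 2) / 2) / math.sqrt(n)


def passes_ks(samples: Sequence[float], cdf: Callable[[float], float], alpha: float = 0.05) -> bool:
    # Hypothetical rule: accept the samples if D does not exceed the critical value.
    return ks_statistic(samples, cdf) <= ks_critical_value(len(samples), alpha)


def uniform_cdf(x: float) -> float:
    # CDF of the (illustrative) Uniform(0, 1) target.
    return min(max(x, 0.0), 1.0)


n = 100
ideal = [(i - 0.5) / n for i in range(1, n + 1)]  # samples spread evenly over [0, 1]
collapsed = [0.5] * n                             # a sampler that always answers 0.5

print(ks_statistic(ideal, uniform_cdf))   # about 0.005
print(passes_ks(ideal, uniform_cdf))      # True
print(passes_ks(collapsed, uniform_cdf))  # False (D = 0.5, critical value is about 0.136)
```

The collapsed sampler has the right mean but the wrong distribution, so it fails the test. A distribution-level check catches this kind of failure, while a check on a single summary statistic would not.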