Tag: #distributional-randomness

UnpredictaBench: A Benchmark for Evaluating Distributional Randomness in LLMs

arXiv cs.CL · yesterday

UnpredictaBench is a benchmark for evaluating how well large language models can sample from target distributions, including both statistical and natural-language random processes. Experiments show that current models struggle to capture the true underlying distributions: no model exceeds 40% on the KS@100 metric.
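The card does not define KS@100. One plausible reading, which is purely an assumption here, is a Kolmogorov–Smirnov goodness-of-fit check on 100 model samples per task, scored as the share of tasks that pass. The sketch below illustrates that reading using only the Python standard library. The helper names (`ks_statistic`, `passes_ks`, `ks_pass_rate`, `uniform_cdf`) are hypothetical, and the pass threshold uses the large-sample approximation c(α)/√n with c(0.05) ≈ 1.358.

```python
from typing import Callable, Sequence, Tuple


def ks_statistic(samples: Sequence[float], cdf: Callable[[float], float]) -> float:
    """One-sample Kolmogorov-Smirnov statistic D = sup_x |F_n(x) - F(x)|."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def passes_ks(samples: Sequence[float], cdf: Callable[[float], float],
              c_alpha: float = 1.358) -> bool:
    """Hypothetical pass rule: D below the asymptotic critical value at alpha = 0.05."""
    return ks_statistic(samples, cdf) <= c_alpha / len(samples) ** 0.5


def ks_pass_rate(tasks: Sequence[Tuple[Sequence[float], Callable[[float], float]]]) -> float:
    """Hypothetical aggregate: fraction of (samples, target CDF) tasks that pass."""
    return sum(passes_ks(s, cdf) for s, cdf in tasks) / len(tasks)


def uniform_cdf(x: float) -> float:
    """CDF of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)


if __name__ == "__main__":
    # 100 evenly spread values: close to Uniform(0, 1), so D is tiny.
    spread = [(i + 0.5) / 100 for i in range(100)]
    # A "collapsed" sampler that always returns the same value.
    collapsed = [0.5] * 100
    print(ks_statistic(spread, uniform_cdf), ks_statistic(collapsed, uniform_cdf))
    print(ks_pass_rate([(spread, uniform_cdf), (collapsed, uniform_cdf)]))
```

The collapsed sampler, a stand-in for a model that keeps giving the same "random" answer, gets D = 0.5 and fails, while the evenly spread samples pass. Under this assumed scoring, the mix of the two yields a pass rate of 0.5.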
