Tag
PreAct-Bench is a benchmark of 1,000 paired ethical and unethical action trajectories across five domains, designed to evaluate the ability of LLMs to predict harmful outcomes from partial trajectories (predictive monitoring). Results show that while humans perform well, current LLMs struggle, highlighting the need for future-oriented risk reasoning.