Tag
Introduces UA-Legal-Bench, a five-task benchmark for evaluating large language models on Ukrainian legal reasoning, built from the Unified State Register of Court Decisions. Evaluates 11 LLMs and finds that few-shot effects are task-dependent and that accuracy is a misleading metric on imbalanced legal tasks.