Introduces Ko-WideSearch, a Korean breadth-search benchmark for web agents that evaluates exhaustive set enumeration across 228 tables. Findings show that agents achieve high item recall but struggle to complete entire rows, especially rows containing open-ended cells.
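Item recall and row completion can diverge sharply, and a toy computation shows how. The sketch below is only an illustration under stated assumptions. Each table row is a dict, a hypothetical `key` column identifies the enumerated item, and cells are compared by exact string match. The benchmark's actual scoring may normalize values or use a model-based judge. The functions `item_recall` and `row_completion` are hypothetical names and are not taken from Ko-WideSearch.

```python
from typing import Dict, List

# Assumption: a row maps column name -> cell value; `key` names the item column.
Row = Dict[str, str]


def item_recall(gold: List[Row], pred: List[Row], key: str) -> float:
    """Fraction of gold items (by key column) that appear in the prediction."""
    gold_keys = {r[key] for r in gold}
    pred_keys = {r[key] for r in pred}
    return len(gold_keys & pred_keys) / len(gold_keys) if gold_keys else 0.0


def row_completion(gold: List[Row], pred: List[Row], key: str) -> float:
    """Fraction of gold rows whose every cell exactly matches the predicted row with the same key."""
    pred_by_key = {r[key]: r for r in pred}
    complete = 0
    for g in gold:
        p = pred_by_key.get(g[key])
        # Exact string match is a simplifying assumption; real scoring may be more lenient.
        if p is not None and all(p.get(col) == val for col, val in g.items()):
            complete += 1
    return complete / len(gold) if gold else 0.0


# Toy data: the agent finds both items but gets one open-ended "note" cell wrong.
gold = [
    {"name": "A", "year": "2020", "note": "x"},
    {"name": "B", "year": "2021", "note": "y"},
]
pred = [
    {"name": "A", "year": "2020", "note": "x"},
    {"name": "B", "year": "2021", "note": "z"},
]

print(item_recall(gold, pred, "name"))     # 1.0
print(row_completion(gold, pred, "name"))  # 0.5
```

In this example a single wrong open-ended cell leaves item recall at 1.0 while halving row completion. Row-level scoring is stricter in this way.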