Tag
OpenAI released BrowseComp, a benchmark of 1,266 challenging problems designed to measure AI agents' ability to locate hard-to-find information across the internet, available in their simple evals GitHub repository.