scientific-reproduction

Tag

#scientific-reproduction

Collider-Bench: Benchmarking AI Agents with Particle Physics Analysis Reproduction

arXiv cs.LG · 2026-05-15

Collider-Bench is a new benchmark that evaluates LLM agents on reproducing particle physics analyses from the Large Hadron Collider using only public papers and open software. The task requires physical reasoning to fill in implementation details that the papers leave out.

#scientific-reproduction

DeepCode: Open Agentic Coding

Papers with Code Trending · 2025-12-08

DeepCode is a fully autonomous framework for document-to-codebase synthesis that uses principled information-flow management to convert scientific papers into production-grade code, achieving state-of-the-art results on PaperBench and surpassing PhD-level human experts.
