@NielsRogge: New feature on http://paperswithcode.co: support for closed-source evals Given the new Microsoft tech report is so dens…
Summary
Papers With Code announced a new feature supporting closed-source evaluations, showcased on the dense Microsoft MAI Thinking 1 tech report, whose evals carry a special 'closed' tag.
Cached at: 06/03/26, 11:56 PM
New feature on http://paperswithcode.co: support for closed-source evals
Given the new Microsoft tech report is so dense, it’s now available here: https://paperswithcode.co/paper/90076.
Its evals have a special “closed” tag, indicating they are closed-source. You can enable/disable this in your PwC settings.
elie (@eliebakouch): WOW microsoft new “MAI Thinking 1” model comes with a 109 page tech report that looks REALLY detailed, this is amazing
Similar Articles
PapersWithCode new features - week 1 [P]
Niels from Hugging Face announces new features for the revived PapersWithCode platform, including multi-metric leaderboards, support for external papers, paper lineage, and more.
@NielsRogge: Introducing a revival of PapersWithCode! As @ilyasut said, we're back to the "age of research". Hence, it's important t…
NielsRogge announces a revival of PapersWithCode, featuring SOTA per domain, leaderboards, and methods parsed at scale using AI agents.
Reviving PapersWithCode (by Hugging Face) [P]
Niels from Hugging Face announces the revival of PapersWithCode as paperswithcode.co, a platform that parses high-impact AI papers at scale and automatically generates leaderboards and benchmarks, incorporating features like trending papers, domain categorization, and external paper support.
@HowToAI_: This might be the most unreal academic-writing upgrade I’ve ever seen. A team from NUS open-sourced PaperDebugger, a in…
A team from NUS open-sourced PaperDebugger, a multi-agent system that lives inside Overleaf, providing real-time rewriting, critique, and citation assistance with an open enhancer model (XtraGPT-7B), making Overleaf a full research environment.
@KLieret: You can evaluate on ProgramBench yourself: https://github.com/facebookresearch/ProgramBench/… We will open the leaderbo…
ProgramBench is a new benchmark that tests AI agents' ability to reconstruct a complete codebase from a compiled binary and its documentation. The leaderboard will open for submissions soon.