NeurIPS used uncalibrated AI detector for desk rejections [D]

Reddit r/MachineLearning 06/03/26, 05:28 PM News

neurips ai-detector desk-rejection ai-policy false-positive conference methodology

Summary

A submission was desk-rejected from NeurIPS based on an uncalibrated AI detector (Pangram), raising concerns about circularity in the review process and unvalidated false-positive rates on the target distribution.

I recently had a submission desk-rejected from the NeurIPS 2026 Position Paper Track for an alleged AI-policy violation. After corresponding with the track leadership and reading their public blog post, I think the broader methodological issue is worth discussing here.

The track used Pangram, a proprietary AI-text detector, as part of the desk-rejection process. I was told that the materials considered for desk rejection were:

* the detector output
* the authors’ AI-use attestation

This creates a potential circularity problem. If a high detector score is used to judge the authors’ attestation as inconsistent, and that inconsistency is then used to justify desk rejection, the detector is not just an aid. It becomes a decisive part of the adjudication process.

The bigger issue is validation. The NeurIPS blog describes tests using Pangram audits, older ACM FAccT papers, synthetic AI-generated position papers, and manually edited samples. But the target population was NeurIPS 2026 Position Paper submissions, whose ground-truth authorship process is unknown.

So the key question is: **What is the false-positive rate of the final decision procedure on the actual target distribution?**

A false-positive rate measured on one distribution does not automatically transfer to another. If the actual submission pool produced a "surprisingly high flagged rate" (quoted from the NeurIPS blog post), that could indicate distribution shift or miscalibration.

To sanity-check the detector’s behavior, I also ran Pangram on recent 2026 papers authored by NeurIPS Position Paper Track Chairs. Pangram returned scores including:

* 69% AI
* 45% AI
* 36% AI
* 24% AI

I am **not** claiming those papers were AI-written. In my view, Pangram’s outputs alone do not permit such a conclusion. And that is exactly the point.
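To make the validation question concrete, here is a minimal Python sketch of two standard calculations that bear on it. It is not the track's procedure and not Pangram's method. The function names `wilson_interval` and `positive_predictive_value` are my own, and every number in the example is purely illustrative, not taken from NeurIPS or Pangram. The first function puts a 95% Wilson score interval on a false-positive rate estimated from a sample of known human-written texts. The second applies Bayes' rule to show how the share of flagged papers that are actually AI-written depends on the false-positive rate and on the unknown prevalence in the target pool.

```python
from math import sqrt


def wilson_interval(false_positives: int, n_human: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a false-positive rate.

    false_positives: human-written texts that the detector flagged
    n_human: total human-written texts in the validation sample
    z: normal quantile (1.96 corresponds to roughly 95% coverage)
    """
    if n_human <= 0 or not 0 <= false_positives <= n_human:
        raise ValueError("need 0 <= false_positives <= n_human and n_human > 0")
    p = false_positives / n_human
    denom = 1 + z**2 / n_human
    center = (p + z**2 / (2 * n_human)) / denom
    half = z * sqrt(p * (1 - p) / n_human + z**2 / (4 * n_human**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def positive_predictive_value(tpr: float, fpr: float, prevalence: float) -> float:
    """P(actually AI-written | flagged), by Bayes' rule.

    tpr: detector true-positive rate on the target distribution
    fpr: detector false-positive rate on the target distribution
    prevalence: fraction of the pool that is actually AI-written
    """
    flagged = tpr * prevalence + fpr * (1 - prevalence)
    if flagged == 0:
        return float("nan")
    return tpr * prevalence / flagged


if __name__ == "__main__":
    # Illustrative only: zero false positives among 100 human-written texts.
    low, high = wilson_interval(0, 100)
    print(f"FPR 95% interval: [{low:.3f}, {high:.3f}]")

    # Illustrative only: same detector sensitivity and prevalence, two FPRs.
    for fpr in (0.01, 0.05):
        ppv = positive_predictive_value(tpr=0.9, fpr=fpr, prevalence=0.05)
        print(f"FPR={fpr:.2f} -> share of flags that are correct: {ppv:.2f}")
```

Two things stand out under these assumed numbers. Even a clean result on a small validation set leaves a false-positive rate of a few percent plausible, and that interval only describes the distribution the validation set was drawn from. And when true AI-written submissions are a small share of the pool, a modest rise in the false-positive rate on the target distribution can turn a large fraction of flags into errors.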

Similar Articles

Top AI conference uses AI detector to reject papers for allegedly being written by AI

NeurIPS 2026 AI-generated reviews [D]

NeurIPS 2026 Reviewer: AI-Generated Rebuttals (and Paper) [D]

Base Models Look Human To AI Detectors

How accurate are AI checkers?
