detection-methods

Tag

Cards List
#detection-methods

PARALLAX: Separating Genuine Hallucination Detection from Benchmark Construction Artifacts

arXiv cs.CL · 2026-05-19 Cached

This paper reveals that much of the reported progress in LLM hallucination detection is due to benchmark construction artifacts, where ground-truth answers are embedded in prompts, allowing a simple text-similarity baseline to achieve near-perfect scores. Through a large-scale controlled evaluation, the authors show that most methods perform near chance under proper controls, except for supervised probes on upper-layer hidden states such as SAPLMA and their proposed DRIFT.

0 favorites 0 likes
#detection-methods

Paraphrasing Attack Resilience of Various AI-Generated Text Detection Methods

arXiv cs.LG · 2026-05-15 Cached

This paper investigates the resilience of AI-generated text detection methods (fine-tuned RoBERTa, Binoculars, text feature analysis, and ensembles) against paraphrasing attacks, finding that Binoculars-inclusive ensembles are most effective but also most vulnerable to attacks, highlighting a dichotomy between performance and resilience.

0 favorites 0 likes
← Back to home

Submit Feedback