This paper shows that commercial AI detectors such as GPTZero and Pangram overwhelmingly classify text from base language models as human-written, while flagging outputs from instruction-tuned models as AI-generated. The authors propose HIP, a detector-agnostic iterative paraphrasing pipeline that improves human-likeness while preserving semantics.
A research paper finds that text from base language models appears human-written to AI detectors, unlike text from instruction-tuned models. The authors propose a paraphrasing pipeline (HIP) that improves human-likeness while preserving semantics across model sizes.