ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics

Hugging Face Daily Papers

Summary

ArtifactNet is a lightweight neural-network framework that detects AI-generated music by analyzing codec-specific artifacts in audio signals, achieving F1 = 0.9829 on a new 6,183-track benchmark (ArtifactBench) with 49x fewer parameters than the strongest competing method. The approach applies forensic-physics principles: a bounded-mask UNet extracts codec residuals, which a compact CNN then classifies, and codec-aware training reduces cross-codec drift by 83%.
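The 83% drift-reduction figure follows directly from the paper's reported deltas. As a quick sanity check, the sketch below computes cross-codec probability drift (the spread of the detector's AI-probability across encodings of the same track); the per-codec probabilities are hypothetical and chosen only so the spreads match the reported before/after deltas.

```python
# Cross-codec probability drift: max - min of the detector's AI-probability
# over the four codec variants (WAV/MP3/AAC/Opus) of one track.
def drift(probs):
    return max(probs) - min(probs)

# Hypothetical per-codec probabilities; only the spreads (0.95 before
# codec-aware training, 0.16 after) are taken from the paper.
before = {"wav": 0.99, "mp3": 0.55, "aac": 0.30, "opus": 0.04}
after = {"wav": 0.98, "mp3": 0.94, "aac": 0.90, "opus": 0.82}

d_before = drift(before.values())
d_after = drift(after.values())
reduction = (d_before - d_after) / d_before
print(f"drift {d_before:.2f} -> {d_after:.2f}  ({reduction:.0%} reduction)")
# drift 0.95 -> 0.16  (83% reduction)
```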

We present ArtifactNet, a lightweight framework that detects AI-generated music by reframing the problem as forensic physics -- extracting and analyzing the physical artifacts that neural audio codecs inevitably imprint on generated audio. A bounded-mask UNet (ArtifactUNet, 3.6M parameters) extracts codec residuals from magnitude spectrograms, which are then decomposed via HPSS into 7-channel forensic features for classification by a compact CNN (0.4M parameters; 4.0M total). We introduce ArtifactBench, a multi-generator evaluation benchmark comprising 6,183 tracks (4,383 AI from 22 generators and 1,800 real from 6 diverse sources). Each track is tagged with bench_origin for fair zero-shot evaluation. On the unseen test partition (n=2,263), ArtifactNet achieves F1 = 0.9829 with FPR = 1.49%, compared to CLAM (F1 = 0.7576, FPR = 69.26%) and SpecTTTra (F1 = 0.7713, FPR = 19.43%) evaluated under identical conditions with published checkpoints. Codec-aware training (4-way WAV/MP3/AAC/Opus augmentation) further reduces cross-codec probability drift by 83% (Delta = 0.95 -> 0.16), resolving the primary codec-invariance failure mode. These results establish forensic physics -- direct extraction of codec-level artifacts -- as a more generalizable and parameter-efficient paradigm for AI music detection than representation learning, using 49x fewer parameters than CLAM and 4.8x fewer than SpecTTTra.
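The abstract says the extracted residual is decomposed via HPSS into 7-channel forensic features, but does not spell out the channel layout. The sketch below illustrates the underlying technique only: a standard median-filter HPSS (Fitzgerald-style) in plain NumPy producing a 4-channel stack (magnitude, harmonic, percussive, leftover). The function names, kernel size, and channel count are illustrative assumptions, not the ArtifactUNet pipeline.

```python
import numpy as np

def median_filter_1d(x, size, axis):
    # Sliding-window median along one axis, with edge padding so the
    # output has the same shape as the input.
    pad = size // 2
    pad_width = [(0, 0)] * x.ndim
    pad_width[axis] = (pad, pad)
    xp = np.pad(x, pad_width, mode="edge")
    windows = np.stack(
        [np.take(xp, range(i, i + x.shape[axis]), axis=axis) for i in range(size)],
        axis=-1,
    )
    return np.median(windows, axis=-1)

def hpss_channels(mag, kernel=17, eps=1e-10):
    """Decompose a magnitude spectrogram (freq x time) into harmonic /
    percussive / leftover channels via median-filter HPSS with soft masks."""
    harm = median_filter_1d(mag, kernel, axis=1)  # smooth across time -> harmonic
    perc = median_filter_1d(mag, kernel, axis=0)  # smooth across frequency -> percussive
    total = harm + perc + eps
    h = mag * (harm / total)  # soft-masked harmonic component
    p = mag * (perc / total)  # soft-masked percussive component
    r = mag - h - p           # leftover energy captured by neither mask
    return np.stack([mag, h, p, r], axis=0)
```

The soft masks guarantee the channels sum back to the input magnitude, so no energy is invented or discarded — a property a forensic feature stack would likely want to preserve.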

Cached at: 04/20/26, 08:27 AM


Source: https://huggingface.co/papers/2604.16254


View arXiv page (https://arxiv.org/abs/2604.16254) • View PDF (https://arxiv.org/pdf/2604.16254) • Project page (https://demo.intrect.io/)

Models citing this paper (1)

intrect/artifactnet (Audio Classification) • Updated about 5 hours ago (https://huggingface.co/intrect/artifactnet)

Datasets citing this paper (1)

intrect/artifactbench (Viewer) • Updated about 6 hours ago • 4.4k • 59 (https://huggingface.co/datasets/intrect/artifactbench)



Similar Articles

MuseNet

OpenAI Blog

OpenAI released MuseNet, a deep neural network based on GPT-2 architecture that generates 4-minute musical compositions with 10 instruments by learning patterns from hundreds of thousands of MIDI files. The model can combine multiple music styles and blend them in novel ways.

Understanding the source of what we see and hear online

OpenAI Blog

OpenAI announces tools and research efforts to help verify content authenticity, including text watermarking, metadata approaches, and expanded image detection with C2PA metadata integration for tracking AI-generated and edited content.