Can liveness detection models generalise to synthetic media generation techniques they were never trained on? [D]

Reddit r/MachineLearning Papers

Summary

This discussion examines whether liveness detection models trained on historical deepfake samples can generalise to synthetic media generation techniques that postdate their training data, and what update cycle vendors who claim deepfake detection capabilities would need to keep pace.

Most liveness detection systems in production today were built around a threat model in which the attacker submits a static image or a basic replay video. The generation quality of current synthetic media is categorically different from what those training datasets captured.

The question I keep coming back to is whether a model trained on historical deepfake samples can generalise to generation techniques that did not exist when the training data was assembled. And if the answer is no, what does the update cycle look like for vendors claiming deepfake detection as a core capability?

I asked two identity verification vendors this directly and got answers that sounded confident without addressing the temporal gap between training data and current generation quality.
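One way to make the temporal gap measurable, and something concrete to ask vendors for, is to report attack detection separately for generation techniques that predate the training data and for those that appeared after it. Below is a minimal sketch, assuming a fixed model that outputs one attack score per sample and that every attack sample is tagged with the year its generation technique became available. The `AttackSample` schema, its field names and the year-based cutoff are hypothetical. It uses the ISO/IEC 30107-3 metrics: the decision threshold is fixed on bona fide samples at a chosen BPCER (bona fide presentations wrongly rejected), then APCER (attacks wrongly accepted) is computed separately for "seen" and "unseen" generators.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class AttackSample:
    """One attack presentation scored by a fixed liveness model (hypothetical schema)."""
    score: float      # model's attack score; higher means "more likely an attack"
    generator: str    # label for the generation technique that produced the sample
    introduced: int   # year that generation technique became available


def threshold_for_bpcer(bona_fide_scores: Sequence[float], max_bpcer: float) -> float:
    """Return a threshold t such that at most max_bpcer of bona fide samples score > t."""
    ranked = sorted(bona_fide_scores, reverse=True)
    allowed = int(max_bpcer * len(ranked))  # bona fide samples we may wrongly reject
    if allowed >= len(ranked):
        return float("-inf")  # the error budget permits rejecting every bona fide sample
    return ranked[allowed]


def apcer(attack_scores: Sequence[float], threshold: float) -> float:
    """Fraction of attacks accepted as bona fide (score at or below the threshold)."""
    if not attack_scores:
        return float("nan")
    return sum(s <= threshold for s in attack_scores) / len(attack_scores)


def temporal_apcer(
    attacks: Sequence[AttackSample],
    bona_fide_scores: Sequence[float],
    training_cutoff: int,
    max_bpcer: float = 0.05,
) -> Tuple[float, float]:
    """APCER for generators available up to vs. after the training-data cutoff year."""
    # Calibrate once on bona fide traffic, as a deployed system would
    threshold = threshold_for_bpcer(bona_fide_scores, max_bpcer)
    seen = [a.score for a in attacks if a.introduced <= training_cutoff]
    unseen = [a.score for a in attacks if a.introduced > training_cutoff]
    return apcer(seen, threshold), apcer(unseen, threshold)
```

Because the threshold is calibrated only once, a large gap between the two APCER values would suggest the model is not generalising to the newer techniques. The sketch only measures the gap; how often a vendor retrains to close it is the separate update-cycle question.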