PreScam: A Benchmark for Predicting Scam Progression from Early Conversations
Summary
PreScam is a benchmark for modeling scam progression in multi-turn conversations, built from real-world scam reports. It defines two tasks, real-time termination prediction and scammer action prediction. The paper finds that supervised encoders outperform zero-shot LLMs on termination prediction, while next-action prediction remains only moderately successful even for strong LLMs.
Cached at: 05/15/26, 04:26 PM
Source: https://huggingface.co/papers/2605.12243
Abstract
PreScam benchmark enables modeling of scam progression through multi-turn conversations by structuring real-world reports according to a scam kill chain and annotating psychological actions and victim responses.
Conversational scams, such as romance and investment scams, are emerging as a major form of online fraud. Unlike one-shot scam lures such as fake lottery or unpaid toll messages, they unfold through multi-turn conversations in which scammers gradually manipulate victims using evolving psychological techniques. However, existing research mainly focuses on static scam detection or synthetic scams, leaving open whether language models can understand how real-world scams progress over time. We introduce PreScam, a benchmark for modeling scam progression from early conversations. Built from user-submitted scam reports, PreScam filters and structures 177,989 raw reports into 11,573 conversational scam instances spanning 20 scam categories. Each instance is hierarchically structured according to the scam lifecycle defined by the proposed scam kill chain, and further annotated at the turn level with scammer psychological actions and victim responses. We benchmark models on two tasks: real-time termination prediction, which estimates whether a conversation is approaching the termination stage, and scammer action prediction, which forecasts the scammer’s subsequent actions. Results show a clear gap between surface-level fluency and progression modeling: supervised encoders substantially outperform zero-shot LLMs on real-time termination prediction, while next-action prediction remains only moderately successful even for strong LLMs. Taken together, these results show that current models can capture some scam-related cues, yet still struggle to track how risk escalates and how manipulation unfolds across turns.
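The two tasks described in the abstract can be pictured as different ways of slicing one annotated conversation into prefix-based examples. The sketch below is illustrative only: the abstract does not give the PreScam schema, so the `Turn` fields, the stage and action label values, the `horizon` parameter, and both helper functions are hypothetical names chosen for this example, not the authors' data format or method.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    # Hypothetical fields; the real PreScam schema may differ.
    speaker: str  # "scammer" or "victim"
    text: str
    stage: str    # kill-chain stage (placeholder label values)
    action: str   # psychological action or victim response (placeholder)


def termination_prefixes(
    turns: List[Turn], terminal_stage: str, horizon: int = 1
) -> List[Tuple[List[Turn], int]]:
    """Pair each conversation prefix with 1 if a terminal-stage turn
    appears within the next `horizon` turns, else 0 (assumed labeling)."""
    examples = []
    for k in range(1, len(turns)):
        upcoming = turns[k:k + horizon]
        label = int(any(t.stage == terminal_stage for t in upcoming))
        examples.append((turns[:k], label))
    return examples


def next_action_pairs(turns: List[Turn]) -> List[Tuple[List[Turn], str]]:
    """Pair each prefix with the action of the scammer turn that follows it."""
    return [
        (turns[:k], turns[k].action)
        for k in range(1, len(turns))
        if turns[k].speaker == "scammer"
    ]


# Toy conversation with invented placeholder labels.
demo = [
    Turn("scammer", "Hi, is this Alex?", "contact", "greeting"),
    Turn("victim", "Wrong number, sorry.", "contact", "reply"),
    Turn("scammer", "You seem so kind!", "trust", "flattery"),
    Turn("scammer", "Send the fee today.", "payment", "urgency"),
]

term_labels = [label for _, label in termination_prefixes(demo, "payment")]
next_actions = [action for _, action in next_action_pairs(demo)]
print(term_labels, next_actions)
```

Framing both tasks over prefixes keeps the model from seeing future turns, which matches the "from early conversations" setting; how the benchmark actually defines "approaching the termination stage" is not specified in the abstract.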
Get this paper in your agent:
hf papers read 2605.12243
Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Similar Articles
ORACLE: Anticipating Scams from Partial Trajectories in Streaming App Usage
ORACLE is a new agentic framework for early scam anticipation from streaming app usage trajectories. It uses a self-evolving context manager and on-policy self-distillation to detect scams from partial observations over multiple apps and days.
PromptScout
PromptScout is a tool that tracks how brands are mentioned across various AI models, helping businesses monitor their visibility.
An Expanded Synthetic Conversation Dataset for Multi-Turn Smishing Detection
This paper presents COVA-X, an expanded synthetic multi-turn conversation dataset for smishing detection, and shows that Longformer now outperforms XGBoost, confirming that transformer models benefit from larger training corpora.
Been watching real adversarial input hit my detection API for six months. Here's what's actually landing.
A six-month analysis of real adversarial inputs reveals that simple multi-turn setups, forward-momentum exploitation, and role redefinition attacks consistently bypass single-message classifiers. The post argues that stateful monitoring of conversational context is more effective than improving one-shot detection.
@HowToAI_: Microsoft Research + Salesforce has published a paper that should scare every single AI builder right now. It’s called …
A new paper by Microsoft Research and Salesforce reveals that LLM performance drops significantly in multi-turn conversations due to a 'Lost in Conversation' phenomenon, challenging the reliability of current single-turn benchmarks.