memorization

#memorization

Mitigating Spurious Correlations with Memorization-Guided Dataset De-Biasing

arXiv cs.LG · yesterday

This paper proposes a method for mitigating spurious correlations: a two-stage sample scoring function disentangles the learning dynamics of core and spurious features. The authors report state-of-the-art debiasing performance using only 10% of the training data.


Diffusion Models Preferentially Memorize Prototypical Examples or: Why Does My Diffusion Model Love Slop?

arXiv cs.LG · 3d ago

This paper investigates memorization in diffusion models and finds that they preferentially memorize prototypical examples with common substrings, even after deduplication, and that early stopping leads to an overproduction of common motifs, dubbed 'slop'.


NumLeak: Public Numeric Benchmarks as Latent Labels in Foundation Models

arXiv cs.LG · 3d ago

This paper introduces NumLeak, a framework for detecting when foundation models memorize public numeric benchmarks from pretraining rather than demonstrating out-of-sample skill. It shows that top LLMs recall values such as Fama-French returns with high fidelity, and it proposes a simple system-prompt defense.


Extracting Training Data from Diffusion Language Models via Infilling

arXiv cs.CL · 2026-05-26

This paper introduces infilling extraction, a new method for extracting training data from diffusion language models by using arbitrary binary masks, showing that such models are more vulnerable to memorization attacks than previously thought.


A mathematical theory of balancing relational generalization and memorization

arXiv cs.LG · 2026-05-25

This paper introduces a novel task, transitive inference with exceptions, and analytically characterizes how neural network models (kernel ridge regression) balance relational generalization and memorization. The theory is validated in pretrained language models, which make the systematic mistakes it predicts.


Memorization Dynamics of Fill-in-the-Middle Pretraining

arXiv cs.CL · 2026-05-25

This paper studies how fill-in-the-middle (FIM) pretraining affects verbatim memorization, finding that FIM more often recovers short spans while standard left-to-right training recovers long exact continuations, and that memorization under FIM grows linearly with repetitions.


The Illusion of Reasoning: Exposing Evasive Data Contamination in LLMs via Zero-CoT Truncation

Hugging Face Daily Papers · 2026-05-21

This paper introduces Zero-CoT Probe (ZCP), a black-box detection method that identifies evasive data contamination in LLMs by truncating chain-of-thought reasoning and comparing performance on perturbed datasets, achieving robust detection of both direct and indirect contamination.


Vocabi

Product Hunt · 2026-05-16

Vocabi is a tool that helps users translate, save, and memorize words while they read.
