nlp-research

#nlp-research

Characterizing Narrative Content in Web-scale LLM Pretraining Data

Hugging Face Daily Papers · 2026-06-17

A fine-grained study of narrative features in web-scale LLM pretraining data, introducing NarraBERT and NarraDolma to measure narrative patterns and their distribution across sources.


#nlp-research

The First Token Knows: Single-Decode Confidence for Hallucination Detection

Hugging Face Daily Papers · 2026-05-06

This paper introduces a method for detecting hallucinations in large language models that uses the model's confidence in the first generated token, so it requires only a single decode step.
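As a rough illustration of the general idea, here is a minimal sketch of first-token confidence scoring. It assumes that "confidence" means the top-1 softmax probability over the vocabulary for the first generated token, and that an answer is flagged when this falls below a fixed threshold. Both the scoring rule and the `threshold=0.5` default are illustrative assumptions, not the paper's method, and `first_token_confidence` and `flag_hallucination` are hypothetical helper names. In a real pipeline the logits would come from one forward pass of the model over the prompt. Here they are passed in directly.

```python
import math
from typing import Sequence


def first_token_confidence(logits: Sequence[float]) -> float:
    """Top-1 softmax probability of the first generated token.

    Assumption: confidence is the max softmax probability; the paper
    may define it differently (e.g. entropy or a calibrated score).
    """
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return max(exps) / sum(exps)


def flag_hallucination(logits: Sequence[float], threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: flag the answer as a likely hallucination
    when first-token confidence falls below `threshold` (illustrative value)."""
    return first_token_confidence(logits) < threshold


if __name__ == "__main__":
    # Peaked distribution: the model is confident in its first token.
    print(flag_hallucination([5.0, 1.0, 0.5]))      # False
    # Flat distribution: every candidate token is equally likely.
    print(flag_hallucination([1.0, 1.0, 1.0, 1.0]))  # True
```

Because only the first token's distribution is scored, generation can stop after one decode step. That single step is where the cost advantage over multi-sample consistency checks would come from.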

