centrality

Hubs or Fringes: Pretraining Data Selection via Web Graph Centrality

arXiv cs.CL · 2026-06-11

This paper introduces WebGraphMix, a lightweight framework that uses web graph centrality scores from Common Crawl to select pretraining data, showing that mixing central and peripheral documents improves language model performance.
