attention-softmax

Can Language Models Actually Retrieve In-Context? Drowning in Documents at Million Token Scale

arXiv cs.CL · 2d ago

This paper systematically studies in-context retrieval at million-token scale. It introduces BlockSearch, a 0.6B LM retriever, and analyzes attention dilution. BlockSearch matches or outperforms dense retrieval on benchmarks such as MS MARCO and NQ, and significantly outperforms it on tasks that require different notions of similarity. The results highlight the potential of in-context retrieval while underscoring the need for attention control under extreme context growth.
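
The summary does not define attention dilution. As a rough illustration only, assuming the term refers to the softmax weight on a relevant token shrinking as more irrelevant tokens compete for it, the toy sketch below computes that weight for one relevant key among many distractors. The function name and the logit values are hypothetical and are not taken from the paper.

```python
import math

def target_attention_weight(n_distractors, target_logit=5.0, distractor_logit=0.0):
    """Softmax weight on one relevant key among n_distractors irrelevant keys.

    Toy assumption: every distractor has the same logit, so the softmax
    denominator is exp(target_logit) + n_distractors * exp(distractor_logit).
    """
    target = math.exp(target_logit)
    total = target + n_distractors * math.exp(distractor_logit)
    return target / total

# The weight on the relevant key falls as the number of distractors grows,
# even though its logit advantage stays fixed.
for n in (10, 1_000, 1_000_000):
    print(n, target_attention_weight(n))
```

Under these assumptions, a fixed logit margin that dominates a short context becomes negligible at million-token scale, which is one way to read the summary's emphasis on attention control.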
