This paper identifies 'vector search dilution', a phenomenon in which retrieval-augmented generation (RAG) systems lose accuracy as they scale to large, heterogeneous document collections. In one real-world deployment, accuracy dropped from 75% to 40%. To address it, the paper proposes MASDR-RAG, a retrieval method that uses organizational metadata to scope each query to the relevant domain before vector search runs. This raises precision at 10 (P@10) from 0.77 to 0.86, at low cost and with straightforward deployment.
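The idea of scoping retrieval by metadata before vector search can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Doc` class, the `department` metadata key, the hand-picked two-dimensional embeddings, and the helpers `retrieve` and `precision_at_k` are all hypothetical, and the scope is passed in by hand rather than derived from the query. The sketch contrasts an unscoped cosine-similarity search, where near-matching documents from another department take the top slots, with the same search after a metadata filter, and scores both with precision at k (the paper reports k = 10; the toy corpus uses k = 2, and its numbers are invented for illustration).

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    vector: tuple[float, ...]
    # Hypothetical organizational metadata, e.g. the owning department.
    metadata: dict[str, str] = field(default_factory=dict)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def retrieve(query, docs, k, scope=None):
    """Return the ids of the top-k docs by cosine similarity.

    If `scope` is given, first keep only docs whose metadata matches every
    key/value pair in it (domain scoping before the vector search).
    """
    if scope is not None:
        docs = [d for d in docs
                if all(d.metadata.get(key) == val for key, val in scope.items())]
    ranked = sorted(docs, key=lambda d: cosine(query, d.vector), reverse=True)
    return [d.doc_id for d in ranked[:k]]


def precision_at_k(retrieved, relevant, k):
    """P@k: fraction of the top-k retrieved ids that are relevant."""
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / k


# Invented toy corpus: the "eng" docs sit very close to the query vector
# but are not relevant to it.
corpus = [
    Doc("hr-1", (0.9, 0.1), {"department": "hr"}),
    Doc("hr-2", (0.8, 0.3), {"department": "hr"}),
    Doc("hr-3", (0.1, 0.9), {"department": "hr"}),
    Doc("eng-1", (0.95, 0.05), {"department": "eng"}),
    Doc("eng-2", (1.0, 0.02), {"department": "eng"}),
]
query = (1.0, 0.0)
relevant = {"hr-1", "hr-2"}

unscoped = retrieve(query, corpus, k=2)
scoped = retrieve(query, corpus, k=2, scope={"department": "hr"})
print(unscoped, precision_at_k(unscoped, relevant, 2))  # ['eng-2', 'eng-1'] 0.0
print(scoped, precision_at_k(scoped, relevant, 2))      # ['hr-1', 'hr-2'] 1.0
```

Because the filter runs before ranking, out-of-scope documents never compete for the top-k slots; the trade-off in this sketch is that a wrongly chosen scope excludes relevant documents outright.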