Your RAG is hallucinating because of garbage retrieval — here's the 3-line fix (with real scores)

Reddit r/AI_Agents Tools

Summary

A practical fix for RAG hallucination caused by noisy retrieval: re-rank the retrieved chunks with a cross-encoder and pass only those scoring above 1.5 to the LLM. Across 10 queries, the average relevance score rose from -0.28 to +3.80.

My RAG agent hallucinated. Not because the LLM was bad, but because the retrieval was feeding it noise.

Query: "What are Python decorators?"

What my retriever returned (before fix):

| Rank | Score | Content | Relevant? |
|---|---|---|---|
| 1 | +5.80 | Decorator definition | Yes |
| 2 | +1.40 | Acknowledgments page | No |
| 3 | +1.13 | @staticmethod example | Yes |
| 4 | -4.69 | Class exercises | No |
| 5 | -11.0 | Monty Python reference | No |

The LLM received all 5 chunks. It hallucinated because it trusted the noise.

The fix: cross-encoder re-ranking (3 lines):

    scores = cross_encoder.score(pairs)
    ranked = sorted(zip(scores, candidates), reverse=True)
    filtered = [doc for score, doc in ranked if score > 1.5]

After fix: only chunks with score > 1.5 reach the LLM.

Overall results (10 queries): avg relevance went from -0.28 to +3.80. 80% win rate.

Model: cross-encoder/ms-marco-MiniLM-L-6-v2 (free, local, HuggingFace).

If your chatbot hallucinates, check your retrieval before blaming the LLM.

What threshold are you using for your re-ranker?
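
For anyone who wants to try the re-rank-and-filter step end to end, here is a minimal sketch, not the exact code from my agent. The names `rerank_and_filter`, `minilm_scorer`, and `table_scorer` are hypothetical helpers made up for illustration. The sketch assumes the scorer is any callable that takes (query, chunk) pairs and returns one score per pair. With the sentence-transformers package, `CrossEncoder(...).predict` has that shape. The demo replays the scores from the table above instead of loading the model, so it runs without any downloads.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]
Scorer = Callable[[List[Pair]], Sequence[float]]


def rerank_and_filter(query: str, candidates: Sequence[str], score_pairs: Scorer,
                      threshold: float = 1.5) -> List[Tuple[float, str]]:
    """Score each (query, chunk) pair, sort best-first, keep scores above threshold."""
    pairs = [(query, doc) for doc in candidates]
    scores = [float(s) for s in score_pairs(pairs)]
    # Sort on the score only, so tied scores never fall back to comparing chunks.
    ranked = sorted(zip(scores, candidates), key=lambda item: item[0], reverse=True)
    return [(score, doc) for score, doc in ranked if score > threshold]


def minilm_scorer() -> Scorer:
    """Real scorer (assumes `pip install sentence-transformers`; downloads the model)."""
    from sentence_transformers import CrossEncoder
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    return model.predict


# Stand-in scorer that replays the scores from the table above (demo only).
TABLE_SCORES = {
    "Decorator definition": 5.80,
    "Acknowledgments page": 1.40,
    "@staticmethod example": 1.13,
    "Class exercises": -4.69,
    "Monty Python reference": -11.0,
}


def table_scorer(pairs: List[Pair]) -> List[float]:
    return [TABLE_SCORES[doc] for _query, doc in pairs]


kept = rerank_and_filter("What are Python decorators?", list(TABLE_SCORES), table_scorer)
print(kept)  # [(5.8, 'Decorator definition')]
```

To score with the real model, pass `minilm_scorer()` in place of `table_scorer`. One thing the demo makes visible: at 1.5 the relevant @staticmethod chunk (+1.13) gets dropped too, and since the irrelevant Acknowledgments page scored higher (+1.40), no single cutoff keeps one without the other for this query.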
