Wrote up the failure modes that kept breaking my RAG system: chunking, stale index, hybrid search, the works

Reddit r/ArtificialInteligence Tools

Summary

A developer shares the failure modes encountered while debugging a RAG system, including issues with chunking, stale indices, and hybrid search, along with practical fixes like sliding window chunking and contextual retrieval.

So, after spending way too long debugging a RAG system that kept giving confidently wrong answers, I finally sat down and actually mapped out every place it was breaking. Turns out most of my problems came down to chunking, which I had genuinely underestimated. I was doing fixed-size splitting and not thinking about it much. The issues: Chunks too small, no context survives. retrieved "refunds processed in 5 days" with zero surrounding information. The LLM answered but missed all the nuance that was in the sentences around it. Chunks too large, right section retrieved but the actual answer was buried under so much irrelevant text that quality tanked and costs went up. Switched to sliding window with overlap and things got noticeably better. semantic chunking gave the best results but the cost per indexing run went up so I only use it for the most important documents. Other things that got me: Stale index is sneaky, docs were getting updated but I hadn't set up automatic re-indexing. old information kept getting retrieved and I couldn't figure out why answers were drifting. Semantic search completely fails on exact strings. product codes, model numbers, specific IDs. had to add keyword search alongside semantic and merge the results. obvious in hindsight but I didn't think about it until users started complaining. LLM hallucinates from the closest chunk even when the answer isn't in your docs. had to be very explicit in the system prompt, if the answer isn't in the retrieved context, say you don't know. without that instruction it just riffs off whatever it found. The thing that helped most beyond chunking was contextual retrieval, passing each chunk alongside the full document when generating its context prefix rather than just summarizing the chunk alone. makes a meaningful difference on longer documents because the chunk carries its location and purpose with it. Anyway, curious if others have hit these same things or found different fixes, especially on the stale index problem. My current solution feels a bit janky.
Original Article

Similar Articles

Most agent RAG problems I see are retrieval problems, not model problems

Reddit r/AI_Agents

The author argues that most agent RAG failures are due to retrieval problems—specifically chunking errors, lack of freshness signals, and reliance on pure vector search—rather than the LLM, and recommends structural chunking, decay-based ranking, and hybrid BM25+vector search.

Most RAG apps in production are confidently wrong and nobody talks about this enough

Reddit r/ArtificialInteligence

The article highlights a critical failure mode in production RAG systems where confident but incorrect answers arise from versioning issues and lack of uncertainty mechanisms. It proposes architectural improvements like routing layers, retrieval scoring, and hallucination checks to mitigate these errors.

CHOP: Chunkwise Context-Preserving Framework for RAG on Multi Documents

arXiv cs.CL

CHOP is a framework for improving RAG systems on multi-document retrieval by using context-aware metadata and LLM-based chunk relevance evaluation to reduce semantic conflicts and hallucinations. The approach achieves 90.77% Top-1 Hit Rate through intelligent chunking and contextual preservation strategies.