How are you handling aggregation/counting questions in doc-aware agents? RAG keeps failing me here

Reddit r/AI_Agents News

Summary

A developer discusses the limitations of RAG for aggregation and counting queries over collections of documents, and asks for community advice on alternative approaches like text-to-SQL and intent routing.

Something I keep hitting when building agents that work over documents, and I'm curious how others solve it.

RAG is the default doc tool we give agents, and it's great for "find/explain the passage about X": the answer lives in one place, and retrieval finds it. But the questions users actually ask are often aggregations over the whole collection: "how many invoices are unpaid", "total billed to this client this year", "which contracts expire in the next 90 days".

Top-k retrieval structurally can't answer those:

* Aggregation is a scan over every record; retrieval hands the agent k chunks. So it computes the answer over a sample, not the population. Ask for a total over 1,000 docs with k=10 and it literally sums 10 of them, then states the result with full confidence.
* On homogeneous collections the ranking is meaningless anyway: 1,000 invoices are all roughly equidistant from "total unpaid", so which k come back is basically arbitrary.
* Raising k doesn't fix it. To be correct, k must equal N, i.e. you dump the whole corpus into context. That doesn't scale, and a bigger/better model doesn't help because the ceiling is set at the retrieval step, before the model sees anything.

The approaches I've seen / tried:

(a) function-calling to text-to-SQL over a structured table,
(b) pre-extracting fields into a metadata store and querying that,
(c) just accepting that RAG is the wrong tool for these and routing aggregation intents elsewhere.

For those of you shipping doc agents in production: what's your actual pattern for counting/aggregation? Do you classify the query intent first and route, or have you found a retrieval setup that genuinely handles it?

(Sorry for the AI-written text, but it's only because I make mistakes in English.)
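To make the sample-vs-population gap concrete, here is a minimal sketch combining approaches (b) and (c): fields pre-extracted into a table, plus a crude intent check that sends aggregation questions to SQL instead of retrieval. Everything in it is an assumption for illustration: the `invoices` schema, the synthetic rows, the keyword list, and the `LIMIT k` query standing in for top-k retrieval. A real router would more likely use an LLM or a trained classifier than a regex.

```python
import re
import sqlite3

# Hypothetical keyword list; a real system would likely use a proper classifier.
AGG_PATTERN = re.compile(r"\b(how many|total|sum|count|average|expire[sd]?)\b")


def build_store(n=1000):
    """Create an in-memory table of made-up, pre-extracted invoice fields."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices ("
        "id INTEGER PRIMARY KEY, client TEXT, amount REAL, paid INTEGER)"
    )
    # Synthetic data: every invoice is 100.0, and every 4th one is paid.
    rows = [(i, f"client_{i % 5}", 100.0, 1 if i % 4 == 0 else 0) for i in range(n)]
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?, ?)", rows)
    return conn


def classify_intent(question):
    """Route to 'aggregate' if the question looks like a count/sum, else 'lookup'."""
    return "aggregate" if AGG_PATTERN.search(question.lower()) else "lookup"


def unpaid_total_sql(conn):
    """Full scan: count and sum over every unpaid invoice."""
    return conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM invoices WHERE paid = 0"
    ).fetchone()


def unpaid_total_topk(conn, k=10):
    """Stand-in for top-k retrieval: the 'agent' only ever sees k records."""
    rows = conn.execute(
        "SELECT amount FROM invoices WHERE paid = 0 LIMIT ?", (k,)
    ).fetchall()
    return len(rows), sum(r[0] for r in rows)


if __name__ == "__main__":
    conn = build_store()
    question = "how many invoices are unpaid"
    if classify_intent(question) == "aggregate":
        print("SQL scan:", unpaid_total_sql(conn))
    print("top-k sample:", unpaid_total_topk(conn, k=10))
```

With this synthetic data the SQL path reports 750 unpaid invoices totalling 75000.0, while the k=10 path reports 10 invoices totalling 1000.0. The second number is a valid sum of what was retrieved, which is why it gets stated so confidently.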