I made a small tool to inspect retrieval results before feeding them into RAG

Reddit r/LocalLLaMA 05/27/26, 07:16 AM Tools

rag retrieval-augmented-generation search-quality tool web-retrieval inspection open-source

Summary

A developer built a small local tool that inspects results from web search providers such as Brave, Serper, Tavily, and Exa before those results are fed into a RAG pipeline. It checks signals such as source diversity, duplicates, freshness, and SEO/GEO pollution risk.
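The post does not show how the tool scores these signals. The following is only a minimal sketch of how two of them, source diversity and duplicates, might be computed. It is not the tool's implementation. It assumes each result is a dict with hypothetical `url` and `snippet` keys; real Brave, Serper, Tavily, and Exa responses use their own field names. Two results count as exact duplicates when their normalized URLs match. Near-duplicates are flagged when the Jaccard similarity of their word 3-gram shingles reaches an arbitrary 0.8 threshold.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivial variants compare equal (heuristic only)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Drop utm_* tracking parameters but keep every other query parameter.
    query = urlencode(
        [(k, v) for k, v in parse_qsl(parts.query) if not k.lower().startswith("utm_")]
    )
    path = parts.path.rstrip("/") or "/"
    # Treat http and https as the same page and discard the #fragment.
    scheme = "https" if parts.scheme in ("http", "https") else parts.scheme
    return urlunsplit((scheme, host, path, query, ""))


def shingles(text: str, k: int = 3) -> set:
    """Set of lowercase word k-grams; texts shorter than k become one shingle."""
    words = text.lower().split()
    if not words:
        return set()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity; empty sets are treated as dissimilar."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def inspect_results(results: list, near_dup_threshold: float = 0.8) -> dict:
    """Compute rough diversity and duplicate signals for one result list."""
    n = len(results)
    urls = [normalize_url(r["url"]) for r in results]
    domain_counts = Counter(urlsplit(u).netloc for u in urls)
    # Each extra copy of the same normalized URL counts as one exact duplicate.
    exact_dups = sum(c - 1 for c in Counter(urls).values())
    sh = [shingles(r.get("snippet", "")) for r in results]
    # Near-duplicates: different URLs whose snippets overlap heavily.
    near_dups = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if urls[i] != urls[j] and jaccard(sh[i], sh[j]) >= near_dup_threshold
    ]
    return {
        "results": n,
        "distinct_domains": len(domain_counts),
        "domain_diversity": len(domain_counts) / n if n else 0.0,
        "top_domain_share": max(domain_counts.values()) / n if n else 0.0,
        "exact_duplicates": exact_dups,
        "near_duplicate_pairs": near_dups,
    }


# Usage with made-up results.
results = [
    {"url": "https://www.example.com/post?utm_source=x", "snippet": "RAG needs good evidence from the web"},
    {"url": "https://example.com/post/", "snippet": "RAG needs good evidence from the web"},
    {"url": "https://other.org/a", "snippet": "RAG needs good evidence from the web today"},
    {"url": "https://third.net/b", "snippet": "Completely different text about search APIs"},
]
report = inspect_results(results)
print(report["exact_duplicates"])      # 1
print(report["domain_diversity"])      # 0.75
print(report["near_duplicate_pairs"])  # [(0, 2), (1, 2)]
```

A low `domain_diversity` or a high `top_domain_share` suggests that one site dominates the evidence. URL normalization here is deliberately conservative: it only strips `www.`, `utm_*` parameters, fragments, and trailing slashes.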

I’ve been messing around with live web retrieval for RAG, and the part that kept annoying me wasn’t the search call itself. It was figuring out whether the returned results were actually usable as evidence. A result can look relevant but still be stale, duplicated, SEO-heavy, or just not good enough to put into the context window.

So I cleaned up a small local tool for inspecting retrieval/search results before feeding them into a RAG pipeline: [https://github.com/mameirolabs/rag-search-quality-lab-public](https://github.com/mameirolabs/rag-search-quality-lab-public)

It currently supports mock, Brave, Serper, Tavily, and Exa. It looks at rough signals like source diversity, duplicates, freshness, citation readiness, SEO/GEO pollution risk, and provider differences.

I’m not trying to make a benchmark or declare which provider is “best”. The scoring is still very rough. I mostly use it to compare outputs side by side and spot bad evidence before it reaches the model.

Curious how others handle this: what signals do you check before trusting retrieved web results in a RAG pipeline?
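The post also mentions freshness, citation readiness, and side-by-side provider comparison. Below is a hypothetical sketch of how those could be roughed out. It is not code from the linked repository. The `published`, `title`, and `snippet` keys, the one-year staleness cutoff, and the "citable means it has a URL, a title, and a non-empty snippet" rule are all assumptions chosen for illustration. Dates are parsed with `datetime.fromisoformat`, so any date that is not in ISO 8601 form is counted as undated.

```python
from datetime import datetime, timezone


def age_days(published, now: datetime):
    """Age in days, or None if the date is missing or not ISO 8601."""
    if not published:
        return None
    try:
        dt = datetime.fromisoformat(published)
    except ValueError:
        return None
    if dt.tzinfo is None:
        # Assumption: naive timestamps are treated as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return (now - dt).total_seconds() / 86400


def citation_ready(result: dict) -> bool:
    """Hypothetical rule: citable if URL, title, and snippet are all non-empty."""
    return all(result.get(key) for key in ("url", "title", "snippet"))


def provider_report(results: list, now: datetime, stale_after_days: float = 365) -> dict:
    """Per-provider counts of undated, stale, and citation-ready results."""
    ages = [age_days(r.get("published"), now) for r in results]
    known = [a for a in ages if a is not None]
    return {
        "results": len(results),
        "undated": len(ages) - len(known),
        "stale": sum(1 for a in known if a > stale_after_days),
        "citation_ready": sum(1 for r in results if citation_ready(r)),
    }


def url_overlap(a: list, b: list) -> float:
    """Jaccard overlap of two providers' URL sets (trailing slash ignored)."""
    sa = {r["url"].rstrip("/") for r in a}
    sb = {r["url"].rstrip("/") for r in b}
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


# Usage with made-up results from two hypothetical providers.
now = datetime(2026, 5, 27, tzinfo=timezone.utc)
by_provider = {
    "provider_a": [
        {"url": "https://a.com/1", "title": "A1", "snippet": "short text", "published": "2026-05-01"},
        {"url": "https://b.com/2/", "title": "B2", "snippet": "short text", "published": "2023-01-15"},
    ],
    "provider_b": [
        {"url": "https://b.com/2", "title": "B2", "snippet": "", "published": None},
        {"url": "https://c.com/3", "title": "C3", "snippet": "short text",
         "published": "2026-04-20T08:00:00+00:00"},
    ],
}
for name, provider_results in by_provider.items():
    print(name, provider_report(provider_results, now))
print("overlap", url_overlap(by_provider["provider_a"], by_provider["provider_b"]))
```

Comparing raw URL sets is deliberately crude. A stricter comparison would canonicalize URLs, for example by lowercasing hosts and dropping tracking parameters, before computing overlap.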


Similar Articles

Which Web Search API gives the cleanest Markdown output for local RAG parsing?

Reddit r/LocalLLaMA

A comparison of web search APIs and tools that provide clean Markdown output for grounding local RAG pipelines, evaluating Brave Search, Parallel AI, You.com, Exa, Tavily, Firecrawl, Jina Reader, and SearXNG on signal-to-noise ratio and developer overhead.

@h100envy: This paper completely changed how I think about trusting retrieval in RAG: Fetch documents -> Score their quality -> Ge…

X AI KOLs Timeline

This paper presents a 5-step blueprint for improving trust in RAG by using a lightweight retrieval evaluator that scores document quality and triggers actions (correct, incorrect, ambiguous) to handle retrieval failures, with plug-and-play integration.

Evaluating RAG Metrics in Applied Contexts: An Experiment, Its Findings and Its Limitations

arXiv cs.CL

This paper presents an empirical study evaluating RAG evaluation metrics from four libraries (Ragas, DeepEval, RAGChecker, Opik) by comparing them to human judgments and standard recall metrics, using a question-answering dataset created from business data.

@akshay_pachaar: Web scraping will never be the same. (100% open-source visual search at scale) PixelRAG is a retrieval system that skip…

X AI KOLs Following

PixelRAG is an open-source retrieval system that bypasses HTML parsing by screenshotting web pages and using vision-language models to read answers directly from pixels, claiming significant accuracy improvements over text-based RAG.

@DanKornas: Your RAG pipeline doesn’t need to retrieve the same evidence twice. LeanRAG is an open-source RAG framework that uses k…