Built a Fetch API that returns page labels, not just markdown

Reddit r/AI_Agents 05/19/26, 03:33 AM Products

fetch-api rag web-ingestion page-labels content-classification developer-tools

Summary

The author introduces a Fetch API for RAG and web ingestion that returns page labels (dead link, content category, page structure) to help filter low-value pages before indexing. They seek feedback on what additional fields would be useful.

I'm working on a Fetch API for RAG, agents, and web ingestion workflows. Think Firecrawl/Jina Reader-style URL-to-markdown or clean-text API, but with one extra signal layer: page labels for content category and page structure.

The pain point: fetching is only the first step. You still need to decide whether a page is useful, relevant, and worth sending into indexing, embedding, or an LLM pipeline.

Examples of labels we return:

- dead link / main content missing → skip low-value pages early
- homepage / index page vs content page → avoid mixing navigation/listing pages with real content
- content category → keep vertical pipelines from indexing out-of-scope pages, e.g. a finance workflow pulling in random entertainment/forum pages

Our category labels cover broad areas like Finance, Health, News, Ecommerce, Education, Jobs, Travel, and more.

A couple of open questions:

- If you've already built filtering logic on top of a fetch API (skipping listing pages, filtering by topic, dropping dead links), I'm curious what that looks like in your pipeline. Does moving this upstream actually save work, or just add a layer you'd rather control yourself?
- Beyond category and page structure, what other fields or labels would actually be useful in a fetch API response? Author, publish date, sentiment, product pricing, freshness signals...? Curious what's missing from current fetch tools for your pipeline.

Happy to share access if you want to try it. New signups get $5 credit, around 5k pages.
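For anyone picturing how labels like these would be consumed downstream, here is a rough Python sketch of a pre-indexing filter. It is only an illustration: the response shape and every field name (`is_dead_link`, `main_content_missing`, `page_type`, `category`) are assumptions made up for this example, not the API's actual schema, and the label values ("homepage", "index", "content") are guesses based on the list above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass
class FetchedPage:
    """Hypothetical fetch response; field names are assumptions, not a real schema."""
    url: str
    markdown: str
    is_dead_link: bool
    main_content_missing: bool
    page_type: str  # assumed values: "homepage", "index", "content"
    category: str   # e.g. "Finance", "Health", "News"


def skip_reason(page: FetchedPage, allowed_categories: Set[str]) -> Optional[str]:
    """Return a short reason to skip the page, or None if it should be indexed."""
    if page.is_dead_link:
        return "dead_link"
    if page.main_content_missing:
        return "main_content_missing"
    if page.page_type != "content":
        # Homepages and index/listing pages are mostly navigation.
        return "page_type:" + page.page_type
    if page.category not in allowed_categories:
        return "out_of_scope:" + page.category
    return None


def filter_for_indexing(
    pages: Iterable[FetchedPage], allowed_categories: Set[str]
) -> Tuple[List[FetchedPage], List[Tuple[str, str]]]:
    """Split pages into those worth indexing and (url, reason) pairs for the rest."""
    kept: List[FetchedPage] = []
    skipped: List[Tuple[str, str]] = []
    for page in pages:
        reason = skip_reason(page, allowed_categories)
        if reason is None:
            kept.append(page)
        else:
            skipped.append((page.url, reason))
    return kept, skipped
```

A finance pipeline would call `filter_for_indexing(pages, {"Finance"})` and only embed the `kept` list. Keeping the skip reasons around makes it easy to audit what the labels filtered out, which matters if you are deciding whether to trust upstream labels or keep your own filtering logic.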


Similar Articles

Which Web Search API gives the cleanest Markdown output for local RAG parsing?

I made a small tool to inspect retrieval results before feeding them into RAG

@h100envy: This paper completely changed how I think about trusting retrieval in RAG: Fetch documents -> Score their quality -> Ge…

How we index images for RAG

[P] I built a system that lets you ask questions about any GitHub repo and get answers grounded in the actual source code [P]
