pdf-parsing

@llama_index: Most agentic retrieval demos assume clean, well-structured documents. Enterprise reality is often different, consisting…

X AI KOLs Following ↗ · 2026-07-06 Cached

LlamaIndex and LanceDB collaborated on a pipeline using LiteParse for PDF parsing and LanceDB for multimodal storage, enabling better retrieval from complex enterprise PDFs for agentic workflows.

@BlockInsight214: Before feeding papers, contracts, or scanned documents to AI, the hardest step is often "cleaning up the PDF." These open-source projects specialize in that: converting to Markdown/JSON, ready for RAG or agents. ① MarkItDown · Microsoft, Office/PDF/images to Markdown in one click...

X AI KOLs Timeline ↗ · 2026-06-22 Cached

Introduces five open-source tools (MarkItDown, MinerU, Docling, marker, surya) that convert PDFs, Office documents, and other files into Markdown or JSON for direct use with RAG or AI agents.

@itsafiz: Built a super fast PDF parsing service with LiteParse! LiteParse is a standalone OSS PDF parsing tool by @llama_index f…

X AI KOLs Following ↗ · 2026-06-21 Cached

Built a fast PDF parsing service using LiteParse, an open-source tool by LlamaIndex, with help from Cursor AI.

@jerryjliu0: It's kind of crazy how well LiteParse does on markdown document parsing even compared against frontier VLMs - when it d…

X AI KOLs Following ↗ · 2026-06-19 Cached

LiteParse is a fast, open-source document parser developed by LlamaIndex that outperforms some frontier VLMs on markdown parsing without using AI models. It is available for multiple programming languages and platforms.

@jerryjliu0: We made Claude better and faster at understanding PDFs The trick isn’t just creating the fastest free document parser o…

X AI KOLs Following ↗ · 2026-06-17 Cached

LlamaIndex improved its LiteParse PDF parsing skill for Claude agents, cutting cost by 37% and raising accuracy by using evaluation traces to optimize agent behavior.

@llama_index: How much can good documentation save an AI agent in cost and time? Turns out, a lot. We built a custom skill that teach…

X AI KOLs Following ↗ · 2026-06-16 Cached

LlamaIndex's blog post describes building a custom LiteParse skill for Claude agents that reduced cost per question by 37% and improved answer quality by analyzing agent traces to fix inefficiencies in PDF parsing.

@jerryjliu0: LiteParse, our open-source/Rust-based doc parser, runs so quickly that Claude Fable 5 doesn't think it's real It is the…

X AI KOLs Following ↗ · 2026-06-09 Cached

LiteParse is a fast, open-source document parser written in Rust that provides high-quality spatial text extraction with bounding boxes and supports multiple programming languages and platforms for AI document workloads.

@llama_index: Most AI pipelines are only as good as the data we provide them with, and that usually means PDFs or other unstructured …

X AI KOLs Timeline ↗ · 2026-06-04 Cached

Parse-Flow is an open-source visual workflow designer built by LlamaIndex. It chains four document processing primitives (Parse, Classify, Split, and Extract) on a drag-and-drop canvas powered by LlamaAgents workflows, enabling reliable structured data extraction from unstructured enterprise documents such as PDFs, contracts, and invoices.

@jerryjliu0: We Parse PDFs We spent 7 figures to put this on billboards throughout SF. I thought long and hard about putting somethi…

X AI KOLs Following ↗ · 2026-06-02 Cached

Jerry Liu of LlamaIndex announces a $1M+ billboard campaign in SF promoting their PDF parsing service for AI agents, and lists their booth appearances at upcoming tech conferences.

@llama_index: Automate a loan underwriting pipeline in just a few lines of code A typical loan file is a stack of pay stubs and broke…

X AI KOLs Following ↗ · 2026-05-26 Cached

LlamaIndex demonstrates how to automate a loan underwriting pipeline using LlamaParse to extract structured data from financial PDFs, with cross-document analysis and human-in-the-loop review.

@rwayne: Absolutely impressive for building local knowledge bases with academic papers—the bottleneck has always been cleanly converting PDFs to Markdown. OpenDataLoader-PDF achieves a 0.907 accuracy rate, ranking first on the open-source PDF parsing leaderboard, all under Apache 2.0. Key metrics from a test set of 200 real papers: Overall score 0…

X AI KOLs Timeline ↗ · 2026-05-10

OpenDataLoader-PDF is an open-source PDF parsing tool, released under Apache 2.0, that scores 0.907 accuracy on a test set of 200 real academic papers. It converts complex PDF documents (including tables, formulas, and scanned images) into Markdown and JSON, making it well suited to local knowledge bases and RAG applications.

@AIExplorerTim: Someone just released a tool that converts PDFs into clean, structured Markdown at speeds up to 100 pages/second. No GPU required. No API costs. No messy parsing. Just raw, usable data. It handles with ease: • Tables → Perfectly ex…

X AI KOLs Timeline ↗ · 2026-05-09 Cached

OpenDataLoader is an open-source tool that converts PDFs into structured Markdown and JSON. It runs locally at up to 100 pages/second without a GPU or API costs, and is designed for RAG pipelines and PDF accessibility automation.

@jerryjliu0: A downside with using VLMs to parse PDFs is guaranteeing that the output text is correct and output in the correct re…

X AI KOLs Following ↗ · 2026-04-18 Cached

Jerry Liu discusses challenges with using Vision Language Models for PDF parsing, particularly around ensuring text correctness and maintaining proper reading order while avoiding hallucinations.

PaddlePaddle/PaddleOCR

GitHub Trending (daily) ↗ · 2026-06-05

PaddleOCR is a powerful, lightweight OCR toolkit that converts PDFs and images into structured data for AI applications, supporting 100+ languages and designed to bridge documents with LLMs.

run-llama/liteparse

GitHub Trending (daily) ↗ · 2026-05-29 Cached

LiteParse is a standalone open-source PDF parsing tool from run-llama that provides fast, local spatial text extraction with bounding boxes, supporting multiple programming languages and platforms.
