pdf-parsing

#pdf-parsing

@llama_index: Most AI pipelines are only as good as the data we provide them with, and that usually means PDFs or other unstructured …

X AI KOLs Timeline · 11h ago

Parse-Flow is an open-source visual workflow designer built by LlamaIndex. It chains four document processing primitives (Parse, Classify, Split, and Extract) on a drag-and-drop canvas powered by LlamaAgents workflows, for reliable structured data extraction from unstructured enterprise documents such as PDFs, contracts, and invoices.

#pdf-parsing

@jerryjliu0: We Parse PDFs We spent 7 figures to put this on billboards throughout SF. I thought long and hard about putting somethi…

X AI KOLs Following · 2d ago

Jerry Liu of LlamaIndex announces a $1M+ billboard campaign in SF promoting their PDF parsing service for AI agents, and lists their booth appearances at upcoming tech conferences.

#pdf-parsing

@llama_index: Automate a loan underwriting pipeline in just a few lines of code A typical loan file is a stack of pay stubs and broke…

X AI KOLs Following · 2026-05-26

LlamaIndex demonstrates how to automate a loan underwriting pipeline using LlamaParse to extract structured data from financial PDFs, with cross-document analysis and human-in-the-loop review.

#pdf-parsing

@rwayne: Absolutely impressive for building local knowledge bases with academic papers—the bottleneck has always been cleanly converting PDFs to Markdown. OpenDataLoader-PDF achieves a 0.907 accuracy rate, ranking first on the open-source PDF parsing leaderboard, all under Apache 2.0. Key metrics from a test set of 200 real papers: Overall score 0…

X AI KOLs Timeline · 2026-05-10

OpenDataLoader-PDF is an open-source PDF parsing tool that achieves an accuracy of 0.907 in tests on real academic papers. It converts complex PDF documents (including tables, formulas, and scanned images) into Markdown and JSON, which suits local knowledge bases and RAG applications.

#pdf-parsing

@AIExplorerTim: Someone just released a tool that converts PDFs into clean, structured Markdown at speeds up to 100 pages/second. No GPU required. No API costs. No messy parsing. Just raw, usable data. It handles with ease: • Tables → Perfectly ex…

X AI KOLs Timeline · 2026-05-09

OpenDataLoader is an open-source tool that converts PDFs into structured Markdown and JSON, supporting local processing speeds of up to 100 pages/second without requiring a GPU or incurring API costs, designed specifically for RAG pipelines and PDF accessibility automation.

#pdf-parsing

@jerryjliu0: A downside with using VLMs to parse PDFs is guaranteeing that the output text is correct and output in the correct re…

X AI KOLs Following · 2026-04-18

Jerry Liu discusses challenges with using Vision Language Models for PDF parsing, particularly around ensuring text correctness and maintaining proper reading order while avoiding hallucinations.

#pdf-parsing

PaddlePaddle/PaddleOCR

GitHub Trending (daily) · 1h ago

PaddleOCR is a powerful, lightweight OCR toolkit that converts PDFs and images into structured data for AI applications, supporting 100+ languages and designed to bridge documents with LLMs.

#pdf-parsing

run-llama/liteparse

GitHub Trending (daily) · 6d ago

LiteParse is a standalone open-source PDF parsing tool from run-llama that provides fast, local spatial text extraction with bounding boxes, supporting multiple programming languages and platforms.
