Parse-Flow is an open-source visual workflow designer built by LlamaIndex. It chains four document processing primitives (Parse, Classify, Split, and Extract) on a drag-and-drop canvas powered by LlamaAgents workflows, enabling reliable structured data extraction from unstructured enterprise documents such as PDFs, contracts, and invoices.
Jerry Liu of LlamaIndex announces a $1M+ billboard campaign in SF promoting their PDF parsing service for AI agents, and lists their booth appearances at upcoming tech conferences.
LlamaIndex demonstrates how to automate a loan underwriting pipeline using LlamaParse to extract structured data from financial PDFs, with cross-document analysis and human-in-the-loop review.
OpenDataLoader-PDF is an open-source PDF parsing tool that reports an accuracy score of 0.907 in tests on real academic papers. It converts complex PDF documents (including tables, formulas, and scanned images) into Markdown and JSON, making it well suited to local knowledge bases and RAG applications.
OpenDataLoader is an open-source tool that converts PDFs into structured Markdown and JSON. It processes up to 100 pages/second locally, without requiring a GPU or incurring API costs, and is designed specifically for RAG pipelines and PDF accessibility automation.
Jerry Liu discusses challenges with using Vision Language Models for PDF parsing, particularly around ensuring text correctness and maintaining proper reading order while avoiding hallucinations.
PaddleOCR is a lightweight OCR toolkit that converts PDFs and images into structured data for AI applications. It supports 100+ languages and is designed to bridge documents with LLMs.
LiteParse is a standalone open-source PDF parsing tool from run-llama that provides fast, local spatial text extraction with bounding boxes, supporting multiple programming languages and platforms.