@jerryjliu0: Last week we revamped Liteparse to be the fastest PDF parser out there An underrated part of liteparse is it doesn't ju…

X AI KOLs Following 06/01/26, 08:51 PM Tools

pdf-parser llama-index bounding-boxes open-source document-parser coding-agent

Summary

Jerry Liu announces a revamped LiteParse, a fast PDF parser that provides bounding boxes for audit trails, with sample demos available.

Last week we revamped LiteParse to be the fastest PDF parser out there. An underrated part of LiteParse is that it doesn't just give you text. It gives you bounding boxes that a coding agent can use to paint exact audit trails back to the source document. For instance, check out the deep research skill we compiled in liteparse_samples: https://github.com/jerryjliu/liteparse_samples… Come check out LiteParse: https://github.com/run-llama/liteparse… We are hard at work making LiteParse even better (e.g. Markdown support). Please feel free to open issues and PRs, and let us know your feature requests.

jerryjliu/liteparse_samples

Source: https://github.com/jerryjliu/liteparse_samples

LiteParse Samples

Interactive demos showcasing LiteParse — a fast, local, model-free document parser by LlamaIndex.

Samples

Parser Comparison

Side-by-side comparison of LiteParse vs PyPDF vs PyMuPDF on real government and financial documents. See the original PDF on the left, then tab through each parser’s extracted text on the right.

Quick start: Open comparison/output/comparison.html in your browser.

Features:

8 document sections from 5 real-world PDFs (FDIC, Federal Reserve, CMS, IRS, WHO)
Embedded PDF viewer alongside parsed text
Per-document timing for each parser
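
The per-document timing feature can be approximated with a small harness. This is a minimal sketch, not the code in `generate.py`: the `time_parsers` helper and the stand-in parser functions are hypothetical, and in practice each callable would wrap a real parser (LiteParse, PyPDF, PyMuPDF) behind the same "path in, text out" signature.

```python
import time
from typing import Callable, Dict, List


def time_parsers(
    parsers: Dict[str, Callable[[str], str]],
    paths: List[str],
) -> Dict[str, Dict[str, float]]:
    """Return {path: {parser_name: elapsed_seconds}} for every parser/document pair."""
    results: Dict[str, Dict[str, float]] = {}
    for path in paths:
        results[path] = {}
        for name, parse in parsers.items():
            start = time.perf_counter()
            parse(path)  # the extracted text is discarded; only the time is kept
            results[path][name] = time.perf_counter() - start
    return results


# Hypothetical stand-ins so the sketch runs without any parser installed.
def fake_fast_parser(path: str) -> str:
    return "text from " + path


def fake_slow_parser(path: str) -> str:
    time.sleep(0.01)  # simulate a slower parser
    return "text from " + path


timings = time_parsers(
    {"fast": fake_fast_parser, "slow": fake_slow_parser},
    ["a.pdf", "b.pdf"],
)
for path, per_parser in timings.items():
    for name, seconds in per_parser.items():
        print(f"{path} {name}: {seconds * 1000:.2f} ms")
```

Using `time.perf_counter` rather than `time.time` keeps the measurement monotonic, which matters when individual parses take only a few milliseconds.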

Visual Citations

Exact keyword search over parsed documents — see precisely where each match appears on the source PDF page, with bounding boxes highlighted directly on the page image. This is a simple substring match demo (not fuzzy or RAG-based search). Learn more in the Visual Citations guide.

Quick start: Open visual_citations/output/visual-citations.html in your browser.

Features:

Interactive keyword search across all documents
Bounding box overlays on rendered page images
Side-by-side view of source page and parsed text with highlighted matches
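
The idea behind this sample (substring match, then draw the matching boxes on the page image) can be sketched as follows. The `TextItem` shape is an assumption for illustration and is not LiteParse's actual output schema; the sketch also assumes coordinates are in PDF points (1/72 inch) with a top-left origin, which may not match the real data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextItem:
    # Hypothetical shape of one parsed text span with its bounding box.
    page: int
    text: str
    x: float       # left edge, assumed to be in PDF points
    y: float       # top edge, assumed to be in PDF points
    width: float
    height: float


def find_matches(items: List[TextItem], keyword: str, ignore_case: bool = True) -> List[TextItem]:
    """Plain substring match over text items (no fuzzy or semantic search)."""
    if not keyword:
        return []
    needle = keyword.lower() if ignore_case else keyword
    hits = []
    for item in items:
        haystack = item.text.lower() if ignore_case else item.text
        if needle in haystack:
            hits.append(item)
    return hits


def to_pixels(item: TextItem, dpi: float = 150.0) -> Tuple[int, int, int, int]:
    """Scale a box from points to pixels for a page rendered at `dpi`."""
    scale = dpi / 72.0  # one PDF point is 1/72 inch
    return (
        round(item.x * scale),
        round(item.y * scale),
        round((item.x + item.width) * scale),
        round((item.y + item.height) * scale),
    )


items = [
    TextItem(page=0, text="Total revenue was $5 million", x=72, y=144, width=36, height=12),
    TextItem(page=1, text="Operating expenses", x=0, y=0, width=10, height=10),
]
hits = find_matches(items, "revenue")
print([(h.page, to_pixels(h, dpi=144)) for h in hits])  # [(0, (144, 288, 216, 312))]
```

The returned pixel rectangle is `(left, top, right, bottom)`, the form most image-drawing libraries expect for a highlight overlay.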

Research Docs (Claude Code Skill)

Ask questions about your documents — get answers with visual source citations. Install as a Claude Code skill and invoke with /research-docs. The skill parses your documents, has Claude answer your question, and generates an HTML report with the answer and cited source pages highlighted with bounding boxes.

Install:

npx skills add run-llama/liteparse_samples --skill research_docs

Usage: /research-docs ./my-pdfs What is the total revenue?

Features:

Parse any document LiteParse supports (PDF, DOCX, PPTX, XLSX, images) plus plaintext
AI-powered answers with exact-quote source citations
Bounding box highlights on source page images
PDF viewer toggle for each citation
Self-contained HTML report
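
Exact-quote citations are only useful if each quote can be found verbatim in the parsed text. A minimal sketch of that check, assuming the parsed output is available as a mapping from page number to page text (the `locate_quote` helper and this mapping are hypothetical, not part of the skill):

```python
import re
from typing import Dict, Optional


def normalize(text: str) -> str:
    """Collapse runs of whitespace so line breaks in the PDF do not break a match."""
    return re.sub(r"\s+", " ", text).strip()


def locate_quote(quote: str, pages: Dict[int, str]) -> Optional[int]:
    """Return the first page whose text contains the quote verbatim, else None."""
    needle = normalize(quote)
    if not needle:
        return None
    for page_number in sorted(pages):
        if needle in normalize(pages[page_number]):
            return page_number
    return None


pages = {
    0: "Introduction and scope of the report.",
    1: "Total revenue\n was  $5 million for the fiscal year.",
}
print(locate_quote("Total revenue was $5 million", pages))  # 1
print(locate_quote("net loss of $2 million", pages))        # None
```

A quote that returns `None` was not found verbatim and should not be shown as a citation.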

Regenerating with Your Own Data

Add your PDFs to the data/ folder
Edit docs.json in the relevant sample folder to configure your documents and pages
Install dependencies and run:

pip install -r requirements.txt

# Regenerate comparison (the subshell keeps your working directory at the repo root)
(cd comparison && python generate.py)

# Regenerate visual citations
(cd visual_citations && python generate.py)

# Install research_docs skill
cp -r research_docs ~/.claude/skills/research-docs
# Then use: /research-docs ./data "Your question here"

docs.json format

Each sample has a docs.json that controls which documents and pages are processed:

[
  {
    "name": "My Document Title",
    "file": "my_document.pdf",
    "pages": [0, 1, 2],
    "source": "example.com",
    "desc": "Optional description (comparison only)"
  }
]

name: Display title for the document
file: PDF filename (must exist in data/)
pages: 0-indexed page numbers to parse
source: Attribution label
desc: Description shown in comparison cards (comparison sample only)
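
Before regenerating a sample, it can help to check docs.json against the rules above. The following is a rough sketch: the `validate_docs` helper is hypothetical, and treating `name`, `file`, `pages`, and `source` as required keys is an assumption (the source only marks `desc` as optional).

```python
import json
import tempfile
from pathlib import Path
from typing import List

# Assumed to be required; only "desc" is described as optional.
REQUIRED_KEYS = ("name", "file", "pages", "source")


def validate_docs(config_path: Path, data_dir: Path) -> List[str]:
    """Return a list of human-readable problems found in a docs.json file."""
    problems: List[str] = []
    entries = json.loads(Path(config_path).read_text(encoding="utf-8"))
    for index, entry in enumerate(entries):
        for key in REQUIRED_KEYS:
            if key not in entry:
                problems.append(f"entry {index}: missing key '{key}'")
        file_name = entry.get("file")
        if file_name and not (Path(data_dir) / file_name).is_file():
            problems.append(f"entry {index}: file '{file_name}' not found in data dir")
        for page in entry.get("pages", []):
            # Pages are 0-indexed, so any negative or non-integer value is invalid.
            if not isinstance(page, int) or isinstance(page, bool) or page < 0:
                problems.append(f"entry {index}: invalid page index {page!r}")
    return problems


# Demo against a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    (data_dir / "my_document.pdf").write_bytes(b"%PDF-1.4\n")
    config = Path(tmp) / "docs.json"
    config.write_text(
        json.dumps(
            [
                {"name": "My Document Title", "file": "my_document.pdf",
                 "pages": [0, 1, 2], "source": "example.com"},
                {"name": "Broken entry", "file": "absent.pdf",
                 "pages": [-1], "source": "example.com"},
            ]
        ),
        encoding="utf-8",
    )
    problems = validate_docs(config, data_dir)

for problem in problems:
    print(problem)
```

Returning a list of problems instead of raising on the first one lets you fix every entry in a single pass.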

Data

The included PDFs are publicly available government documents:

| File | Source | Description |
| --- | --- | --- |
| `cms_pfs.pdf` | cms.gov | CMS Medicare Physician Fee Schedule (CY 2026) |
| `fdic_qbp.pdf` | fdic.gov | FDIC Quarterly Banking Profile |
| `fed_h41.pdf` | federalreserve.gov | Federal Reserve H.4.1 Statistical Release |
| `irs_1040.pdf` | irs.gov | IRS Form 1040, U.S. Individual Income Tax Return |
| `who_indicators.pdf` | who.int | WHO Core Health Indicators |

Requirements

Python 3.9+
Dependencies: liteparse, pypdf, pymupdf (see requirements.txt)

pip install -r requirements.txt

Links

Jerry Liu (@jerryjliu0): We’ve created the world’s fastest PDF parser ⚡️

And it’s more accurate than any other open-source, model-free PDF parser out there (pymupdf, pypdf, markitdown, pdftotext, opendataloader, pymupdf4llm)

Introducing LiteParse v2 - we rewrote the entire library into Rust and

