@jerryjliu0: We made Claude better and faster at understanding PDFs


Summary

LlamaIndex improved its LiteParse PDF-parsing skill for Claude agents, making it 37% cheaper and more accurate than Claude Code over raw PDFs by optimizing agent behavior based on evaluation traces.


Cached at: 06/17/26, 03:56 PM

We made Claude better and faster at understanding PDFs

The trick isn’t just creating the fastest free document parser out there (with liteparse), but also tuning the skill itself so that Claude Code can use it with fewer turns and fewer expensive file operations.

This is a fantastic blog post by @itsclelia, which dives into the decision traces of Claude Code as it operates over your filesystem and identifies opportunities for optimization. We were able to incentivize the right skill behavior by doing the following:

  • Preventing expensive mistakes like re-parsing the PDF for every search, leaving OCR on, reading screenshots when unnecessary, and dumping huge grep outputs
  • Providing a simple BM25-backed retrieval on parsed text
  • Reducing the number of sequential grep and sed turns to reduce latency

The net result is that we are 37% cheaper and more accurate than Claude Code over raw PDFs.

LiteParse is fully free and open-source, and you can plug in the skill today!

Blog: https://llamaindex.ai/blog/building-a-better-liteparse-skill-with-evals?utm_medium=socials&utm_source=twitter&utm_campaign=2026–…

Repo: https://github.com/run-llama/liteparse…


Building a Faster, Cheaper PDF-Parsing Skill for Claude Agents: A LiteParse Case Study

Source: https://www.llamaindex.ai/blog/building-a-better-liteparse-skill-with-evals?utm_medium=socials&utm_source=twitter&utm_campaign=2026–

In this blog post, we go through how we turned our LiteParse skill for document parsing into a cheaper, faster, and higher-quality helper by evaluating the agent’s usage of it, analyzing traces, and iterating.

Setup

We benchmark Claude’s ability to answer questions over real corporate sustainability / ESG reports using the pdfQA-Benchmark ClimateFinanceBench dataset, downloading 30 PDF files along with annotated question-answer pairs.

We then run a Claude agent (via the claude-agent-sdk) over 15 (PDF, question) pairs. Every run produces a structured JSON answer and a full JSONL interaction trace.

Each Claude Agent has access to standard tools, is limited to the project scope and is conditionally allowed to invoke a skill based on the evaluation configuration.

We compare several configurations:

  • raw: Claude reads PDFs directly with the built-in Read tool.
  • liteparse: a first cut of a skill wrapping the local lit CLI for fast, model-free PDF parsing.
  • liteparse-targeted: a more directive variant, written to get Claude to notice and use LiteParse more often.
  • effective-liteparse: a skill optimized for effective LiteParse usage and lower latency, based on analysis of evaluation traces.

This post is about how the last one came to be.

Why a skill at all?

LiteParse can be used either as a command-line application or as a library for Rust, Python, and JavaScript/TypeScript. Because document parsing is inherently I/O-bound and requires direct access to files or raw bytes, LiteParse is not a natural fit for an MCP server, which does not support file uploads and would require either base64-encoded strings or other workarounds, as documented in a separate blog post.

A skill is therefore the most practical integration pattern. By packaging usage instructions into a markdown file that is injected into the agent’s context, we enable the agent to use the LiteParse CLI as a drop-in replacement for Claude’s built-in PDF reader or alternative parsing tools such as PyMuPDF and pdftotext. Using the CLI also makes it easy to compose LiteParse with standard Unix tools such as grep and sed, allowing agents to filter, search, and transform parsed output without requiring additional tooling.

However, making a tool available is only part of the challenge. The skill instructions must be carefully designed and evaluated so that the agent not only invokes LiteParse when appropriate, but also uses it effectively. A well-crafted skill helps the agent process documents faster, reduce token and compute costs, and achieve higher extraction accuracy than generic parsing approaches.

On the hunt for anti-patterns

After the first two evaluation cycles, we collected the first metrics (latency, turns, costs, tool calls…) and analyzed the JSONL traces from the first skill versions, finding a cluster of recurring, expensive mistakes:

  • Re-parsing the same PDF over and over: in the worst trace, lit parse ran 9 times on a single document, once per search, and each call re-extracted the entire PDF.
  • OCR left on for born-digital PDFs: most ESG reports have a real text layer, so running OCR was pure wasted time.
  • Reading high-DPI page screenshots into context: a single page PNG cost **~140k characters** of context, and agents often rendered the same page twice (default + hi-res).
  • Unbounded, shotgun greps: huge keyword alternations dumping 15–25k characters into the conversation.
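Anti-patterns like these can be surfaced mechanically from the JSONL traces. The sketch below is illustrative only: it assumes a hypothetical event schema in which each line looks like `{"type": "tool_use", "name": "Bash", "input": {"command": "..."}}` or `{"type": "tool_result", "content": "..."}` (the benchmark's real trace layout may differ), and the helper names `parse_targets` and `summarize_trace` are made up for this example. It counts tool calls, repeated `lit parse` runs on the same PDF, and oversized tool outputs.

```python
from __future__ import annotations

import json
import shlex
from collections import Counter
from typing import Iterable

# Hypothetical trace schema (an assumption; adapt to the real JSONL layout):
#   {"type": "tool_use", "name": "Bash", "input": {"command": "lit parse report.pdf"}}
#   {"type": "tool_result", "content": "...tool output..."}

SHELL_OPS = {"|", "||", "&&", ";", ">", ">>"}


def parse_targets(command: str) -> list[str]:
    """Return every .pdf path passed to `lit parse` in a shell command."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return []  # unbalanced quotes: skip rather than guess
    targets = []
    for i in range(len(argv) - 1):
        if argv[i] == "lit" and argv[i + 1] == "parse":
            for arg in argv[i + 2:]:
                if arg in SHELL_OPS:
                    break  # reached the next command in the pipeline
                if arg.lower().endswith(".pdf"):
                    targets.append(arg)
                    break
    return targets


def summarize_trace(lines: Iterable[str], big_output_chars: int = 10_000) -> dict:
    """Count tool calls, repeated parses of the same PDF, and oversized tool outputs."""
    tool_counts: Counter = Counter()
    parses: Counter = Counter()
    big_outputs = 0
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("type") == "tool_use":
            tool_counts[event.get("name", "?")] += 1
            command = (event.get("input") or {}).get("command", "")
            for pdf in parse_targets(command):
                parses[pdf] += 1
        elif event.get("type") == "tool_result":
            if len(str(event.get("content", ""))) > big_output_chars:
                big_outputs += 1
    return {
        "tool_counts": dict(tool_counts),
        "reparsed": {pdf: n for pdf, n in parses.items() if n > 1},
        "big_outputs": big_outputs,
    }
```

A `reparsed` entry with a count above 1 corresponds to the re-parsing anti-pattern, and a non-zero `big_outputs` count points at oversized tool outputs such as shotgun greps.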

Despite these anti-patterns, the LiteParse approach showed significant potential. Because parsing is performed externally through the CLI, it is not constrained by the limits of Claude’s native document reader, which currently accepts PDFs of at most 32 MB and 600 pages. In practice, this gives LiteParse effectively unbounded parsing capacity, limited primarily by the available system resources.

That’s why we decided to create the effective-liteparse skill, encoding the fixes as hard rules: parse once to a temp file, then search the file; use --no-ocr for born-digital PDFs; take screenshots only as a last resort, one page at modest DPI; keep results small.
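The first two rules (parse once to a temp file and search that file; use `--no-ocr` for born-digital PDFs) can be sketched as a small cache around the CLI. This is a hypothetical illustration, not the skill's code: the `ParseCache` class is invented here, it assumes `lit parse <file> --no-ocr` prints the extracted text to stdout, and the command runner is injectable so the logic can be exercised without LiteParse installed.

```python
from __future__ import annotations

import hashlib
import subprocess
import tempfile
from pathlib import Path
from typing import Callable

Runner = Callable[[list[str]], str]


def run_cli(argv: list[str]) -> str:
    """Run a command and return its stdout (raises CalledProcessError on failure)."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


class ParseCache:
    """Parse each PDF at most once and keep the text in a temp file for later searches."""

    def __init__(self, runner: Runner = run_cli, cache_dir: str | None = None) -> None:
        self.runner = runner
        self.cache_dir = Path(cache_dir or tempfile.mkdtemp(prefix="liteparse-"))
        self.parse_calls = 0

    def text_path(self, pdf: str, ocr: bool = False) -> Path:
        # Key the cache on the resolved path plus the OCR setting.
        key = hashlib.sha1(f"{Path(pdf).resolve()}|{ocr}".encode()).hexdigest()[:16]
        out = self.cache_dir / f"{key}.txt"
        if not out.exists():
            argv = ["lit", "parse", pdf]
            if not ocr:
                argv.append("--no-ocr")  # born-digital PDFs already have a text layer
            out.write_text(self.runner(argv), encoding="utf-8")
            self.parse_calls += 1
        return out

    def search(self, pdf: str, needle: str, max_hits: int = 20) -> list[tuple[int, str]]:
        """Case-insensitive substring search over the cached text, capped at max_hits."""
        hits = []
        lines = self.text_path(pdf).read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if needle.lower() in line.lower():
                hits.append((lineno, line))
                if len(hits) >= max_hits:
                    break  # keep results small
        return hits
```

Every search after the first reads the cached text file, so the expensive parse happens once per document, and `max_hits` keeps any single result small.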

Besides these hard rules, we noticed that the Claude agent would often perform several tight iterations with grep, sed and Read to find the right context within the parsed content to complete the evaluation task. To address this, we expanded the surface of the skill with a small, self-contained Python script that concurrently reads, chunks, indexes, and performs BM25-backed retrieval based on a provided query. Within the skill, we directed the agent to use this script when searching for more ambiguous keywords, while defaulting to plain lexical search (grep) for pattern and substring lookups.
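For readers unfamiliar with BM25, the following is a minimal, stdlib-only sketch of the kind of script described above, not the script shipped with the skill: it reads parsed-text files concurrently, splits them into overlapping line windows, and ranks windows against a query with Okapi BM25. The window size, overlap, tokenizer, the Lucene-style IDF, and the common k1 = 1.5, b = 0.75 defaults are all assumptions, and the names `chunk_lines`, `load_chunks`, and `BM25` are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer."""
    return TOKEN.findall(text.lower())


def chunk_lines(text: str, size: int = 20, overlap: int = 5) -> list[str]:
    """Split text into windows of `size` lines, consecutive windows sharing `overlap` lines."""
    lines = text.splitlines()
    step = max(1, size - overlap)
    chunks = []
    for start in range(0, len(lines), step):
        chunks.append("\n".join(lines[start:start + size]))
        if start + size >= len(lines):
            break  # this window already reaches the end of the text
    return chunks


def load_chunks(paths: list[str], size: int = 20, overlap: int = 5) -> list[str]:
    """Read parsed-text files concurrently, then chunk them in input order."""
    with ThreadPoolExecutor() as pool:
        texts = list(pool.map(lambda p: Path(p).read_text(encoding="utf-8"), paths))
    return [chunk for text in texts for chunk in chunk_lines(text, size, overlap)]


class BM25:
    """Okapi BM25 over a fixed list of text chunks."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75) -> None:
        self.docs, self.k1, self.b = docs, k1, b
        self.tfs = [Counter(tokenize(d)) for d in docs]
        self.lens = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lens) / len(docs) if docs else 0.0
        n = len(docs)
        df = Counter(term for tf in self.tfs for term in tf)
        # Lucene-style IDF, which stays positive even for terms present in most chunks.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], self.lens[i]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:  # terms absent from the chunk contribute nothing
                norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def top_k(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        """Best-scoring chunks first; chunks containing no query term are dropped."""
        scored = [(self.score(query, i), d) for i, d in enumerate(self.docs)]
        scored = [pair for pair in scored if pair[0] > 0]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

In this setup the agent would index the temp file produced by the parse-once step and ask for the top few windows for an ambiguous query such as "board oversight of climate risk", reaching for grep instead whenever an exact pattern is known.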

Parse, but don’t wander off

After the first fixes, a new signal stood out in the traces: effective-liteparse was cheaper but slower than raw Claude Code, and the cause was the number of turns, not parsing per se. The most-called tools after parsing were grep and sed, used in a serial loop: grep → look → refine grep → a separate sed turn to read the window → grep again, with each turn being a full API round-trip.

Ironically, two of our own earlier rules made this worse: “locate first, then read the window with sed” split every lookup into two turns, and “don’t shotgun” nudged the model toward many tiny serial greps. So we changed the guidance to minimize round-trips:

  1. Get context in the same command: grep -n -i -C4 "term" returns the hit and its window in one turn, removing the follow-up sed.
  2. Batch independent lookups into one command: a labeled for loop probes several facts at once instead of one grep per turn.
  3. A hard search budget: resolve in ≤3 commands; after two unsuccessful greps, fall back to the BM25 ranker once instead of firing keyword variants forever.
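The three rules are guidance to the agent rather than code, but their effect is easy to see in a Python analogue of the shell workflow. In this sketch (all function names hypothetical), `grep_context` mimics `grep -n -i -C4` by returning each hit with its surrounding window in one call, `batched_lookup` answers several labeled probes at once like the labeled for loop, and `budgeted_search` stops after two empty greps and calls a ranker exactly once, so a lookup never takes more than three commands. The 4,000-character output cap is an assumed stand-in for "keep results small".

```python
from __future__ import annotations

import re
from typing import Callable


def grep_context(lines: list[str], pattern: str, context: int = 4,
                 max_chars: int = 4000) -> str:
    """Like `grep -n -i -C<context>`: numbered hits plus their surrounding window.

    Output is capped at `max_chars` so one call cannot flood the conversation.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    keep: set = set()
    for i, line in enumerate(lines):
        if rx.search(line):
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    out = []
    prev = None
    for i in sorted(keep):
        if prev is not None and i != prev + 1:
            out.append("--")  # grep's separator between non-adjacent groups
        sep = ":" if rx.search(lines[i]) else "-"  # grep marks hits ':' and context '-'
        out.append(f"{i + 1}{sep}{lines[i]}")
        prev = i
    return "\n".join(out)[:max_chars]


def batched_lookup(lines: list[str], probes: dict[str, str],
                   context: int = 4) -> dict[str, str]:
    """Answer several independent probes in a single call, keyed by label."""
    return {label: grep_context(lines, pat, context) for label, pat in probes.items()}


def budgeted_search(lines: list[str], variants: list[str],
                    rank: Callable[[str], str], max_greps: int = 2) -> tuple[str, int]:
    """Try up to `max_greps` grep variants, then fall back to the ranker exactly once.

    Returns (result, commands_used); commands_used never exceeds max_greps + 1.
    """
    used = 0
    for pattern in variants[:max_greps]:
        used += 1
        hit = grep_context(lines, pattern)
        if hit:
            return hit, used
    return rank(" ".join(variants)), used + 1
```

With the default `max_greps=2`, the worst case is two greps plus one ranker call, which is the ≤3-command budget from rule 3.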

The traces confirmed adoption: a new for-loop batching pattern appeared in the post-parse tool mix, and average turns dropped by roughly 15% (13.1 → 11.1).

The numbers

We score answers with an LLM-as-judge panel (Gemini and GPT, each rating the answer and the reasoning behind it), and measure efficiency with trace analysis, on a matched 15-question subset.

Quality (LLM-as-judge, avg score, higher is better):

| Metric | raw | effective-liteparse |
|---|---|---|
| Overall answer | 46.47 | 56.67 |
| Overall reasoning | 58.47 | 65.90 |
| Gemini answer | 58.53 | 78.33 |
| Gemini reasoning | 71.33 | 86.00 |
| GPT answer | 34.40 | 35.00 |
| GPT reasoning | 45.60 | 45.80 |

Efficiency (trace metrics):

| Metric | raw | effective-liteparse |
|---|---|---|
| Avg cost / question | $0.751 | **$0.474** (−37%) |
| p95 cost | $1.323 | **$0.746** |
| Avg turns | 8.47 | 11.08 |
| Avg turn duration | 6.5 s | 5.6 s (−14%) |

Token usage:

| Metric (avg per run) | raw | effective-liteparse |
|---|---|---|
| Base input tokens | 23 | 17 |
| Cache-write (5m) tokens | 86,666 | 29,623 |
| Cache-read tokens | 214,924 | 354,330 |
| Output tokens | 2,433 | 3,546 |
| Total input tokens (all) | 301,612 | 383,970 |
| Cache-write (5m) cost | $0.542 | $0.185 |
| Cache-read cost | $0.107 | $0.177 |
| Output cost | $0.061 | $0.089 |
| Avg cost (reported) | **$0.751** | **$0.452** |

The skill is 37% cheaper per question and scores higher on every judge metric. It is still a few seconds slower on the full task, since it takes more turns, but each turn is shorter, which leaves room for more iterations.
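As a quick consistency check, the derived figures can be recomputed from the table values. The sketch below also estimates per-task time as turns × average turn duration; that product is only an approximation and is not one of the benchmark's reported metrics. The constant and function names are illustrative.

```python
# Values copied from the efficiency and quality tables above.
RAW = {"cost": 0.751, "turns": 8.47, "turn_s": 6.5}
EFF = {"cost": 0.474, "turns": 11.08, "turn_s": 5.6}


def pct_change(before: float, after: float) -> float:
    """Relative change in percent; negative means a reduction."""
    return (after - before) / before * 100


cost_delta = pct_change(RAW["cost"], EFF["cost"])            # about -36.9%
turn_time_delta = pct_change(RAW["turn_s"], EFF["turn_s"])   # about -13.8%
turns_drop = pct_change(13.1, 11.1)                          # about -15.3%, earlier iteration

# Approximate task time = turns x mean turn duration (an estimate, not a measured metric).
raw_task_s = RAW["turns"] * RAW["turn_s"]   # about 55 s
eff_task_s = EFF["turns"] * EFF["turn_s"]   # about 62 s

# Judge scores as (Gemini, GPT, reported Overall).
JUDGES = {
    ("raw", "answer"): (58.53, 34.40, 46.47),
    ("raw", "reasoning"): (71.33, 45.60, 58.47),
    ("effective-liteparse", "answer"): (78.33, 35.00, 56.67),
    ("effective-liteparse", "reasoning"): (86.00, 45.80, 65.90),
}
# True when every Overall value equals the two-judge mean up to table rounding.
overall_is_mean = all(abs((g + p) / 2 - o) < 0.01 for g, p, o in JUDGES.values())

print(f"cost {cost_delta:.1f}%, turn duration {turn_time_delta:.1f}%, turns {turns_drop:.1f}%")
print(f"approx. task time {raw_task_s:.0f} s -> {eff_task_s:.0f} s; overall = mean: {overall_is_mean}")
```

On this rough estimate effective-liteparse spends about 7 s more per task, which is consistent with the "few seconds slower" observation, and the Overall quality rows equal the simple mean of the Gemini and GPT scores.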

A note on token usage

At first glance the effective-liteparse variant looks more expensive on tokens: it processes ~384k input tokens per run on average against ~302k for the raw baseline. But that headline number is misleading, because it lumps together input categories that are billed very differently. Once the input is broken out by billing category, the picture inverts: the extra volume in liteparse is overwhelmingly cache reads (tokens billed at the discounted $0.50/MTok rate), while the expensive part, fresh content written into the cache ($6.25/MTok), is roughly 3× lower than the baseline (29.6k vs 86.7k tokens). The raw approach re-caches large PDF image pages on every read, whereas liteparse parses documents locally and feeds back compact text, so the costly cache writes shrink dramatically even as cheap cache reads grow. The net effect is that liteparse runs at about 40% lower cost ($0.45 vs $0.75 avg) despite touching more tokens overall.

Costs use the published Claude Opus 4.7 rates ($5 base input / $6.25 5m-cache-write / $10 1h-cache-write / $0.50 cache-read / $25 output, per MTok). The reported cost is the runtime’s total_cost_usd; the small gap versus the per-category derived total comes from a secondary Haiku call that the top-level usage record omits.
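The per-category costs in the token table follow directly from token counts × per-MTok rates. A minimal sketch using the rates and averages quoted above; the category keys and the `cost_breakdown` helper are illustrative names, and base-input cost is included even though it is negligible here.

```python
from __future__ import annotations

# Claude Opus 4.7 rates quoted above, in USD per million tokens (1h cache writes unused here).
RATES = {"base_input": 5.00, "cache_write_5m": 6.25, "cache_read": 0.50, "output": 25.00}

# Average tokens per run, copied from the token-usage table.
USAGE = {
    "raw": {"base_input": 23, "cache_write_5m": 86_666,
            "cache_read": 214_924, "output": 2_433},
    "effective-liteparse": {"base_input": 17, "cache_write_5m": 29_623,
                            "cache_read": 354_330, "output": 3_546},
}


def cost_breakdown(tokens: dict[str, int], rates: dict[str, float] = RATES) -> dict[str, float]:
    """USD cost per billing category: tokens / 1e6 * rate."""
    return {cat: n / 1_000_000 * rates[cat] for cat, n in tokens.items()}


for name, tokens in USAGE.items():
    parts = cost_breakdown(tokens)
    total_input = tokens["base_input"] + tokens["cache_write_5m"] + tokens["cache_read"]
    detail = ", ".join(f"{cat} ${usd:.3f}" for cat, usd in parts.items())
    print(f"{name}: {total_input:,} input tokens, derived ${sum(parts.values()):.3f} ({detail})")
```

The derived totals come out near $0.710 for raw and $0.451 for effective-liteparse; the difference from the reported $0.751 and $0.452 is the gap the note above attributes to the secondary Haiku call. The raw input total lands one token away from the table (301,613 vs 301,612), presumably from rounding the per-run averages.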

Takeaways

  • **Traces are the ground truth.** Every improvement here came from reading what the agent actually did.
  • **Skill guidance has second-order effects.** “Locate then read” and “don’t shotgun” sounded prudent, but each added round-trips. Optimize for total turns, not per-command tidiness.
  • **Separate the harness from the skill.** The biggest cost number was a harness artifact, not a skill property. Measure carefully before attributing.
  • **Cheaper and better aren’t a trade-off here.** Disciplined, local parsing beat raw PDF ingestion on both cost and answer quality.

You can find the full benchmark and reproduce it at: https://github.com/run-llama/benchmark-claude-pdfs

LlamaIndex 🦙 (@llama_index): How much can good documentation save an AI agent in cost and time? Turns out, a lot.

We built a custom skill that teaches Claude how to parse PDFs more efficiently, then used real usage traces to find where it was wasting time and money (re-reading the same file over and over,
