Tag
LiteParse is an open-source, heuristic-based PDF parser that quickly converts complex layouts, text, and tables into a clean spatial grid without relying on ML models.
Researchers from Banting Health AI present an AI system that combines generative LLMs with Retrieval-Augmented Generation (RAG) to automate information extraction from clinical trial protocols. The system achieves 89% accuracy, compared to 62.6% for standalone LLMs, and AI-assisted workflows complete tasks 40% faster while reducing cognitive demand.
Jerry Liu discusses challenges with using Vision Language Models for PDF parsing, particularly around ensuring text correctness and maintaining proper reading order while avoiding hallucinations.
OpenAI shares how it built an internal contract data agent that automates the extraction and structuring of contract data from various document formats while keeping finance experts in control through a human-in-the-loop review process. The system has reduced contract review time by half and enabled the team to process thousands of contracts monthly without proportional headcount expansion.
SmolDocling is a compact 256M-parameter vision-language model designed for end-to-end multi-modal document conversion. It introduces a new universal markup format called DocTags, which captures page elements together with their location, and it competes with models 27 times larger.