Tag
A developer describes replacing Google Vision with a local Qwen VL model on an RTX 3060 to parse Japanese receipts into JSON. Key fields were extracted accurately, at about 31 seconds per receipt.
Hyper-Extract is a CLI tool that transforms messy, unstructured documents into structured knowledge such as knowledge graphs, hypergraphs, temporal/spatial graphs, and Obsidian vaults, supporting local LLM inference and MCP integration.
ExtractConf is a confidence estimation method for LLM-based document field extraction. It makes two structurally different calls (field-guided and document-guided) and derives disagreement signals from them, achieving 0.928 ROC AUC on DocILE invoices and enabling reliable selective prediction for high-stakes automation.
Agentic Document Extraction is a tool that uses AI agents to extract structured data from unstructured documents, making their contents computable.
LlamaIndex introduces an Extract feature in LlamaParse for turning unstructured contract data into structured, machine-readable metadata using layout-aware parsing and LLMs, addressing challenges like non-standard templates and cross-references.
docext is an on-premises toolkit that converts images and PDFs to Markdown without OCR, using vision-language models. It also introduces Nanonets-OCR-s, a compact 3B-parameter model for efficient image-to-Markdown conversion.
olmOCR is an open-source toolkit that uses a fine-tuned vision-language model to extract clean text from PDFs while preserving structure, optimized for large-scale batch processing.
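The ExtractConf entry above describes deriving disagreement signals from two structurally different extraction calls and using them for selective prediction. A minimal sketch of that general idea follows. It is not the ExtractConf implementation: the exact-match agreement rule, the function names (`normalize`, `field_agreement`, `selective_predict`), and the sample field values are all assumptions made for illustration, and the two dictionaries stand in for the outputs of the field-guided and document-guided calls.

```python
from typing import Dict, List, Optional, Tuple

Extraction = Dict[str, Optional[str]]


def normalize(value: Optional[str]) -> Optional[str]:
    """Collapse whitespace and case so trivial formatting differences do not count."""
    if value is None:
        return None
    return " ".join(str(value).split()).casefold()


def field_agreement(first: Extraction, second: Extraction) -> Dict[str, bool]:
    """Per-field agreement flag between two extraction passes (hypothetical rule)."""
    fields = sorted(set(first) | set(second))
    return {f: normalize(first.get(f)) == normalize(second.get(f)) for f in fields}


def selective_predict(first: Extraction, second: Extraction) -> Tuple[Extraction, List[str]]:
    """Accept fields where both passes agree; route the rest to manual review."""
    agreement = field_agreement(first, second)
    accepted = {
        f: first[f]
        for f, ok in agreement.items()
        if ok and first.get(f) is not None
    }
    review = [f for f, ok in agreement.items() if not ok]
    return accepted, review


# Stand-ins for the two calls' outputs (made-up values, not from any dataset).
field_guided = {"invoice_id": "INV-001", "total": "120.50", "date": "2024-01-05"}
document_guided = {"invoice_id": "inv-001 ", "total": "125.50", "date": "2024-01-05"}

accepted, review = selective_predict(field_guided, document_guided)
print(accepted)  # fields both passes agree on
print(review)    # fields to send for human review
```

In this sketch agreement is a hard yes/no per field. A method reporting ROC AUC would need a graded confidence score, so treat the boolean rule here as the simplest possible stand-in for a disagreement signal.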