LFRAG proposes a layout-oriented, fine-grained retrieval-augmented generation framework that moves from page-level to block-level retrieval in multimodal documents, achieving state-of-the-art performance and a 73% token reduction on the new LFDocQA benchmark.
Nasdaq features an interview with LlamaIndex CEO Jerry Liu, who discusses the company's document understanding and OCR technologies powering enterprise AI agents. The interview is published in partnership with Wing VC's Enterprise Tech 30 list.
LlamaIndex released ParseBench, a comprehensive benchmark for evaluating document understanding in AI agents, covering complex enterprise documents with tables, charts, and layouts. A live webinar will discuss the benchmark methodology and results.
Infinity releases two open-weight models, Infinity-Parser2-Pro (35B) and Infinity-Parser2-Flash (2B), which top the ParseBench leaderboard for document understanding, leveraging a synthetic data engine and a novel joint RL algorithm.
CiteVQA is a benchmark for document vision-language models that evaluates both answer correctness and citation of supporting evidence, revealing widespread attribution hallucinations where models provide correct answers but cite wrong regions.
DocScope is a new benchmark for evaluating the verifiable reasoning and trustworthiness of Multimodal Large Language Models on long documents, introducing a four-stage evaluation protocol for page localization, region grounding, fact extraction, and answer verification.
DocAtlas is a framework that creates high-fidelity OCR datasets and benchmarks across 82 languages, using differential rendering and synthetic generation. It demonstrates that Direct Preference Optimization improves multilingual model adaptation without degrading base-language performance.
NuExtract3 is a 4B vision-language reasoning model for document understanding, enabling structured extraction and image-to-Markdown conversion.
ParseBench introduces the first benchmark evaluating vision-language models on chart comprehension within full enterprise documents, addressing gaps in prior chart-only benchmarks.
dots.ocr is a new lightweight, 1.7B-parameter multilingual vision-language model that achieves state-of-the-art performance on OmniDocBench, outperforming much larger models (72B+) at document parsing and OCR tasks.
IBM releases Granite 4.0 3B Vision, a compact vision-language model designed for enterprise document understanding, featuring specialized capabilities for table extraction, chart interpretation via ChartNet, and key-value pair grounding.