document-understanding

LFRAG: Layout-oriented Fine-grained Retrieval-Augmented Generation on Multimodal Document Understanding

arXiv cs.AI · 2026-05-25

LFRAG proposes a layout-oriented fine-grained retrieval-augmented generation framework that moves from page-level to block-level retrieval in multimodal documents, achieving state-of-the-art performance and 73% token reduction on the new LFDocQA benchmark.

@NasdaqExchange: “We’re focused on providing the best in class document understanding and OCR technologies.” In partnership with @Wing_V…

X AI KOLs Following · 2026-05-24

Nasdaq features an interview with LlamaIndex CEO Jerry Liu, who discusses the company's document understanding and OCR technologies powering enterprise AI agents, in partnership with Wing VC's Enterprise Tech 30 list.

@jerryjliu0: There are a lot of coding and reasoning benchmarks for AI agents, but not a lot for document understanding - which is a…

X AI KOLs Following · 2026-05-18

LlamaIndex released ParseBench, a comprehensive benchmark for evaluating document understanding in AI agents, covering complex enterprise documents with tables, charts, and layouts. A live webinar will discuss the benchmark methodology and results.

@jerryjliu0: A new set of open-weight models is topping the leaderboard for document understanding. INF just released two models: Inf…

X AI KOLs Following · 2026-05-15

Infinity releases two open-weight models, Infinity-Parser2-Pro (35B) and Infinity-Parser2-Flash (2B), which top the ParseBench leaderboard for document understanding, leveraging a synthetic data engine and a novel joint RL algorithm.

CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence

Hugging Face Daily Papers · 2026-05-13

CiteVQA is a benchmark for document vision-language models that evaluates both answer correctness and citation of supporting evidence, revealing widespread attribution hallucinations where models provide correct answers but cite wrong regions.

DocScope: Benchmarking Verifiable Reasoning for Trustworthy Long-Document Understanding

arXiv cs.CL · 2026-05-12

DocScope is a new benchmark for evaluating the verifiable reasoning and trustworthiness of Multimodal Large Language Models on long documents, introducing a four-stage evaluation protocol for page localization, region grounding, fact extraction, and answer verification.

DocAtlas: Multilingual Document Understanding Across 80+ Languages

Hugging Face Daily Papers · 2026-05-12

DocAtlas is a framework that creates high-fidelity OCR datasets and benchmarks across 82 languages, using differential rendering and synthetic generation. It demonstrates that Direct Preference Optimization improves multilingual model adaptation without degrading base-language performance.

numind/NuExtract3

Hugging Face Models Trending · 2026-04-29

NuExtract3 is a 4B vision-language reasoning model for document understanding, enabling structured extraction and image-to-Markdown conversion.

@jerryjliu0: ParseBench is the first benchmark to include VLM chart understanding over enterprise documents. Existing benchmarks (Ch…

X AI KOLs Timeline · 2026-04-21

ParseBench is the first benchmark to evaluate vision-language models on chart comprehension within full enterprise documents, addressing gaps in prior chart-only benchmarks.

@techNmak: A lightweight VLM that beats the giants at OCR. (1.7B parameters, SOTA on OmniDocBench) dots.ocr is a new multilingual…

X AI KOLs Timeline · 2026-04-20

dots.ocr is a new lightweight, 1.7B-parameter multilingual vision-language model that achieves state-of-the-art performance on OmniDocBench, outperforming much larger models (72B+) on document parsing and OCR tasks.

Granite 4.0 3B Vision: Compact Multimodal Intelligence for Enterprise Documents

Hugging Face Blog · 2026-03-31

IBM releases Granite 4.0 3B Vision, a compact vision-language model designed for enterprise document understanding, featuring specialized capabilities for table extraction, chart interpretation via ChartNet, and key-value pair grounding.