@VikParuchuri: Datalab balanced mode extraction now scores 95.9% in our internal benchmark - more accurate than Reducto Deep Extract (…
Summary
Datalab's balanced mode extraction scores 95.9% on the company's internal benchmark. That beats Reducto Deep Extract (95.1%) at less than half the price. It also provides full verification, with citations and reasoning that show which extracted values need manual inspection.
Cached at: 06/27/26, 09:59 PM
Datalab balanced mode extraction now scores 95.9% in our internal benchmark - more accurate than Reducto Deep Extract (95.1%), at less than half the price.
We include full verification, with citations and reasoning, so you know exactly which values to manually inspect. https://t.co/UNUBkB23Ll
Similar Articles
@VikParuchuri: We're open sourcing a 9B model that extracts structured data from documents at near-frontier performance. - 90.2% on ou…
Vik Paruchuri is open-sourcing a 9B model that extracts structured data from documents with near-frontier performance (90.2% on their benchmark, vs Gemini 3.5 Flash at 91.3%).
@VikParuchuri: We're launching turbo mode data extraction - 5x faster, 5x cheaper, and 7% more accurate than Azure Content Understandi…
Vik Paruchuri announces turbo mode data extraction. He claims it is 5x faster, 5x cheaper, and 7% more accurate than Azure Content Understanding, with latency low enough for real-time workflows.
We benchmarked 18 LLMs on OCR (7k+ calls) — cheaper/old models oftentimes win. Full dataset + framework open-sourced. [R]
A benchmark of 18 LLMs on OCR tasks (7k+ calls) finds that cheaper and older models often match the accuracy of premium models at a fraction of the cost. The full dataset and framework are open-sourced.
@lu__jasper: Some early results from playing around with search on a subsampled version of OBLIQ-bench. Mixedbread's reranker is a b…
Early results from testing search on a subsampled version of OBLIQ-bench show that Mixedbread's reranker achieves a strong mean reciprocal rank (MRR). On some metrics it outperforms GPT 5.5 while running faster, though the benchmark remains challenging overall.
@sheriyuo: Best-of-N, rejection sampling, and rubric-based ranking all assume you already have a reliable way to evaluate candidat…
Apodex releases Apodex-1.0, a deep-research model that uses a heavy-duty agent team with global verification, achieving state-of-the-art results on multiple benchmarks including BrowseComp, DeepSearchQA, and HLE.