document-processing

Docling vs LiteParse vs MinerU vs Unstructured for on-prem document processing for a university

Reddit r/LocalLLaMA · yesterday

A comparison of four on-prem document processing tools (Docling, LiteParse, MinerU, and Unstructured) for university use, evaluating their suitability for local deployment.

@ErickSky: Baidu has just broken one of the biggest limitations of current OCR. Unlimited-OCR processes entire documents in a sing…

X AI KOLs Timeline · 2d ago

Baidu has released Unlimited-OCR, which processes entire documents in a single pass without chunking, overcoming a major limitation of current OCR technology.

@VikParuchuri: We're open sourcing a 9B model that extracts structured data from documents at near-frontier performance. - 90.2% on ou…

X AI KOLs Following · 5d ago

Vik Paruchuri is open-sourcing a 9B model that extracts structured data from documents with near-frontier performance (90.2% on their benchmark, vs Gemini 3.5 Flash at 91.3%).

@DataChaz: Messy documents in. Complex knowledge graphs out. One command line. If your pipeline simply compiles data into generic …

X AI KOLs Timeline · 2026-06-17

Hyper-Extract is an open-source framework that converts messy documents into typed knowledge structures, supporting multiple graph architectures like GraphRAG, LightRAG, and KG-Gen, with 10+ extraction engines and 80+ YAML templates for various domains.

Typst 0.15 contains multitudes

Lobsters Hottest · 2026-06-15

Typst 0.15, a major release of the open-source typesetting system, introduces support for variable fonts, MathML export, multi-file output, multiple bibliographies, and multiple PDF standards, along with improved documentation and diagnostics.

@TeksEdge: Need to OCR documents? PP-OCRv6 dropped — currently the best open-source OCR models you can download ◆︎ Fully Open Sour…

X AI KOLs Timeline · 2026-06-12

PP-OCRv6 is a new open-source OCR model series from Baidu's PaddleOCR, available in Tiny/Small/Medium sizes with excellent accuracy and speed, beating several commercial models.

@DailyDoseOfDS_: Fine-tune DeepSeek-OCR on your own language! (100% local) Most vision models treat documents as massive sequences of to…

X AI KOLs Timeline · 2026-06-08

DeepSeek-OCR is a 3B vision model using context optical compression for efficient document processing. Fine-tuning it on Persian text using Unsloth achieved an 88.26% improvement in character error rate, all open-source and runnable on a single GPU.

I made a small local model (llama3.2 3B) reliably extract structured JSON from documents - the hard part wasn't the model, it was everything around it

Reddit r/AI_Agents · 2026-06-05

A developer shares lessons from building a local document-to-JSON extractor using llama3.2 3B on Ollama, highlighting that deterministic post-processing and schema-constrained outputs matter more than model size, while seeking feedback on hallucination and context truncation issues with long documents.

@0xSilver_Time: After using NotebookLM for 6 months, I can say it has revolutionized my workflow to the greatest extent. But that's only after I mastered these 10 prompts. This system can transform 200 pages of documents into clear answers in 1 hour.

X AI KOLs Timeline · 2026-06-03

The user shares their 6-month experience with NotebookLM and provides 10 prompts, claiming to convert 200 pages of documents into clear answers in 1 hour.

@IndieDevHailey: MarkItDown — The Document Hell Terminator, Instantly Turns Any File into LLM-Perfect Markdown! Microsoft Open-Sources MarkItDown, 138k+ Stars Topping Trending, Goodbye to PDF Garbled Text, Word Table Explosions, P...

X AI KOLs Timeline · 2026-06-02

Microsoft has open-sourced MarkItDown, a tool that can convert PDF, Word, Excel, PPT and other files into well-structured Markdown format with a single click, making it easy to feed directly into LLMs. It has garnered over 138k stars on GitHub.

@itsclelia: Had a blast yesterday attending @techeurope_'s Applied AI Conference in Berlin! I had a talk about building document…

X AI KOLs Following · 2026-05-29

The author attended the Applied AI Conference in Berlin and gave a talk on building document agents, including a detailed walkthrough of LobsterX, a document-processing agent built with LlamaIndex that uses structured outputs and event-driven workflows.

Building a Scalable Ingestion Pipeline with Temporal (Part 1)

Lobsters Hottest · 2026-05-26

This blog post describes the architecture for a scalable ingestion pipeline using Temporal to handle crawling, extracting, chunking, and embedding customer documentation from various sources, emphasizing durability, statefulness, and concurrency control.

@AYi_AInotes: https://x.com/AYi_AInotes/status/2058536443174158504

X AI KOLs Timeline · 2026-05-24

The author shares their three-year experience of feeding PDFs to AI, pointing out that Markdown is a better input format for AI than PDF, because PDF is essentially a mix of coordinates and characters. AI needs to parse the structure first, which is error-prone and consumes more tokens. The article provides specific cases and recommended tools (markitdown, pandoc, LlamaParse), and teases a new series called 'The Art of Feeding AI'.

@huang_chao4969: LightRAG v1.5 is here! The biggest release ever! 35k+ GitHub | 1.1M+ downloads | 251 contributors | 1.1k+ PRs merged He…

X AI KOLs Timeline · 2026-05-23

LightRAG v1.5 is released with six major improvements including multimodal document processing, enhanced parsing, and role-specific LLM configuration, making RAG simpler, faster, and more powerful.

@jerryjliu0: We pride ourselves on building document processing that is not only accurate and cheap, but massively scalable to milli…

X AI KOLs Following · 2026-05-23

LlamaParse now offers latency metrics for Parse, Extract, and Classify jobs, providing queue time, processing time, and total latency breakdowns. This helps users monitor and scale their document processing.

Parsewise API

Product Hunt · 2026-05-22

Parsewise is an API for agentic multi-document processing.

@knowledgefxg: Practical Open-Source Tool Recommendation: pdf-inspector solves a very real problem: not all PDFs need OCR. For example, you throw a PDF at it, and it first determines what type of PDF it is—whether it's a normal text-based version (e.g., exported from Word) or a scanned version (image)…

X AI KOLs Timeline · 2026-05-22

pdf-inspector is an open-source Rust library for intelligently classifying PDF types (text or scanned), extracting text, and converting to Markdown, avoiding unnecessary OCR to improve speed and save costs.

MADP: A Multi-Agent Pipeline for Sustainable Document Processing with Human-in-the-Loop

arXiv cs.AI · 2026-05-19

MADP is a multi-agent architecture for enterprise document processing that combines deep learning and LLMs with human-in-the-loop validation, achieving 97% automation and significant reductions in resource usage.

@jerryjliu0: Agents + file sandboxes are all the rage in 2026 This is a nifty reference implementation by @itsclelia showing you…

X AI KOLs Following · 2026-05-11

This reference implementation demonstrates how to run an LLM agent securely within a local sandbox to process and analyze various document types using Rust, LiteParse, and microsandbox. The open-source CLI leverages OpenAI's GPT models and native bash commands to perform file retrieval and analysis in an isolated environment.

@tom_doerr: Converts research papers into editable diagrams and slides https://github.com/OpenDCAI/Paper2Any…

X AI KOLs Timeline · 2026-05-10

Paper2Any is an open-source AI tool that converts research papers into editable diagrams, technical roadmaps, and slide decks with support for universal file formats and custom styling.
