ocr

#ocr

@berryxia: Guys, my back isn’t chilling. But, I’m thrilled after seeing this model architecture! While everyone is still frantically stacking parameters and competing with general-purpose large models, Interfaze has introduced a brand-new hybrid architecture. It achieves OCR, vision, STT, and structured output accuracy for deterministic tasks that crushes Gemini-3-Flash…

X AI KOLs Timeline · yesterday Cached

Interfaze introduces a new hybrid AI model architecture that combines DNN/CNN encoders with transformers to achieve superior accuracy and cost-efficiency for deterministic tasks such as OCR, vision, and STT, compared to generalist models.

We tested super-resolution pre-filter for LPR OCR. It did nothing

Hacker News Top · yesterday Cached

Wink Engineering evaluates the efficacy of neural super-resolution as a pre-filter for license plate OCR, concluding that it fails to improve accuracy and often leads to hallucinated characters compared to training directly on low-resolution data.

@jerryjliu0: LiteParse is the best open-source, model-free document parser for AI agents. Run it over 50+ document types, and i…

X AI KOLs Following · yesterday Cached

LlamaIndex releases liteparse-server, a self-hosted, model-free HTTP API for parsing diverse document types with high spatial fidelity and privacy preservation.

@itsclelia: Do you actually own your document parsing infrastructure? At @llama_index, we wanted to make that easier, so we built…

X AI KOLs Following · yesterday Cached

LlamaIndex introduces liteparse-server, an open-source, self-hosted HTTP backend for parsing PDFs, images, and Office documents with spatial layout extraction, OCR, and screenshot generation, designed for AI and data workflows.

@oliviscusAI: You can now parse any document with one 1.7B parameter model. It’s called dots-ocr. One system that handles text, tables…

X AI KOLs Timeline · yesterday Cached

The article introduces dots-ocr, a 1.7B parameter model capable of parsing text, tables, formulas, and images from documents in over 100 languages without needing separate OCR pipelines.

@aaron_epstein: New model just released that beats sonnet 4.6, gemini 3 flash, and gpt 5.4 mini on OCR, vision, and STT tasks @interfaz…

X AI KOLs Following · 2d ago

A new AI model from interfaze_ai claims to outperform leading models (sonnet 4.6, gemini 3 flash, gpt 5.4 mini) on OCR, vision, and speech-to-text tasks.

We benchmarked 18 LLMs on OCR (7k+ calls) — cheaper/old models oftentimes win. Full dataset + framework open-sourced. [R]

Reddit r/MachineLearning · 2026-04-23

A comprehensive benchmark of 18 LLMs on OCR tasks (7k+ calls) reveals that cheaper and older models often match premium accuracy at a fraction of the cost, with full dataset and framework open-sourced.

Local manga translator with built-in LLM, written in Rust with llama.cpp integration

Reddit r/LocalLLaMA · 2026-04-22

Koharu is an open-source Rust-based manga/image translator that combines object detection, visual LLM OCR, layout analysis, and inpainting, with llama.cpp integration supporting Gemma 4 and Qwen3.5 models.

Gemma 4 Vision

Reddit r/LocalLLaMA · 2026-04-21

Gemma 4’s vision performance is bottlenecked by low default token budgets; raising --image-max-tokens to 2240 in llama.cpp unlocks state-of-the-art OCR and detail recognition at the cost of ~14 GB extra VRAM.

@ycombinator: LLMs are great for human in the loop applications, but fail at deterministic developer tasks. @interfaze_ai is a new AI…

X AI KOLs Following · 2026-04-20 Cached

Interfaze AI introduces a specialized model that surpasses general LLMs on deterministic developer tasks including OCR, object detection, web scraping, speech-to-text, and classification.

@techNmak: A lightweight VLM that beats the giants at OCR. (1.7B parameters, SOTA on OmniDocBench) dots.ocr is a new multilingual…

X AI KOLs Timeline · 2026-04-20 Cached

dots.ocr is a new lightweight 1.7B parameter multilingual vision-language model that achieves state-of-the-art performance on OmniDocBench, outperforming much larger models (72B+) at document parsing and OCR tasks.

SGOCR: A Spatially-Grounded OCR-focused Pipeline & V1 Dataset [P]

Reddit r/MachineLearning · 2026-04-20

SGOCR is an open-source dataset pipeline for generating spatially-grounded, OCR-focused visual question answering (VQA) tuples with rich metadata to support diverse VLM training. The pipeline uses a multi-stage approach combining models like Nvidia's nemotron-ocr-v2, Gemma4, Qwen3-VL, and Gemini-2.5-Flash, along with an agentic optimization loop.

Building a Fast Multilingual OCR Model with Synthetic Data

Hugging Face Blog · 2026-04-17 Cached

NVIDIA introduces Nemotron OCR v2, a fast multilingual OCR model built using synthetic data generation. The model achieves 34.7 pages/second on a single A100 GPU by using a unified FOTS-based architecture with feature reuse across detection, recognition, and relational components.

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Papers with Code Trending · 2025-09-26 Cached

MinerU2.5 is a 1.2B-parameter vision-language model that achieves state-of-the-art document parsing accuracy with high computational efficiency using a coarse-to-fine parsing strategy.

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Papers with Code Trending · 2025-03-14 Cached

SmolDocling is a compact 256M parameter vision-language model designed for end-to-end multi-modal document conversion. It introduces a new universal markup format called DocTags to capture page elements with location, competing with models 27 times larger.

paperless-ngx/paperless-ngx

GitHub Trending (daily) · 2026-04-20 Cached

Paperless-ngx is an open-source document management system that digitizes and archives physical documents with full-text search capabilities. It is the official successor to the original Paperless and Paperless-ng projects, designed as a community-driven initiative.
