visual-language-model

#visual-language-model

@geekbb: Baidu's open-source visual language model OCR project, upgraded from DeepSeek-OCR, focuses on one-shot parsing of extremely long documents. The model has two inference modes: 'gundam' mode for dense text in a single image, and 'base' mode for multi-page or PDF processing. https://github…

X AI KOLs Timeline ↗ · 17h ago Cached

Baidu has open-sourced the visual language model Unlimited-OCR, upgraded from DeepSeek-OCR, supporting one-shot parsing of extremely long documents, offering two inference modes: gundam (dense text in a single image) and base (multi-page/PDF).

0 favorites 0 likes

#visual-language-model

Unification of Closed-Open Industrial Detection Scenarios: New Large-Scale Benchmarks,Challenges and Baselines

arXiv cs.AI ↗ · 2026-06-09 Cached

Introduces MMIOC-1M, a large-scale multi-modal benchmark for industrial defect detection, and proposes RTVPNet, a refined text-visual prompt network achieving state-of-the-art performance.

0 favorites 0 likes

#visual-language-model

@paulabartabajo_: Advice for AI engineers A small Visual Language Model fine-tuned on your custom dataset is as accurate as GPT-5... ... …

X AI KOLs Timeline ↗ · 2026-04-22 Cached

A tweet claims that a small visual language model fine-tuned on custom data can match GPT-5 accuracy while costing 50× less, citing Liquid AI’s 1.6B model running locally with llama.cpp.

0 favorites 0 likes

visual-language-model

@geekbb: Baidu's open-source visual language model OCR project, upgraded from DeepSeek-OCR, focuses on one-shot parsing of extremely long documents. The model has two inference modes: 'gundam' mode for dense text in a single image, and 'base' mode for multi-page or PDF processing. https://github…

Unification of Closed-Open Industrial Detection Scenarios: New Large-Scale Benchmarks,Challenges and Baselines

@paulabartabajo_: Advice for AI engineers A small Visual Language Model fine-tuned on your custom dataset is as accurate as GPT-5... ... …

Submit Feedback