visual-language-model


@geekbb: Baidu's open-source visual language model OCR project, upgraded from DeepSeek-OCR, focuses on one-shot parsing of extremely long documents. The model has two inference modes: 'gundam' mode for dense text in a single image, and 'base' mode for multi-page or PDF processing. https://github…

X AI KOLs Timeline · 17h ago

Baidu has open-sourced Unlimited-OCR, a visual language model upgraded from DeepSeek-OCR that supports one-shot parsing of extremely long documents. It offers two inference modes: gundam (dense text in a single image) and base (multi-page documents and PDFs).


Unification of Closed-Open Industrial Detection Scenarios: New Large-Scale Benchmarks, Challenges and Baselines

arXiv cs.AI · 2026-06-09

Introduces MMIOC-1M, a large-scale multi-modal benchmark for industrial defect detection, and proposes RTVPNet, a refined text-visual prompt network achieving state-of-the-art performance.


@paulabartabajo_: Advice for AI engineers: a small Visual Language Model fine-tuned on your custom dataset is as accurate as GPT-5 …

X AI KOLs Timeline · 2026-04-22

A tweet claims that a small visual language model fine-tuned on custom data can match GPT-5 accuracy while costing 50× less, citing Liquid AI’s 1.6B model running locally with llama.cpp.
