Tag
DeepSeek-OCR is a 3B vision model using context optical compression for efficient document processing. Fine-tuning it on Persian text using Unsloth achieved an 88.26% improvement in character error rate, all open-source and runnable on a single GPU.
This pull request adds support for the Granite4 Vision model to llama.cpp, an open-source LLM inference engine.
Kapa.ai describes their approach to indexing images for RAG by using a cheap vision model to generate text descriptions at indexing time, avoiding query-time vision costs, resulting in better answers with minimal per-query overhead.
Stepfun 3.7 Flash is a compact vision model that achieves aesthetics close to GLM 5.1 and 80% of its 3D world understanding, while using only 25% of the parameters, making it highly RAM-efficient.
OlmoEarth v1.1 is a new family of satellite imagery analysis models from Allen AI that reduces compute costs by up to 3x while maintaining performance, achieved by decreasing token sequence lengths in transformer-based models.
A demo shows Qwen3.6 35B vision model running across two M5 Max MacBook Pros connected via RDMA over Thunderbolt 5, achieving near-instant responses with prefix caching. The model correctly identifies Apple Park but misidentifies a person in the image.
Anthropic Labs has launched Claude Design, a new product powered by the Claude Opus 4.7 vision model that allows users to collaborate with AI to create visual designs, prototypes, and presentations.