Krea 2 is a 12-billion parameter text-to-image diffusion model released open-weight on Hugging Face, with Raw (base) and Turbo (post-trained) checkpoints available.
PP-OCRv6 is a lightweight OCR model (34.5M parameters) that challenges large VLMs with its MetaFormer architecture, offering efficient text detection and recognition across multiple deployment scenarios.
Asks for recommendations on affordable AI models for content writing, image generation, and vibe coding.
Mistral AI releases Mistral OCR 4, a compact document intelligence model that provides bounding boxes, block classification, and inline confidence scores for structured text extraction. It supports 170 languages, runs in a single container for self-hosted deployment, and integrates with the Mistral Search Toolkit for enterprise search and RAG pipelines.
This post shows how to serve Baidu's Unlimited-OCR model as a temporary, OpenAI-compatible endpoint on Hugging Face Jobs, enabling multi-page document parsing with features like table-to-HTML and equation-to-LaTeX extraction.
Baidu releases Unlimited-OCR, an open-source model for one-shot long-horizon document parsing, building upon Deepseek-OCR with support for single images, multi-page documents, and PDFs.
A promotional video showcasing the capabilities of Seedance 2.5, an AI video generation model.
NVIDIA quietly released Nemotron-3.5-ASR, a lightweight 0.6B parameter open-source speech recognition model designed for real-time streaming with support for 40+ languages, low latency, and cache-aware architecture.
The author praises GLM-5.2, an MIT open-weights model, for its exceptional real-world performance in human evaluation benchmarks, claiming it rivals the best closed-source models like those from Claude.
Introduces SAG (SQL-Retrieval Augmented Generation), a novel retrieval-augmented generation architecture based on SQL dynamic hyperedges. It is more efficient and lower cost for multi-hop reasoning compared to traditional RAG and GraphRAG. It is open-sourced on GitHub and has achieved good evaluation results.
GLM 5.2 demonstrates fast performance on Modal's cloud platform.
A user asks why Gemma 4 26b receives less attention compared to Qwen models, sharing their experience using these models for a personal assistant project on a 3090.
ByteDance has released Seed2.1, a new AI model, with accompanying blog post and model card.
NousResearch's Hermes Agent team celebrates processing a trillion tokens per day and achieving 200k GitHub stars, highlighting their continuous improvement in AI agent development.
Boogu has released a series of open-source unified image generation and editing models, including Base, Turbo, and Edit variants.
Baidu has released Unlimited-OCR, which processes entire documents in a single pass without chunking, overcoming a major limitation of current OCR technology.
Baidu has open-sourced the visual language model Unlimited-OCR, upgraded from DeepSeek-OCR, supporting one-shot parsing of extremely long documents, offering two inference modes: gundam (dense text in a single image) and base (multi-page/PDF).
YOLO26 is a multi-task computer vision model family released in January 2026, featuring end-to-end detection without Non-Maximum Suppression for lower latency and optimized for edge deployment with improved CPU inference and compact design.
Baidu has open-sourced the Unlimited OCR model, which uses the R-SWA attention mechanism to process hundreds of pages in a single pass without page splitting, with a constant KV Cache. The model innovatively mimics the attention pattern of humans copying books by hand and shares technical lineage with DeepSeek OCR, sparking discussions about talent mobility.
Anthropic partner provider shows slug for upcoming Claude Sonnet 5 model, hinting at imminent release.