Baidu releases Unlimited-OCR, an open-source model for one-shot, long-horizon document parsing. Built on DeepSeek-OCR, it supports single images, multi-page documents, and PDFs.
A promotional video showcasing the capabilities of Seedance 2.5, an AI video generation model.
NVIDIA quietly released Nemotron-3.5-ASR, a lightweight, open-source 0.6B-parameter speech recognition model built for real-time streaming. It supports 40+ languages, offers low latency, and uses a cache-aware architecture.
The author praises GLM-5.2, an open-weights model released under the MIT license, for its exceptional real-world performance in human-evaluation benchmarks, claiming it rivals the best closed-source models, such as Anthropic's Claude.
Introduces SAG (SQL-Retrieval Augmented Generation), a novel retrieval-augmented generation architecture based on SQL dynamic hyperedges. According to its authors, it handles multi-hop reasoning more efficiently and at lower cost than traditional RAG and GraphRAG. The code is open-sourced on GitHub, and the authors report good evaluation results.
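The item gives no schema, so the following is only a hypothetical illustration of the general idea it names: storing hyperedges (facts that link several entities at once) as relational rows and answering a multi-hop question with a single recursive SQL query instead of repeated retrieval calls. The table name, the sample facts, and the `multi_hop_edges` helper are all invented for this sketch; it is not SAG's implementation.

```python
import sqlite3

# Hypothetical schema: one row per (hyperedge, member entity) pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hyperedge_member (edge_id INTEGER, entity TEXT)")

# Invented sample facts; each hyperedge may connect more than two entities.
facts = {
    1: ["Alice", "Acme Corp"],            # Alice works at Acme Corp
    2: ["Acme Corp", "Berlin", "2019"],   # Acme Corp opened a Berlin office in 2019
    3: ["Berlin", "Germany"],             # Berlin is in Germany
    4: ["Bob", "Globex"],                 # unrelated fact
}
conn.executemany(
    "INSERT INTO hyperedge_member VALUES (?, ?)",
    [(edge, ent) for edge, ents in facts.items() for ent in ents],
)

def multi_hop_edges(seed, max_hops):
    """Return ids of hyperedges reachable from `seed` within `max_hops` hops."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(entity, hop) AS (
            SELECT ?, 0
            UNION
            -- expand: entity -> every hyperedge containing it -> its other members
            SELECT m2.entity, r.hop + 1
            FROM reach r
            JOIN hyperedge_member m1 ON m1.entity = r.entity
            JOIN hyperedge_member m2 ON m2.edge_id = m1.edge_id
            WHERE r.hop < ?
        )
        SELECT DISTINCT m.edge_id
        FROM reach r
        JOIN hyperedge_member m ON m.entity = r.entity
        WHERE r.hop < ?
        ORDER BY m.edge_id
        """,
        (seed, max_hops, max_hops),
    ).fetchall()
    return [row[0] for row in rows]

print(multi_hop_edges("Alice", 2))  # [1, 2]
print(multi_hop_edges("Alice", 3))  # [1, 2, 3]
```

Each hop is one join inside the same query, so the hop limit is a query parameter rather than a loop of separate retrieval calls. Whether SAG actually structures its hyperedges or queries this way is not stated in the item.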
GLM-5.2 runs fast on Modal's cloud platform.
A user asks why Gemma 4 26B receives less attention than Qwen models, sharing their experience running these models for a personal-assistant project on an RTX 3090.
ByteDance has released Seed2.1, a new AI model, with an accompanying blog post and model card.
NousResearch's Hermes Agent team celebrates processing a trillion tokens per day and reaching 200k GitHub stars, highlighting the agent's continuous improvement.
Boogu has released a series of open-source unified image generation and editing models, including Base, Turbo, and Edit variants.
Baidu has released Unlimited-OCR, which processes entire documents in a single pass without chunking, overcoming a major limitation of current OCR technology.
Baidu has open-sourced Unlimited-OCR, a vision-language model upgraded from DeepSeek-OCR that supports one-shot parsing of extremely long documents. It offers two inference modes: gundam (dense text in a single image) and base (multi-page documents and PDFs).
YOLO26 is a multi-task computer vision model family released in January 2026. It performs end-to-end detection without non-maximum suppression (NMS), which lowers latency, and is optimized for edge deployment with improved CPU inference and a compact design.
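For context, the post-processing step that YOLO26 reportedly removes is classic greedy non-maximum suppression, which discards overlapping duplicate detections after the network has run. Below is a minimal pure-Python sketch of standard greedy NMS; it is not YOLO26 code, and the 0.5 IoU threshold and sample boxes are illustrative assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it
    above the threshold, and repeat. Returns kept indices by descending score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 (IoU ~0.68) and is dropped
```

This loop runs sequentially after inference and, in the worst case, compares every pair of candidate boxes, which is why an NMS-free, end-to-end detector can plausibly cut latency on CPUs and edge devices.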
Baidu has open-sourced the Unlimited-OCR model, which uses the R-SWA attention mechanism to process hundreds of pages in a single pass, without page splitting, while keeping the KV cache at a constant size. The model's attention pattern is modeled on how a person copies a book by hand, and its shared technical lineage with DeepSeek-OCR has sparked discussion about talent mobility.
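A constant KV cache follows from the general property of sliding-window attention: each new token attends only to a fixed number of recent tokens, so older keys and values can be evicted. The sketch below illustrates that generic mechanism in pure Python; it is a simplification, not R-SWA itself (whose exact attention pattern the item does not describe), and the `SlidingWindowKVCache` class and window size of 4 are hypothetical.

```python
import math
from collections import deque

class SlidingWindowKVCache:
    """Hypothetical KV cache holding keys/values for only the last `window` tokens."""

    def __init__(self, window: int):
        # A deque with maxlen evicts the oldest entry automatically when full.
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def __len__(self):
        return len(self.keys)

    def attend(self, query):
        """Scaled dot-product attention of `query` over the cached window only."""
        d = len(query)
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in self.keys]
        m = max(scores)  # subtract the max for numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out = [0.0] * len(self.values[0])
        for w, v in zip(weights, self.values):
            for j, x in enumerate(v):
                out[j] += w / total * x
        return out

cache = SlidingWindowKVCache(window=4)
for t in range(1000):  # simulate decoding 1,000 tokens
    vec = [math.sin(t), math.cos(t)]
    cache.append(vec, vec)
    _ = cache.attend(vec)
print(len(cache))  # 4: cache size stays fixed however many tokens are decoded
```

A full-attention cache would instead grow to 1,000 entries here, which is why fixed-window designs are attractive for documents spanning hundreds of pages.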
An Anthropic partner provider has exposed a model slug for the upcoming Claude Sonnet 5, hinting at an imminent release.
GLM-5.2 is a new open-source AI model that sets a high bar for open models, though it still trails proprietary frontier models and lacks some features like vision.
Alibaba released HappyHorse 1.1, a major upgrade to its AI video generation model, now available via API. It rose to No. 2 in global rankings as competitors Sora and Seedance faltered.
Sakana's Fugu Ultra, a model-orchestration system, outperformed other models in a live coding test that involved building a trading-desk UI, though at 17x the cost, showing particular strength in visual polish and multi-agent coordination.
Introduces SHD-CCP v2.0, a novel AI architecture that replaces transformer token sequences with 3D point-cloud data structures, using Grassmannian manifold fusion and zero-copy, memory-mapped streaming to achieve low latency and a small memory footprint on consumer hardware.
An updated GPT-5.5 Cyber model surpasses Mythos 5 on the CyberGym benchmark.