Models

@vanstriendaniel: It's raining OCR models again! @Baidu_Inc's Unlimited-OCR is one of the more interesting. You can try it without much e…

X AI KOLs Following · 3h ago Cached

This post shows how to serve Baidu's Unlimited-OCR model as a temporary, OpenAI-compatible endpoint on Hugging Face Jobs, enabling multi-page document parsing with features like table-to-HTML and equation-to-LaTeX extraction.

Unlimited OCR: One-Shot Long-Horizon Parsing

Hacker News Top · 4h ago Cached

Baidu has released Unlimited-OCR, an open-source model for one-shot long-horizon document parsing that builds on DeepSeek-OCR and supports single images, multi-page documents, and PDFs.

Seedance 2.5 Promotional Video

Reddit r/singularity · 5h ago

A promotional video showcasing the capabilities of Seedance 2.5, an AI video generation model.

@DataChaz: @NVIDIA just quietly dropped an incredibly impressive speech recognition model that completely changes the math for loc…

X AI KOLs Timeline · 6h ago Cached

NVIDIA quietly released Nemotron-3.5-ASR, a lightweight 0.6B-parameter open-source speech recognition model designed for real-time streaming, with support for 40+ languages, low latency, and a cache-aware architecture.

Human Evaluation of GLM-5.2

Reddit r/LocalLLaMA · 8h ago

The author praises GLM-5.2, an MIT-licensed open-weights model, for its exceptional real-world performance in human evaluation, claiming it rivals the best closed-source models, such as Claude.

@aikangarooking: https://x.com/aikangarooking/status/2069325659105861926

X AI KOLs Timeline · 8h ago Cached

The post introduces SAG (SQL-Retrieval Augmented Generation), a retrieval-augmented generation architecture based on SQL dynamic hyperedges. It is claimed to be more efficient and cheaper than traditional RAG and GraphRAG for multi-hop reasoning. The project is open-sourced on GitHub and reports good evaluation results.

@charles_irl: GLM 5.2 runs pretty fast on Modal.

X AI KOLs Following · 9h ago Cached

GLM 5.2 demonstrates fast performance on Modal's cloud platform.

Is there any reason for a lack of love for Gemma 4 26b?

Reddit r/LocalLLaMA · 10h ago

A user asks why Gemma 4 26b receives less attention compared to Qwen models, sharing their experience using these models for a personal assistant project on a 3090.

Seed2.1 released

Reddit r/singularity · 11h ago

ByteDance has released Seed2.1, a new AI model, with accompanying blog post and model card.

@theemozilla: A trillion tokens a day and 200k GitHub stars Very proud of the Hermes Agent team and what we've built at @NousResearch…

X AI KOLs Following · 11h ago Cached

NousResearch's Hermes Agent team celebrates processing a trillion tokens per day and achieving 200k GitHub stars, highlighting their continuous improvement in AI agent development.

Boogu Base, Turbo, Edit - open-source unified image generation and editing model series

Reddit r/LocalLLaMA · 12h ago

Boogu has released a series of open-source unified image generation and editing models, including Base, Turbo, and Edit variants.

@ErickSky: Baidu has just broken one of the biggest limitations of current OCR. Unlimited-OCR processes entire documents in a sing…

X AI KOLs Timeline · 13h ago Cached

Baidu has released Unlimited-OCR, which processes entire documents in a single pass without chunking, overcoming a major limitation of current OCR technology.

@geekbb: Baidu's open-source visual language model OCR project, upgraded from DeepSeek-OCR, focuses on one-shot parsing of extremely long documents. The model has two inference modes: 'gundam' mode for dense text in a single image, and 'base' mode for multi-page or PDF processing. https://github…

X AI KOLs Timeline · 13h ago Cached

Baidu has open-sourced Unlimited-OCR, a vision-language model upgraded from DeepSeek-OCR that supports one-shot parsing of extremely long documents. It offers two inference modes: gundam (dense text in a single image) and base (multi-page documents and PDFs).

An Introduction to YOLO26

Hacker News Top · 14h ago Cached

YOLO26 is a multi-task computer vision model family released in January 2026. It performs end-to-end detection without Non-Maximum Suppression (NMS) for lower latency and is optimized for edge deployment, with improved CPU inference and a compact design.

@berryxia: Wow, this move directly poached DeepSeek's talent! Last night I saw this interesting OCR open-source model on HuggingFace and the fascinating story behind it. This OCR model is completely different from traditional ones! Its speed and accuracy are absolutely unbeatable~~ Let me start with some background, for those who are familiar…

X AI KOLs Timeline · 15h ago Cached

Baidu has open-sourced the Unlimited-OCR model, which uses the R-SWA attention mechanism to process hundreds of pages in a single pass without page splitting while keeping the KV cache at a constant size. The model mimics the attention pattern of a person copying a book by hand and shares technical lineage with DeepSeek-OCR, which has sparked discussion about talent mobility.

claude-sonnet-5 (1 minute read)

TLDR AI · 15h ago Cached

An Anthropic partner provider has shown a model slug for the upcoming Claude Sonnet 5, hinting at an imminent release.

GLM-5.2 Raises the Bar for Open Models (14 minute read)

TLDR AI · 15h ago Cached

GLM-5.2 is a new open-source AI model that sets a high bar for open models, though it still trails proprietary frontier models and lacks some features like vision.

Alibaba's AI video model rises to No. 2 in global rankings, as OpenAI's Sora and ByteDance's Seedance fall away (14 minute read)

TLDR AI · 15h ago Cached

Alibaba released HappyHorse 1.1, a major AI video generation model upgrade now available via API, rising to No. 2 in global rankings as competitors Sora and Seedance faltered.

@rohanpaul_ai: Sakana Fugu Ultra just beat the other models on visual polish in a live trading-desk coding test, got close to GLM 5.2,…

X AI KOLs Following · 17h ago Cached

Sakana's Fugu Ultra, a model orchestration system, outperformed other models on visual polish in a live coding test for a trading-desk UI, though at 17x higher cost, demonstrating its strength in multi-agent coordination.

Breaking the Transformer Dead-End: A Local-First 3D Point-Cloud Cognition Engine running on consumer hardware

Reddit r/artificial · 18h ago

Introduces SHD-CCP v2.0, a novel AI architecture that replaces transformer token sequences with 3D point cloud data structures using Grassmannian manifold fusion and zero-copy memory-mapped streaming, achieving low latency and memory footprint on consumer hardware.
