Models

Krea 2 released on Hugging Face

Reddit r/LocalLLaMA · 1h ago

Krea 2 is a 12-billion parameter text-to-image diffusion model released open-weight on Hugging Face, with Raw (base) and Turbo (post-trained) checkpoints available.

@PaddlePaddle: PP-OCRv6 Tech Deep Dive Ep.1: In the Era of Large Models, Why Does Lightweight OCR Still Have Irreplaceable Value? — PP…

X AI KOLs Timeline · 2h ago

PP-OCRv6 is a lightweight OCR model (34.5M parameters) that challenges large VLMs with its MetaFormer architecture, offering efficient text detection and recognition across multiple deployment scenarios.

Best cheap model for content writing, realistic image generation & vibe coding?

Reddit r/AI_Agents · 2h ago

A user asks for recommendations on affordable AI models for content writing, realistic image generation, and vibe coding.

Mistral OCR 4

Hacker News Top · 3h ago

Mistral AI releases Mistral OCR 4, a compact document intelligence model that provides bounding boxes, block classification, and inline confidence scores for structured text extraction. It supports 170 languages, runs in a single container for self-hosted deployment, and integrates with the Mistral Search Toolkit for enterprise search and RAG pipelines.

@vanstriendaniel: It's raining OCR models again! @Baidu_Inc's Unlimited-OCR is one of the more interesting. You can try it without much e…

X AI KOLs Following · 4h ago

This post shows how to serve Baidu's Unlimited-OCR model as a temporary, OpenAI-compatible endpoint on Hugging Face Jobs, enabling multi-page document parsing with features like table-to-HTML and equation-to-LaTeX extraction.

Unlimited OCR: One-Shot Long-Horizon Parsing

Hacker News Top · 5h ago

Baidu releases Unlimited-OCR, an open-source model for one-shot long-horizon document parsing. It builds on DeepSeek-OCR and supports single images, multi-page documents, and PDFs.

Seedance 2.5 Promotional Video

Reddit r/singularity · 6h ago

A promotional video showcasing the capabilities of Seedance 2.5, an AI video generation model.

@DataChaz: @NVIDIA just quietly dropped an incredibly impressive speech recognition model that completely changes the math for loc…

X AI KOLs Timeline · 7h ago

NVIDIA quietly released Nemotron-3.5-ASR, a lightweight, open-source speech recognition model with 0.6B parameters. It is designed for low-latency real-time streaming, uses a cache-aware architecture, and supports 40+ languages.

Human Evaluation of GLM-5.2

Reddit r/LocalLLaMA · 9h ago

The author praises GLM-5.2, an MIT-licensed open-weights model, for its real-world performance in human evaluation, claiming it rivals the best closed-source models such as Claude.

@aikangarooking: https://x.com/aikangarooking/status/2069325659105861926

X AI KOLs Timeline · 9h ago

The post introduces SAG (SQL-Retrieval Augmented Generation), a retrieval-augmented generation architecture based on SQL dynamic hyperedges. It is described as more efficient and cheaper for multi-hop reasoning than traditional RAG and GraphRAG. The project is open-sourced on GitHub and reportedly performs well in evaluations.

@charles_irl: GLM 5.2 runs pretty fast on Modal.

X AI KOLs Following · 11h ago

GLM-5.2 runs fast on Modal's cloud platform, according to the post.

Is there any reason for a lack of love for Gemma 4 26b?

Reddit r/LocalLLaMA · 12h ago

A user asks why Gemma 4 26b receives less attention than Qwen models, and shares their experience running both on a 3090 for a personal assistant project.

Seed2.1 released

Reddit r/singularity · 12h ago

ByteDance has released Seed2.1, a new AI model, with an accompanying blog post and model card.

@theemozilla: A trillion tokens a day and 200k GitHub stars Very proud of the Hermes Agent team and what we've built at @NousResearch…

X AI KOLs Following · 12h ago

The Hermes Agent team at Nous Research celebrates two milestones: processing a trillion tokens per day and reaching 200k GitHub stars.

Boogu Base, Turbo, Edit - open-source unified image generation and editing model series

Reddit r/LocalLLaMA · 13h ago

Boogu has released a series of open-source unified image generation and editing models, including Base, Turbo, and Edit variants.

@ErickSky: Baidu has just broken one of the biggest limitations of current OCR. Unlimited-OCR processes entire documents in a sing…

X AI KOLs Timeline · 14h ago

Baidu has released Unlimited-OCR, which processes entire documents in a single pass without chunking, overcoming a major limitation of current OCR technology.

@geekbb: Baidu's open-source visual language model OCR project, upgraded from DeepSeek-OCR, focuses on one-shot parsing of extremely long documents. The model has two inference modes: 'gundam' mode for dense text in a single image, and 'base' mode for multi-page or PDF processing. https://github…

X AI KOLs Timeline · 14h ago

Baidu has open-sourced Unlimited-OCR, a vision-language model upgraded from DeepSeek-OCR that supports one-shot parsing of extremely long documents. It offers two inference modes: gundam (dense text in a single image) and base (multi-page documents and PDFs).

An Introduction to YOLO26

Hacker News Top · 15h ago

YOLO26 is a multi-task computer vision model family released in January 2026. It performs end-to-end detection without Non-Maximum Suppression (NMS), which lowers latency, and it is optimized for edge deployment with improved CPU inference and a compact design.

@berryxia: Wow, this move directly poached DeepSeek's talent! Last night I saw this interesting OCR open-source model on HuggingFace and the fascinating story behind it. This OCR model is completely different from traditional ones! Its speed and accuracy are absolutely unbeatable~~ Let me start with some background, for those who are familiar…

X AI KOLs Timeline · 17h ago

Baidu has open-sourced the Unlimited-OCR model, which uses the R-SWA attention mechanism to process hundreds of pages in a single pass, without page splitting and with a constant-size KV cache. The model mimics the attention pattern of a person copying a book by hand. It shares technical lineage with DeepSeek-OCR, which has sparked discussion about talent mobility.

claude-sonnet-5 (1 minute read)

TLDR AI · 17h ago

An Anthropic partner provider shows a slug for an upcoming Claude Sonnet 5 model, hinting at an imminent release.
