Tag
A developer shares surprising lessons from fine-tuning a small open model. The base model often already maxes out on the capabilities fine-tuning was meant to improve. The real weakness is behavioral (caving, i.e., abandoning correct answers under pushback). Fine-tuning therefore requires careful measurement and balancing.
The open-source OCR model Unlimited OCR, based on DeepSeek OCR, achieves 93.23 on OmniDocBench v1.5 with only 3B parameters, outperforming DeepSeek OCR, Gemini 2.5, and others.
Surya 2 is a 650M-parameter OCR model that scores 83.3% on olmOCR. It claims to be the most accurate small OCR model and credits character tokenization, which it says improves accuracy while keeping the model small.
Nanbeige 4.1 is a 3B model from the company often described as "China's Indeed." It outperforms larger Qwen models on long-horizon tasks that require more than 600 tool calls.
A 600M-parameter reasoning model trained on SYNTH reportedly outperforms a 397B model and Sonnet 4.5 in an industrial application for the Paris Métro. The result highlights how effective small, specialized models can be.
Inflect-Nano, an extremely small text-to-speech model with only 4.63 million parameters, has been released.
A 4-billion-parameter open model from the Apodex family outperforms 30-billion-parameter models on web research benchmarks. The gain is attributed to carefully curated training data and self-verification rather than raw scale, which suggests a more democratic trajectory for AI capability.
VibeThinker-3B, a 3B-parameter model, performs comparably to ~30B-parameter models on the MathQA benchmark despite being roughly a tenth of their size.
WeiboAI released VibeThinker-3B, a small 3B reasoning model. In a local test on coding tasks, it solved 3 of 3 algorithm problems.
Glimmer, a 10,000-parameter language model, has been released on Hugging Face. It uses a standard Llama architecture and was trained on 500K tokens of FineWeb-Edu.
VibeThinker-3B achieves highly competitive results on verifiable reasoning tasks through a series of post-training stages applied to Qwen2.5-Coder: curriculum SFT, multi-domain RL, offline self-distillation, and a final RL-based instruct stage.
Google releases Gemma 3 270M, a 270M-parameter language model.
A user discusses building a small autocomplete model (25M parameters) as a learning project, mentions hardware constraints (32GB VRAM), data requirements (~100M tokens), and seeks advice on datasets and data formatting for autocomplete-style training.
The author tests the Apodex 4B-SFT and 35B mini models. The 4B-SFT surpasses other 4B models on multi-hop search tasks without hallucinating. The author also notes the design philosophy of separating answer checking from answer generation.
This model is small, cost-effective, open-source (Apache 2.0), and locally deployable, representing a shift towards transparent and sovereign AI.
A 4B open-source model beats Mythos 5 on CharXiv, a chart-understanding benchmark, which shows that a freely available small model can perform strongly.
Cohere released North Mini Code, its first open-source coding model under Apache 2.0, designed to be small, cost-effective, locally deployable, and focused on agentic performance.
CJ Zafir's team has introduced Mac-1, a small 6.6B-parameter model that runs locally on a Mac with only 7GB of RAM. It can chain calls across 487 Mac-native tools at an inference speed of 65 tok/s. The team aims to disrupt the agent paradigm, which is currently dominated by large cloud-hosted models.
Google releases Gemma 4 12B, a compact model optimized for local use on laptops with only 16GB of RAM. It features multi-token prediction and streamlined multimodal support for text, audio, and images.
Google launched Gemma 4 12B, a mid-sized multimodal model with native audio input. It requires only 16GB of memory and is released under Apache 2.0.