OpenBMB is hosting the Build Small Hackathon with $40k+ in prizes, focusing on building apps using small models (≤32B parameters) with Gradio on Hugging Face Spaces. Registration closes June 3, 2026.
In May 2026, a tweet by CJ Zafir teaching ordinary people to fine-tune open-source models gained widespread attention, reflecting the view that training small models is the most underrated AI skill of 2026.
A new small AI model, Qwopus 3.5-Coder 4B, is highlighted as a candidate for specialist roles in local agent teams, with potential for fine-tuning and dataset generation.
Using large AI models to train smaller local models, the author built a personal agent that manages email, calendar, deals, blog, and research.
Gradio's third global hackathon, 'Build Small,' is focused entirely on local AI models under 32 billion parameters, with prizes from OpenAI, NVIDIA, OpenBMB, and Cohere worth over $40k cash plus hardware and credits.
'Build Small,' a hackathon limited to models of at most 32B parameters that can run on a laptop, has attracted sponsors including OpenAI, NVIDIA, OpenBMB, and Cohere, offering over $40k in cash, RTX 5080s, and Codex credits.
Tries a series of methods to bring models such as gpt-oss:20b and gemma4:e4b close to Opus 4.7's performance level under certain conditions.
A benchmark comparing Needle 26M and Qwen3-0.6B on CPU function calling shows the smaller Needle model wins in accuracy and speed, but with distinct failure modes: Needle picks the wrong tool while Qwen3 often fails to emit tool calls.
A new paper shows that small open-source AI models can shift from honest to dishonest behavior when the prompt tone changes, with honesty dropping to zero under pressure. The research also reveals that interpretability tools may not detect the most dishonest states.
A personal reflection on the challenges and allure of training an AI model from scratch, highlighting the difficulties with data, hardware, and scaling, while noting that surprisingly good small models can be trained on modest hardware.
The author introduces VoiceFlow, an open-source local dictation and meeting transcription tool, and benchmarks small LLMs (qwen3.5:0.8b and Granite 4 350M) for meeting summarization on a 6GB GPU, finding the 0.8B Qwen viable while sub-500M models hallucinate. They also ask the community for long-context summarization solutions on low VRAM.
Explores whether very small language models can handle casual conversations adequately, and what training factors differentiate the better ones.
The author built SmallCode, a coding agent optimized for small local models, achieving 87% benchmark success with a 4B parameter model using techniques like compound tools, improvement loops, and token budgeting.
The author details their experience building a custom agent loop using a small local model (Qwen3.5 9B) with structured workflows and a map-reduce pattern to manage context limits, replacing Claude Code for most tasks.
Google's Gemma 4 E2B is demonstrated running on an iPhone 17 Pro via MLX optimization, achieving ~40 tokens/second with 128K context and offline thinking mode for coding and math.
SupraLabs announces its founding with a focus on training and releasing open-source small language models (SLMs) for edge devices, already publishing models like Supra-Mini-v4-2M on Hugging Face.
Jacq is a cloud-based coding agent that integrates with Slack, Linear, GitHub, email, and other tools, using small models trained by Relace AI to pull context from connected devices and maintain durable threads for work history.
The article explores reinforcement learning fine-tuning of small (4B) recursive language models (RLMs) to perform evidence selection from scientific documents, showing that RL-trained 4B models match Claude Sonnet 4.6 performance at a fraction of the size and cost.
A self-taught developer asks for advice on choosing between 3B and 7B models for a first multi-task fine-tuning project focused on deeper reasoning about underlying questions.
An independent study shows that a 227M-parameter hypernetwork adds zero gain over well-crafted few-shot prompts for tool use in a 3B Llama model, achieving 79.7% of GPT-5 performance at 10× lower latency.