Tag
Describes a two-layer architecture built on small LLMs: a local always-on agent (Raven) running on an RTX 5080 and an online reasoning stack (Trinity Cortex) that combines three small models with a knowledge graph. Argues that small models are better than large frontier models for graph-based reasoning.
Reports initial manual results from experiments on procedural skill transfer in small AI models, with insights into how skills can be transferred across models.
Describes an agent harness built for small models that enables Qwen 3.5 4B to manage servers.
Proposes a blind visual paradigm, built on Three.js, for testing whether procedural scaffolds extracted from large models can improve small-model outputs without fine-tuning, with results validated by a blind judge model.
A news article discussing Google's continued commitment to small AI models for code generation, despite the industry trend toward larger models.
ObviousBench is a new benchmark designed specifically for evaluating smaller AI models.
Microsoft AI Frontiers releases a family of browser agents that can fill forms and make reservations using a pixel-to-action approach with observe-think-act loops. The agents are available in 4B, 9B, and 27B parameter sizes for deployment on modest hardware.
A Reddit user discusses the potential of small local language models (1B-4B parameters) for automation and scripting, and asks for resources focused on this use case.
Zone of Proximal Policy Optimization (ZPPO) improves knowledge distillation by using reformulated prompts that help students learn from both correct and incorrect responses, improving performance especially at smaller model sizes.
CacheRL trains small agent foundation models for multi-step tool-calling tasks, achieving 92% process accuracy (approaching GPT-5's 94%) with 100x less compute by using cached rollouts and hybrid reward shaping. Its innovations include knowledge transfer, cache-aware rewards, and iterative SFT/GRPO training.
Apodex releases open-weight small models (0.8B, 2B, 4B) specialized for agentic verification tasks, along with the AgentHarness evaluation framework for local agent workflows.
TechCrunch reports on a potential industry shift as escalating costs lead companies to consider cheaper, smaller AI models instead of always using the most powerful ones. Predictions like Brian Armstrong's suggest that 80% of workloads could run on models that are 99% cheaper within 12-18 months, which would significantly affect major AI labs like OpenAI and Anthropic.
A developer tested how well small edge models (LFM2.5, Gemma variants) retain a single fact across conversation turns. The models often confidently denied knowing information that was still in context, which poses a trust issue for agent architectures and suggests a trade-off between memory and format discipline.
Observes that demand for small AI models is high, as seen in the top download counts for Qwen models under 9B parameters.
A field report on building a multi-model finance drama game where each agent runs on a different lab's small model, demonstrating the engineering challenges and benefits of model heterogeneity.
A developer argues that the edge AI community overlooks small, specialized models that can run locally on devices like smartphones, using a self-built offline Morse code recognition feature as an example. The project uses a sub-5 MB AI model built with TensorFlow/Keras and LiteRT, and the entire pipeline, from data generation to mobile integration, was custom-built.
Google's Gemma 4 12B introduces an encoder-free multimodal architecture that competes with larger models, though benchmark comparisons show it trailing Qwen 2.5 9B on most tasks. The article also covers related developments including open-weight model security risks, Uber's Claude Code spending caps, and NeurIPS's misuse of an uncalibrated AI detector.
A guide explaining how to make agentic workflows up to 462x cheaper by compiling fixed procedures into smaller fine-tuned models instead of repeatedly prompting frontier models.
AethexAI, founded by former Goldman and Meta employees, raised $3M to build voice AI for African and Middle Eastern markets. It uses small models to reduce latency and is launching its platform with APIs and SDKs.
A demonstration shows that Qwen3.6 35B A3B, combined with NVIDIA's LocateAnything-3B as a vision tool, can accurately fill out a paper form by detecting field positions, showing that small models can collaborate to accomplish tasks beyond a single large model's capability.