small-models

#small-models

Small LLM Architecture: Raven Agent (Local RTX5080) + Trinity Cortex (7B/13B/MoE Online)

Reddit r/ArtificialInteligence ↗ · 4h ago

Describes a two-layer small LLM architecture: a local always-on agent (Raven) on an RTX5080 and an online reasoning stack (Trinity Cortex) with three small models and a knowledge graph, arguing that small models are better than large frontier models for graph-based reasoning.

#small-models

Update: First Manual Results from Testing Procedural Skill Transfer in Small Models

Reddit r/LocalLLaMA ↗ · 13h ago

Reports initial manual results from experiments testing whether procedural skills can be transferred to small AI models.

#small-models

I built an agent Harness for Small Models. I got Qwen 3.5 4b managing servers.

Reddit r/LocalLLaMA ↗ · 18h ago

Describes an agent harness built for small models that lets Qwen 3.5 4B manage servers.

#small-models

A Blind Visual Paradigm for Testing Skill Transfer in Small Models Without Fine-Tuning

Reddit r/LocalLLaMA ↗ · yesterday

Proposes a blind visual paradigm, built on Three.js, to test whether procedural scaffolds extracted from large models can improve small-model outputs without fine-tuning. Outputs are scored by a blind judge model.

#small-models

Even Google still believes in small models for coding.

Reddit r/LocalLLaMA ↗ · 2d ago

A news article discussing Google's continued commitment to small AI models for code generation, despite the industry trend toward larger models.

#small-models

New bench designed for smaller models: ObviousBench.com

Reddit r/LocalLLaMA ↗ · 2d ago

ObviousBench is a new benchmark designed specifically for evaluating smaller AI models.

#small-models

@ms_aifrontiers: What it is: a family of agents that do real browser work like filling out forms and making reservations. Pixel-to-actio…

X AI KOLs Following ↗ · 5d ago

Microsoft AI Frontiers releases a family of browser agents that fill out forms and make reservations, mapping pixels to actions in an observe-think-act loop. The models come in 4B, 9B, and 27B parameter sizes for deployment on modest hardware.

#small-models

Are small local models for automation a thing?

Reddit r/LocalLLaMA ↗ · 2026-06-16

A Reddit user discusses the potential of small local language models (1B-4B parameters) for automation and scripting, and asks for resources focused on this use case.

#small-models

Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients

Hugging Face Daily Papers ↗ · 2026-06-16 Cached

Zone of Proximal Policy Optimization (ZPPO) improves knowledge distillation by using reformulated prompts that help students learn from both correct and incorrect responses, enhancing performance especially at smaller model sizes.

#small-models

CacheRL: Multi-Turn Tool-Calling Agents via Cached Rollouts and Hybrid Reward

arXiv cs.CL ↗ · 2026-06-15 Cached

CacheRL trains small agent foundation models for multi-step tool-calling tasks, achieving 92% process accuracy (approaching GPT-5's 94%) with 100x less compute using cached rollouts and hybrid reward shaping, with innovations in knowledge transfer, cache-aware rewards, and iterative SFT/GRPO training.

#small-models

Releasing Apodex-1.0 Smol Models (0.8B, 2B, 4B Open-Weights) optimized for Agentic Verification + AgentHarness Evals

Reddit r/LocalLLaMA ↗ · 2026-06-10

Apodex releases open-weight small models (0.8B, 2B, 4B) specialized for agentic verification tasks, along with the AgentHarness evaluation framework for local agent workflows.

#small-models

Can tech companies learn to love cheaper AI models?

TechCrunch AI ↗ · 2026-06-09 Cached

TechCrunch reports on a potential industry shift as companies consider switching to cheaper, smaller AI models instead of always using the most powerful ones, driven by escalating costs. Predictions like Brian Armstrong's suggest 80% of workloads could run on 99% cheaper models within 12-18 months, which would significantly impact major AI labs like OpenAI and Anthropic.

#small-models

Tested how long small models hold a fact across a conversation. The memory failure mode is a real problem for agents, and it's not what I expected.

Reddit r/AI_Agents ↗ · 2026-06-08

A developer tested how well small edge models (LFM2.5, Gemma variants) retain a single fact across conversation turns. The models often confidently denied knowing information that was still in context, which is a trust problem for agent architectures. The results also suggest a trade-off between memory and format discipline.

#small-models

@LottoLabs: There’s so much demand for a good small model, look at top downloaded qwen models All < 9b

X AI KOLs Following ↗ · 2026-06-08 Cached

Observation that there is high demand for small AI models, as seen in the top downloads of Qwen models under 9B parameters.

#small-models

Five labs, five minds: building a multi-model finance drama on small models (6 minute read)

TLDR AI ↗ · 2026-06-08 Cached

A field report on building a multi-model finance drama game where each agent runs on a different lab's small model, demonstrating the engineering challenges and benefits of model heterogeneity.

#small-models

Are We Underestimating Small Edge AI Models? [D]

Reddit r/MachineLearning ↗ · 2026-06-05

A developer argues that the edge AI community overlooks small, specialized models that can run locally on devices like smartphones, using a self-built offline Morse code recognition feature as an example. The project uses a sub-5 MB AI model with TensorFlow/Keras and LiteRT, and the entire pipeline from data generation to mobile integration was custom-built.

#small-models

Gemma 2B multimodal model matches larger models without encoder

Reddit r/singularity ↗ · 2026-06-04

Google's Gemma 4 12B introduces an encoder-free multimodal architecture that competes with larger models, though benchmark comparisons show it trailing Qwen 2.5 9B on most tasks. The article also covers related developments including open-weight model security risks, Uber's Claude Code spending caps, and NeurIPS's misuse of an uncalibrated AI detector.

#small-models

@hooeem: https://x.com/hooeem/status/2062266452921491934

X AI KOLs Timeline ↗ · 2026-06-03 Cached

A guide explaining how to make agentic workflows up to 462x cheaper by compiling fixed procedures into smaller fine-tuned models instead of repeatedly prompting frontier models.

#small-models

These two founders left Goldman and Meta to build voice AI for markets everyone else overlooked

TechCrunch AI ↗ · 2026-06-03 Cached

AethexAI, founded by ex-Goldman and Meta employees, raised $3M to build voice AI for African and Middle Eastern markets, using small models to reduce latency and launching its platform with APIs and SDKs.

#small-models

@stevibe: Qwen3.6 35B A3B can't fill out a paper form on its own. But give it NVIDIA's LocateAnything-3B — the #1 trending model …

X AI KOLs Timeline ↗ · 2026-06-02 Cached

A demonstration in which Qwen3.6 35B A3B, which cannot fill out a paper form on its own, does so accurately when given NVIDIA's LocateAnything-3B as a vision tool for detecting field positions. The result suggests that small models working together can handle tasks that a single model cannot.
