inference-cost

#inference-cost

@hooeem: https://x.com/hooeem/status/2062266452921491934

X AI KOLs Timeline · yesterday

A guide explaining how to make agentic workflows up to 462x cheaper by compiling fixed procedures into smaller fine-tuned models instead of repeatedly prompting frontier models.

@dair_ai: NEW paper worth reading. A full agentic workflow can be distilled into model weights and run at roughly 100x lower infe…

X AI KOLs Following · 2026-05-22

This paper demonstrates that agentic workflows can be distilled into small fine-tuned models, achieving near-frontier quality while reducing inference cost by two orders of magnitude compared to orchestration approaches.

I ran an experiment on the 30b class of gemma4 and qwen3.5 models to try to learn about energy cost and performance tradeoffs. In other words, which models use more energy to give the same answer quality?

Reddit r/LocalLLaMA · 2026-04-21

An empirical study of four 30B-class dense and MoE models shows that Gemma-4 26B MoE delivers equal accuracy at 1.9–15 Wh, while the dense and larger MoE variants consume up to 34 Wh on the same reasoning tasks.
