open-source-llm

#open-source-llm

@MaximeRivest: Tool calling in open source LLMs is wildly different from one model to another. I just whipped up: http://chattemplatepl…

X AI KOLs Following ↗ · 2026-06-03

A new web tool, Chat Template Playground, lets users visualize how different open-source LLMs render their chat templates, highlighting differences in prompting and tokenization.

Split my agent into a cheap router model and a premium synthesis model, bill dropped about 75%

Reddit r/AI_Agents ↗ · 2026-05-19

A developer splits their AI agent's LLM calls into a cheap router model (GPT-OSS 120B) for tool-picking and a premium model (gpt-5.4) for synthesis, cutting costs by ~78% while maintaining output quality.
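The split described above can be sketched roughly as follows. This is a minimal illustration and not the poster's actual code: `run_agent`, the stub models, and the `TOOLS` table are all hypothetical names, and the two models are plain callables so the example runs offline. In practice the router callable would wrap the cheap model and the synthesizer callable would wrap the premium one.

```python
from typing import Callable, Dict

ModelFn = Callable[[str], str]  # prompt in, completion out
ToolFn = Callable[[str], str]   # query in, observation out


def run_agent(question: str, router: ModelFn, synthesizer: ModelFn,
              tools: Dict[str, ToolFn], default_tool: str) -> str:
    """Pick a tool with a cheap model, then write the answer with a premium one."""
    # Step 1: the cheap model only has to name a tool, so its prompt stays short.
    prompt = f"Choose one tool from {sorted(tools)} for this request: {question}"
    choice = router(prompt).strip()
    # Guard against the router returning a name that is not a known tool.
    if choice not in tools:
        choice = default_tool
    # Step 2: run the chosen tool (no model call involved).
    observation = tools[choice](question)
    # Step 3: the premium model is called once, for the final synthesis only.
    return synthesizer(
        f"Question: {question}\nTool: {choice}\nResult: {observation}"
    )


# Hypothetical stand-ins so the sketch runs without any API access.
def stub_router(prompt: str) -> str:
    return "weather" if "rain" in prompt else "search"


def stub_synthesizer(prompt: str) -> str:
    return "FINAL: " + prompt.splitlines()[-1]


TOOLS: Dict[str, ToolFn] = {
    "search": lambda q: f"search hits for '{q}'",
    "weather": lambda q: "light rain expected",
}

answer = run_agent("Will it rain today?", stub_router, stub_synthesizer,
                   TOOLS, default_tool="search")
print(answer)
```

The design point is that the expensive model is invoked once per request, while tool selection goes through the cheaper model. Any savings depend on how the calls are distributed in a given agent.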

Drastically improve prompt processing speed for --n-cpu-moe partially offloaded models

Reddit r/LocalLLaMA ↗ · 2026-05-12

The post shares a performance trick for llama.cpp: increasing the micro-batch size (`-ub`) while partially offloading to the CPU (`--n-cpu-moe`) can drastically improve prompt processing speed for large models such as gpt-oss-120b on consumer GPUs.
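As a rough illustration of how the two flags fit together, the sketch below assembles a `llama-server` command line. The helper `build_llama_server_cmd`, the model filename, and every numeric value are assumptions chosen for the example, not settings taken from the post. The sketch also keeps `-ub` no larger than `-b`, on the assumption that the micro-batch should not exceed the logical batch size.

```python
import shlex
from typing import List


def build_llama_server_cmd(model_path: str, n_cpu_moe: int,
                           batch: int, ubatch: int) -> List[str]:
    """Assemble a llama-server invocation (hypothetical helper)."""
    if ubatch > batch:
        # Assumption: a micro-batch larger than the logical batch is not useful.
        raise ValueError("ubatch must not exceed batch")
    return [
        "llama-server",
        "-m", model_path,
        "-ngl", "99",                    # request GPU offload for all layers
        "--n-cpu-moe", str(n_cpu_moe),   # keep MoE experts of the first N layers on CPU
        "-b", str(batch),                # logical batch size
        "-ub", str(ubatch),              # micro-batch size, the knob the post raises
    ]


# Illustrative values only; suitable numbers depend on the GPU and the model.
cmd = build_llama_server_cmd("gpt-oss-120b.gguf", n_cpu_moe=24,
                             batch=4096, ubatch=4096)
print(shlex.join(cmd))
```

Measuring prompt processing speed at a few `-ub` values on the target hardware is the only reliable way to find a good setting.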
