Tag
A developer splits their AI agent's LLM calls into a cheap router model (GPT-OSS 120B) for tool-picking and a premium model (gpt-5.4) for synthesis, cutting costs by ~78% while maintaining output quality.
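Below is a minimal sketch of the two-tier split. It assumes each LLM call is tagged with its purpose before dispatch. The model ID strings, the `LLMCall` type, the purpose labels, and the caller-supplied `complete(model, prompt)` function are all hypothetical, because the summary does not show any provider API. `cost_savings` shows only how a blended-cost reduction follows from call counts and per-call prices. It does not reproduce the reported ~78%, since the actual prices and call mix are not given.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model identifiers; real provider-specific IDs may differ.
ROUTER_MODEL = "gpt-oss-120b"   # cheap model for tool-picking
SYNTHESIS_MODEL = "gpt-5.4"     # premium model for synthesis


@dataclass
class LLMCall:
    purpose: str  # hypothetical labels: "tool_selection" or "synthesis"
    prompt: str


def pick_model(call: LLMCall) -> str:
    """Route tool-picking calls to the cheap model, everything else to the premium one."""
    if call.purpose == "tool_selection":
        return ROUTER_MODEL
    return SYNTHESIS_MODEL


def run_agent_step(calls: List[LLMCall],
                   complete: Callable[[str, str], str]) -> List[str]:
    """Dispatch each call through a caller-supplied complete(model, prompt) function."""
    return [complete(pick_model(c), c.prompt) for c in calls]


def cost_savings(n_router: int, n_synth: int,
                 router_cost: float, premium_cost: float) -> float:
    """Fraction saved versus sending every call to the premium model."""
    baseline = (n_router + n_synth) * premium_cost
    split = n_router * router_cost + n_synth * premium_cost
    return 1 - split / baseline
```

For example, with a stub in place of a real client, `run_agent_step([LLMCall("tool_selection", "a"), LLMCall("synthesis", "b")], lambda m, p: f"{m}:{p}")` sends the first call to the router model and the second to the synthesis model. The savings grow with the share of calls that are tool-picking and with the price gap between the two models.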
This paper tests whether varying inference-time reasoning effort affects the alignment between large reasoning models' chain-of-thought lengths and human reaction times. The results show that this alignment is invariant to effort perturbations, suggesting that it is established during training rather than at inference time.
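The summary does not say how alignment is measured. The sketch below assumes one common choice: a Pearson correlation between per-item chain-of-thought lengths and per-item mean human reaction times, computed separately for each reasoning-effort setting. Under that assumption, "invariant to effort" means the per-effort correlations stay close together. The function names and effort labels are hypothetical.

```python
import math
from typing import Dict, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (sx * sy)


def alignment_by_effort(cot_lengths: Dict[str, Sequence[float]],
                        human_rts: Sequence[float]) -> Dict[str, float]:
    """Map each (hypothetical) effort label to its CoT-length/RT correlation.

    cot_lengths[effort][i] and human_rts[i] must refer to the same item i.
    """
    return {effort: pearson(lengths, human_rts)
            for effort, lengths in cot_lengths.items()}


def alignment_spread(alignments: Dict[str, float]) -> float:
    """Max minus min correlation across effort settings; near 0 suggests invariance."""
    values = list(alignments.values())
    return max(values) - min(values)
```

One property of this choice: Pearson correlation is unchanged when an effort setting scales every chain of thought by the same positive factor. A higher effort level that uniformly lengthens reasoning would therefore leave this alignment score untouched.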
A tweet comparing the Qwen3.6 27B and 35B-A3B models with GPT-OSS, noting that while the Qwen models are fast, GPT-OSS is more efficient, especially at prefill.