What reasoning model are you actually running in production?
Summary
A practitioner seeks real-world feedback on reasoning models such as o3, Claude extended thinking, Gemini 2.5 Pro, and Ring 2.6 1T for production agent tasks, asking in particular whether Ring's two reasoning-effort modes perform as well in practice as they do on benchmarks.
Similar Articles
Reasoning models struggle to control their chains of thought, and that’s good
OpenAI researchers study whether reasoning models can deliberately obscure their chain of thought to evade monitoring, finding that current models struggle to control their reasoning even when they are aware of being monitored. The researchers introduce CoT-Control, an open-source evaluation suite of over 13,000 tasks for measuring chain-of-thought controllability in reasoning models.
OpenAI o3-mini
OpenAI releases o3-mini, a cost-efficient reasoning model with strong STEM capabilities, available in ChatGPT and API with support for function calling, structured outputs, and three reasoning effort levels. The model matches o1 performance in math and coding while being faster and cheaper, with free plan users gaining access to a reasoning model for the first time.
First time fine-tuning, need a sanity check — 3B or 7B for multi-task reasoning? [D]
A self-taught developer asks for advice on choosing between a 3B and a 7B model for a first multi-task fine-tuning project aimed at deeper reasoning about the questions underlying user prompts.
Teaching Thinking Models to Reason with Tools: A Full-Pipeline Recipe for Tool-Integrated Reasoning
This paper presents a full-pipeline recipe for teaching thinking models to reason with tools, achieving state-of-the-art performance on benchmarks such as AIME 2025 when applied to Qwen3 models.
Economics and reasoning with OpenAI o1
OpenAI released the o1 model series, designed with extended reasoning capabilities to tackle complex problems in science, coding, and math by spending more time thinking before responding.