ai-reasoning

Teaching Thinking Models to Reason with Tools: A Full-Pipeline Recipe for Tool-Integrated Reasoning

arXiv cs.CL · 2d ago

This paper presents a full-pipeline recipe for teaching thinking models to reason with tools, achieving state-of-the-art performance on benchmarks like AIME 2025 when applied to Qwen3 models.

The best agent model is the one that knows when to stop

Reddit r/AI_Agents · 2d ago

The article argues that effective AI agents require restraint and explicit 'stop conditions' rather than endless autonomy, highlighting Ling-2.6-1T as a model suited for conservative planning roles.

Teaching AI models to say “I’m not sure”

MIT News — Artificial Intelligence · 2026-04-22

MIT CSAIL researchers introduce RLCR, a method using Brier scores in reinforcement learning to train AI models to output calibrated confidence estimates, significantly reducing overconfidence without sacrificing accuracy.
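
The summary names the Brier score but does not define it. As a minimal sketch: for a single answer, the Brier score is the squared gap between the stated confidence and the 0/1 outcome, so a confident wrong answer costs far more than a hedged one. The function names below are hypothetical, and the `calibration_reward` combination (correctness minus the Brier penalty) is an assumption for illustration, not a formula taken from the article.

```python
def brier_score(confidence: float, correct: bool) -> float:
    """Squared error between a stated confidence in [0, 1] and the 0/1 outcome."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    outcome = 1.0 if correct else 0.0
    return (confidence - outcome) ** 2


def calibration_reward(confidence: float, correct: bool) -> float:
    """Assumed reward shape: correctness bonus minus the Brier penalty."""
    return (1.0 if correct else 0.0) - brier_score(confidence, correct)


if __name__ == "__main__":
    # Compare an overconfident and a hedged answer, each right and wrong.
    for confidence, correct in [(0.9, True), (0.9, False), (0.5, True), (0.5, False)]:
        reward = calibration_reward(confidence, correct)
        print(f"confidence={confidence:.1f} correct={correct!s:5} reward={reward:+.2f}")
```

Under this assumed reward, a wrong answer given at 0.9 confidence scores lower than the same wrong answer given at 0.5, which is the pressure toward calibrated confidence that the summary describes.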

Our First Proof submissions

OpenAI Blog · 2026-02-20

OpenAI submitted proof attempts for the First Proof challenge, a research-level math competition testing whether AI can produce correct, checkable proofs. The company's internal model successfully solved at least five of the ten problems, demonstrating significant progress in sustained reasoning and rigorous mathematical thinking.

Gemini achieves gold-medal level at the International Collegiate Programming Contest World Finals

Google DeepMind Blog · 2025-10-24

Gemini 2.5 Deep Think achieved gold-medal level performance at the 2025 International Collegiate Programming Contest World Finals, solving 10 of 12 problems within the five-hour competition. The result demonstrates significant advances in abstract reasoning and problem-solving.

Try Deep Think in the Gemini app

Google DeepMind Blog · 2025-10-23

Google is rolling out Deep Think, a new reasoning capability in the Gemini app for Google AI Ultra subscribers, featuring parallel thinking techniques and achieving bronze-level performance on the 2025 IMO benchmark. The full gold-medal version is being shared with select mathematicians for research purposes.
