Community release of Nemotron-3-Super-120B REAP-pruned to 64B, GRPO fine-tuned on math, and quantized to AWQ/FP8; it scores 90%+ on AIME 2026 and runs on a single H100 or RTX PRO 6000.
TabularMath introduces a benchmark and AutoT2T framework for evaluating LLMs' mathematical reasoning over tabular data, revealing that table complexity, data quality, and modality significantly impact model performance. The study addresses a gap in LLM evaluation by systematically assessing robustness to incomplete or inconsistent table information in real-world scenarios.
This paper introduces Adaptive Tool Trust Calibration (ATTC), a framework that improves tool-integrated reasoning models by enabling them to adaptively decide when to trust or ignore tool results based on code confidence scores. The approach addresses the "Tool Ignored" problem where models incorrectly dismiss correct tool outputs, achieving 4.1-7.5% performance improvements across multiple models and datasets.
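The core decision the summary describes can be sketched in a few lines. This is a hedged illustration of the general idea, not the paper's actual ATTC algorithm: accept a tool's output when its confidence score clears a threshold, otherwise fall back to the model's own answer. All function and parameter names here are illustrative assumptions.

```python
# Illustrative sketch (not the paper's ATTC implementation): decide whether
# to trust a tool result based on a confidence score, addressing the
# "Tool Ignored" failure mode where correct tool outputs get dismissed.

def resolve_answer(tool_result, tool_confidence, model_answer, threshold=0.7):
    """Return the tool's output when confidence is high enough, else the model's answer."""
    if tool_result is not None and tool_confidence >= threshold:
        return tool_result
    return model_answer

print(resolve_answer("42", 0.9, "41"))  # high confidence: keep the tool's "42"
print(resolve_answer("42", 0.3, "41"))  # low confidence: fall back to "41"
```

The threshold here is a fixed placeholder; the paper's contribution is making this trust decision adaptive rather than hard-coded.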
OpenAI trained a system that uses verifiers to solve grade school math word problems at roughly 90% of child-level accuracy, nearly doubling fine-tuned GPT-3's performance. The approach addresses language models' weakness in multistep reasoning by training verifiers to score candidate solutions and select the best one.