mathematical-reasoning

#mathematical-reasoning

The Periodic Table of LLM Reasoning: A Structured Survey of Reasoning Paradigms, Methods, and Failure Modes

arXiv cs.CL · 2d ago

A comprehensive survey analyzing over 300 papers on LLM reasoning, presenting a taxonomy of reasoning paradigms including Chain-of-Thought, Multi-Hop, Mathematical, Commonsense, and others, along with common failure modes and research gaps.

ComBench: A Benchmark for Rigorous Proof Reasoning and Constructive Realization in Olympiad-Level Combinatorics

arXiv cs.AI · 3d ago

ComBench is an Olympiad-level combinatorics benchmark of 100 problems designed to evaluate rigorous proof reasoning and constructive realization in large language models. It shows that frontier models such as GPT-5.5 reach an overall average of only 65.4% and that the two capabilities are distinct.

PADD: Path-Aligned Decompression Distillation for Non-Router Teacher to Guide MoE Student Learning

arXiv cs.CL · 3d ago

Proposes PADD, a framework for distilling knowledge from dense teachers into mixture-of-experts (MoE) students, addressing the challenge that the student must learn a routing policy while the teacher has no router. The method has four stages and shows improvements on mathematical reasoning benchmarks.

N-GRPO: Embedding-Level Neighbor Mixing for Enhanced Policy Optimization

Hugging Face Daily Papers · 4d ago

N-GRPO introduces semantic neighbor mixing in the GRPO framework to enhance mathematical reasoning diversity while preserving semantic consistency, achieving improvements on math benchmarks and out-of-distribution tasks.

From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning

arXiv cs.CL · 5d ago

This paper introduces the Prefix Utility Model (PUM), which evaluates LLM reasoning prefixes by their utility (the improvement in solve rate) rather than by local correctness. PUM shows strong performance on mathematical reasoning tasks across selection, search, and reinforcement learning.

RASFT: Rollout-Adaptive Supervised Fine-Tuning for Reasoning

arXiv cs.LG · 5d ago

RASFT is a novel supervised fine-tuning framework for large language models that adapts expert supervision based on the model's own reasoning capabilities, achieving better performance on mathematical and code reasoning benchmarks compared to standard SFT and reinforcement learning methods.

The Fine-Tuning Trap: Evaluating Negative Transfer and the Role of PEFT in Sub-1B Mathematical Reasoning

arXiv cs.LG · 5d ago

This paper benchmarks sub-1B models on mathematical reasoning tasks, revealing that full fine-tuning actively harms performance in models under 300M parameters, while parameter-efficient fine-tuning (PEFT) methods such as LoRA and DoRA provide stability. The authors recommend defaulting to PEFT for all aligned sub-1B models and caution against full fine-tuning for architectures smaller than 500M parameters to prevent catastrophic forgetting.

CrowdMath: A Dataset of Crowdsourced Mathematical Research Discussions

arXiv cs.AI · 5d ago

Introduces CrowdMath, a dataset of 164 expert-annotated progress chains from the MIT PRIMES–AoPS CrowdMath program, capturing collaborative mathematical problem-solving. Benchmarks six frontier models, finding that they achieve 83–88% accuracy on next-post prediction but only 0.42 macro-F1 on post-role classification, which highlights a gap in understanding collaborative progress.

SG-OPD: Sign-Gated On-Policy Distillation via Sign-Consistency Gating and Phased Teacher Sampling

Hugging Face Daily Papers · 5d ago

Sign-Gated On-Policy Distillation (SG-OPD) enhances standard on-policy distillation by using a binary verifier as a trust signal for teacher supervision, improving performance on competition-level math reasoning benchmarks.

Cross-Epoch Adaptive Rollout Optimization for RL Post-Training

arXiv cs.LG · 2026-06-05

This paper presents CERO, a cross-epoch adaptive rollout optimization method for RL post-training of LLMs. CERO allocates a fixed rollout budget across prompts and epochs using Bayesian posterior variance to maximize sample efficiency. The method comes with theoretical regret bounds and outperforms GRPO on mathematical reasoning tasks.

Deliberate Evolution: Agentic Reasoning for Sample-Efficient Symbolic Regression with LLMs

arXiv cs.CL · 2026-06-04

Deliberate Evolution (DE) is an agentic framework that improves LLM-based symbolic regression by decoupling candidate generation from search control, using adaptive operators, structural diagnosis tools, and reflective memory to achieve better results with only 40% of the standard sample budget.

BiNSGPS: Geometry Problem Solving via Bidirectional Neuro-Symbolic Interaction

arXiv cs.AI · 2026-06-04

BiNSGPS is a geometry problem-solving framework built on bidirectional interaction between a multimodal LLM adviser and a symbolic solver: feedback from the solver is used to correct errors and generate auxiliary hypotheses. It reports state-of-the-art results of 90.5% on the Geometry3K benchmark and 90.1% on PGPS9K.

Characterizing initial human-AI proof formalization workflows

arXiv cs.AI · 2026-06-04

Researchers from Oxford, Cambridge, MIT, CMU, and other institutions conduct a mixed-methods study of how people integrate AI tools into mathematical proof formalization workflows. They find that participants generally achieve higher formalization accuracy with AI assistance while preferring to keep high-level human control over the proof discovery process.

GTBench: A Curriculum-Grounded Benchmark for Evaluating LLMs as Mathematical Research Assistants in Graph Theory

arXiv cs.AI · 2026-06-03

The paper introduces GTBench, a curriculum-grounded benchmark of 63 problems across three difficulty levels for evaluating LLMs as mathematical research assistants in graph theory. It evaluates five frontier models and finds that performance degrades with difficulty: GPT-5 achieves near-perfect results on basic problems but only 82% on graduate-level proofs.

EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning

arXiv cs.AI · 2026-06-03

EvoTrainer introduces an autonomous training framework that co-evolves LLM policies and training harnesses through empirical feedback, outperforming human-engineered RL baselines on mathematical reasoning, code generation, and long-horizon software engineering tasks.

GRAIL: Gradient-Reweighted Advantages for Reinforcement Learning with Verifiable Rewards

Hugging Face Daily Papers · 2026-06-03

GRAIL introduces gradient-reweighted advantages to improve token-level credit assignment in reinforcement learning for LLM reasoning, outperforming GRPO across multiple models.

AXIOM: A Trust-First Neuro-Symbolic Execution Architecture for Verifiable Mathematical Reasoning

arXiv cs.AI · 2026-06-02

AXIOM is a trust-first neuro-symbolic execution architecture for mathematical reasoning where the LLM acts as a canonicalizer, rewriting natural language problems into schemas processed by a deterministic CAS pipeline, achieving 94.36% correctness with 100% trust on parseable queries.

KACE: Knowledge-Adaptive Context Engineering for Mathematical Reasoning

arXiv cs.AI · 2026-06-02

KACE introduces a knowledge-adaptive context engineering method that separates storage from usage via an epistemic tree and tiered self-consistency, achieving 62.2% on AIME 2025—a 10.4-point gain over fixed self-consistency.

CAST: Non-Privileged Clipped Asymmetric Self-Teaching with Advantage Flipping for GRPO

arXiv cs.AI · 2026-06-02

This paper proposes CAST, a non-privileged clipped asymmetric self-teaching method that enhances GRPO-based reinforcement learning with verifiable rewards by providing dense token-level guidance and addressing zero-variance group issues, demonstrating improvements in mathematical reasoning.

Universal Quantum Transformer

arXiv cs.AI · 2026-06-02

This paper introduces the Universal Quantum Transformer (UQT), a quantum-native architecture that uses multi-qubit systems for exact mathematical reasoning. It achieves deterministic generalization on modular arithmetic and permutation groups while bypassing classical over-parameterization and quadratic attention bottlenecks, and it is deployed on IBM Quantum hardware.
