Tag: #mathematical-reasoning

Cards List

YFPO: A Preliminary Study of Yoked Feature Preference Optimization with Neuron-Guided Rewards for Mathematical Reasoning

arXiv cs.CL · 21h ago

This paper introduces YFPO, a neuron-guided preference optimization framework that uses internal activation signals to improve mathematical reasoning in large language models.


Teaching Language Models to Think in Code

arXiv cs.CL · 2d ago

This paper introduces ThinC (Thinking in Code), a framework where language models use code blocks exclusively for reasoning after a brief natural language planning step, outperforming existing tool-integrated reasoning baselines on math benchmarks.


Structural Rationale Distillation via Reasoning Space Compression

arXiv cs.CL · 2d ago

This paper proposes D-RPC, a method for distilling reasoning from large language models to smaller ones by compressing reasoning paths into a reusable bank, achieving better performance and consistency on math and commonsense benchmarks.


The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes

Hugging Face Daily Papers · 3d ago

This paper presents a comprehensive empirical study on on-policy distillation for large language models, identifying failure mechanisms like distribution mismatch and optimization instability, and proposing fixes such as stop-gradient objectives and RLVR-adapted teachers.


Crosslingual On-Policy Self-Distillation for Multilingual Reasoning

Hugging Face Daily Papers · 4d ago

The paper proposes Crosslingual On-Policy Self-Distillation (COPSD), a method to transfer high-resource language reasoning capabilities to low-resource languages using a shared student-teacher architecture. Experiments across 17 African languages show significant improvements in mathematical reasoning and answer-format adherence, outperforming Group Relative Policy Optimization (GRPO).


DyStruct: Dynamically Structured Diffusion Language Model Decoding via Bayesian Inference

Hugging Face Daily Papers · 4d ago

DyStruct is a training-free Bayesian decoding framework for discrete Diffusion Language Models that enables flexible-length generation by dynamically determining expansion size and decoding order, improving accuracy on math and code tasks.


Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

Hugging Face Daily Papers · 5d ago

Soohak is a new benchmark of 439 research-level math problems curated by mathematicians to evaluate the reasoning capabilities of frontier LLMs, highlighting significant gaps in solving advanced problems and recognizing ill-posed questions.


MIT scientists build the world’s largest collection of Olympiad-level math problems, and open it to everyone

MIT News — Artificial Intelligence · 2026-04-24

MIT researchers, in collaboration with KAUST and HUMAIN, have released MathNet, the largest open-source dataset of Olympiad-level math problems, containing over 30,000 expert-authored problems from 47 countries.


Less Is More: Cognitive Load and the Single-Prompt Ceiling in LLM Mathematical Reasoning

arXiv cs.CL · 2026-04-22

Empirical study on LLM formal-math reasoning finds a single-prompt ceiling: accuracy plateaus around 60–79% regardless of prompt size, driven by undecidability, model fragility, and distribution mismatch.


Measuring Representation Robustness in Large Language Models for Geometry

arXiv cs.CL · 2026-04-21

Researchers introduce GeoRepEval, a framework to evaluate LLM robustness across equivalent geometric problem representations (Euclidean, coordinate, vector). Testing 11 LLMs on 158 geometry problems, they find accuracy gaps up to 14 percentage points based solely on representation choice, with vector formulations being a consistent failure point.
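The robustness measurement this summary describes can be sketched in a few lines: score the same problem set under each equivalent representation and report the largest accuracy gap attributable to representation choice alone. The function and data below are hypothetical illustrations, not GeoRepEval's actual code.

```python
# Hypothetical sketch: measure how much accuracy varies across equivalent
# representations of the same geometry problems.
def accuracy(results):
    """results: list of booleans, one per problem (solved or not)."""
    return sum(results) / len(results)

def representation_gap(results_by_repr):
    """results_by_repr: {representation_name: [bool, ...]} over the SAME
    problems. Returns (max accuracy gap, per-representation accuracies)."""
    accs = {name: accuracy(r) for name, r in results_by_repr.items()}
    return max(accs.values()) - min(accs.values()), accs

# Toy data: four problems, three equivalent representations.
gap, accs = representation_gap({
    "euclidean":  [True, True, True, False],   # accuracy 0.75
    "coordinate": [True, True, False, False],  # accuracy 0.50
    "vector":     [True, False, False, False], # accuracy 0.25
})
```

On this toy data the gap is 0.5 (50 percentage points), with the vector form weakest, mirroring the paper's finding that vector formulations are a consistent failure point.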


Dynamic Sampling that Adapts: Self-Aware Iterative Data Persistent Optimization for Mathematical Reasoning

arXiv cs.CL · 2026-04-20

SAI-DPO introduces a dynamic sampling framework that adapts training data to a model's evolving capabilities during mathematical reasoning tasks, using self-aware difficulty metrics and knowledge semantic alignment to achieve state-of-the-art efficiency with less data on benchmarks like AIME24 and AMC23.
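One way to picture difficulty-aware dynamic sampling of this kind: estimate each problem's difficulty from the current model's empirical pass rate, then preferentially sample problems near the capability frontier (neither trivially solved nor hopeless), re-estimating as the model improves. Everything below, including `frontier_weight`, is a hypothetical illustration of that general idea, not SAI-DPO's actual metric.

```python
import random

def frontier_weight(pass_rate):
    """Highest weight when the model solves roughly half its attempts;
    near zero for problems it always or never solves."""
    return pass_rate * (1.0 - pass_rate)

def sample_batch(problems, pass_rates, k, seed=0):
    """problems: ids; pass_rates: current per-problem solve rates in [0, 1].
    Draws k problems with probability proportional to frontier weight."""
    rng = random.Random(seed)
    weights = [frontier_weight(p) for p in pass_rates]
    return rng.choices(problems, weights=weights, k=k)

problems = ["easy", "frontier", "hard"]
pass_rates = [0.95, 0.50, 0.02]   # weights: 0.0475, 0.25, 0.0196
batch = sample_batch(problems, pass_rates, k=10)
```

Because the weights are recomputed from pass rates as training proceeds, the sampled curriculum tracks the model's evolving capabilities instead of staying fixed.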


Large Language Models for Math Education in Low-Resource Languages: A Study in Sinhala and Tamil

arXiv cs.CL · 2026-04-20

This paper evaluates the mathematical reasoning capabilities of large language models in Sinhala and Tamil, two low-resource South Asian languages, using a parallel dataset of independently authored problems. The study demonstrates that while basic arithmetic transfers well across languages, complex reasoning tasks show significant performance degradation in non-English languages, with implications for deploying AI tutoring tools in multilingual educational contexts.


Curing Miracle Steps in LLM Mathematical Reasoning with Rubric Rewards

arXiv cs.CL · 2026-04-20

This paper identifies and addresses the problem of 'Miracle Steps' in LLM mathematical reasoning (unjustified jumps to correct answers that indicate reward hacking) by proposing the Rubric Reward Model (RRM), a process-oriented reward function that evaluates entire reasoning trajectories. RRM achieves significant improvements on AIME2024 (26.7% to 62.6% Verified Pass@1024) and reduces Miracle Steps by 71%.
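The contrast between outcome-only and process-oriented rewards can be sketched as follows. This is a hypothetical toy model, not the paper's RRM: step rubric scores are assumed given, and a "miracle step" is modeled simply as a step scored below a threshold.

```python
# Hypothetical sketch of a process-oriented reward: score the whole
# trajectory, not just the final answer, and penalize steps whose
# conclusions are not justified by the preceding work.
def rubric_reward(steps, step_scores, final_correct, jump_penalty=0.5):
    """steps: reasoning-step strings; step_scores: rubric scores in [0, 1],
    one per step; final_correct: whether the final answer matches."""
    if len(steps) != len(step_scores):
        raise ValueError("expected one rubric score per step")
    process_score = sum(step_scores) / len(step_scores)  # mean step quality
    miracle_steps = sum(1 for s in step_scores if s < 0.2)  # unjustified jumps
    outcome = 1.0 if final_correct else 0.0
    return outcome * process_score - jump_penalty * miracle_steps

# A correct answer reached through well-justified steps outscores one
# reached via an unjustified jump, even though both answers are right:
good = rubric_reward(["expand", "factor", "solve"], [0.9, 0.8, 1.0], True)
hacked = rubric_reward(["expand", "???", "solve"], [0.9, 0.1, 1.0], True)
```

Under an outcome-only reward both trajectories would score identically, which is exactly the loophole that invites reward hacking.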


Learning to Reason with Insight for Informal Theorem Proving

arXiv cs.CL · 2026-04-20

This paper proposes DeepInsightTheorem, a hierarchical dataset and Progressive Multi-Stage SFT training strategy to improve LLMs' informal theorem proving by teaching them to identify and apply core techniques through insight-aware reasoning.


Disentangling Mathematical Reasoning in LLMs: A Methodological Investigation of Internal Mechanisms

arXiv cs.CL · 2026-04-20

This paper investigates how large language models perform arithmetic operations by analyzing internal mechanisms through early decoding, revealing that proficient models exhibit a clear division of labor between attention and MLP modules in reasoning tasks.


Stratagem: Learning Transferable Reasoning via Trajectory-Modulated Game Self-Play

Hugging Face Daily Papers · 2026-04-20

STRATAGEM is a new framework for improving reasoning transferability in language models by using game self-play with a Reasoning Transferability Coefficient and Reasoning Evolution Reward to reinforce abstract, domain-agnostic reasoning patterns over game-specific heuristics. Experiments show strong improvements on mathematical reasoning, general reasoning, and code generation benchmarks.


MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

Hugging Face Daily Papers · 2026-04-20

MathNet is a large-scale multilingual multimodal benchmark of 30,676 Olympiad-level math problems spanning 47 countries and 17 languages, designed to evaluate mathematical reasoning and retrieval in generative and embedding-based models. Even state-of-the-art models like Gemini and GPT-5 struggle with the benchmark, highlighting significant room for improvement in mathematical AI.


DiPO: Disentangled Perplexity Policy Optimization for Fine-grained Exploration-Exploitation Trade-Off

Hugging Face Daily Papers · 2026-04-15

DiPO introduces a novel reinforcement learning approach for LLMs that uses perplexity-based sample partitioning to disentangle exploration and exploitation subspaces, combined with a bidirectional reward allocation mechanism for more stable policy optimization. The method demonstrates superior performance on mathematical reasoning and function calling tasks.
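The perplexity-based partitioning the summary mentions can be illustrated with a minimal sketch: compute each sampled response's per-token perplexity from its token log-probabilities, then route confident (low-perplexity) samples to an exploitation pool and uncertain ones to an exploration pool. The threshold and function names below are illustrative assumptions, not DiPO's actual formulation.

```python
import math

def perplexity(token_logprobs):
    """Per-token perplexity: exp of the mean negative log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def partition_samples(samples, threshold=5.0):
    """samples: list of (text, token_logprobs) pairs.
    Returns (exploit, explore) lists of texts."""
    exploit, explore = [], []
    for text, logprobs in samples:
        (exploit if perplexity(logprobs) <= threshold else explore).append(text)
    return exploit, explore

samples = [
    ("confident answer", [-0.1, -0.2, -0.1]),  # low perplexity
    ("uncertain answer", [-2.5, -3.0, -2.0]),  # high perplexity
]
exploit, explore = partition_samples(samples)
```

Disentangling the two pools lets the optimizer apply different update pressure to confident and uncertain generations rather than treating all samples uniformly.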


Evaluating AI’s ability to perform scientific research tasks

OpenAI Blog · 2025-12-16

OpenAI introduces FrontierScience, a new benchmark for measuring expert-level AI scientific capabilities across physics, chemistry, and biology, with GPT-5.2 achieving 77% on olympiad-style tasks and 25% on research-style tasks. The paper presents early evidence that GPT-5 meaningfully accelerates real scientific workflows, shortening work from weeks to hours while establishing metrics for tracking progress toward AI-accelerated science.


Advanced Gemini with Deep Think Achieves Gold Medal Standard at International Mathematical Olympiad

Google DeepMind Blog · 2025-10-24

Google DeepMind's advanced Gemini with Deep Think achieved gold-medal standard at the International Mathematical Olympiad 2025, solving 5 of 6 problems for 35 points, a significant advance over last year's silver-medal performance. The system operated end-to-end in natural language within the competition's time limits.
