Tag: #mathematical-reasoning

Cards List

YFPO: A Preliminary Study of Yoked Feature Preference Optimization with Neuron-Guided Rewards for Mathematical Reasoning

arXiv cs.CL · 21h ago

This paper introduces YFPO, a neuron-guided preference optimization framework that uses internal activation signals to improve mathematical reasoning in large language models.


Teaching Language Models to Think in Code

arXiv cs.CL · 2d ago

This paper introduces ThinC (Thinking in Code), a framework where language models use code blocks exclusively for reasoning after a brief natural language planning step, outperforming existing tool-integrated reasoning baselines on math benchmarks.


Structural Rationale Distillation via Reasoning Space Compression

arXiv cs.CL · 2d ago

This paper proposes D-RPC, a method for distilling reasoning from large language models to smaller ones by compressing reasoning paths into a reusable bank, achieving better performance and consistency on math and commonsense benchmarks.


The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes

Hugging Face Daily Papers · 3d ago

This paper presents a comprehensive empirical study on on-policy distillation for large language models, identifying failure mechanisms like distribution mismatch and optimization instability, and proposing fixes such as stop-gradient objectives and RLVR-adapted teachers.


Crosslingual On-Policy Self-Distillation for Multilingual Reasoning

Hugging Face Daily Papers · 4d ago

The paper proposes Crosslingual On-Policy Self-Distillation (COPSD), a method to transfer high-resource language reasoning capabilities to low-resource languages using a shared student-teacher architecture. Experiments across 17 African languages show significant improvements in mathematical reasoning and answer-format adherence, outperforming Group Relative Policy Optimization (GRPO).


DyStruct: Dynamically Structured Diffusion Language Model Decoding via Bayesian Inference

Hugging Face Daily Papers · 4d ago

DyStruct is a training-free Bayesian decoding framework for discrete Diffusion Language Models that enables flexible-length generation by dynamically determining expansion size and decoding order, improving accuracy on math and code tasks.


Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

Hugging Face Daily Papers · 5d ago

Soohak is a new benchmark of 439 research-level math problems curated by mathematicians to evaluate the reasoning capabilities of frontier LLMs, highlighting significant gaps in solving advanced problems and recognizing ill-posed questions.


MIT scientists build the world’s largest collection of Olympiad-level math problems, and open it to everyone

MIT News — Artificial Intelligence · 2026-04-24

MIT researchers, in collaboration with KAUST and HUMAIN, have released MathNet, the largest open-source dataset of Olympiad-level math problems, containing over 30,000 expert-authored problems from 47 countries.


Less Is More: Cognitive Load and the Single-Prompt Ceiling in LLM Mathematical Reasoning

arXiv cs.CL · 2026-04-22

Empirical study on LLM formal-math reasoning finds a single-prompt ceiling: accuracy plateaus around 60–79% regardless of prompt size, driven by undecidability, model fragility, and distribution mismatch.


Measuring Representation Robustness in Large Language Models for Geometry

arXiv cs.CL · 2026-04-21

Researchers introduce GeoRepEval, a framework to evaluate LLM robustness across equivalent geometric problem representations (Euclidean, coordinate, vector). Testing 11 LLMs on 158 geometry problems, they find accuracy gaps up to 14 percentage points based solely on representation choice, with vector formulations being a consistent failure point.
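The robustness measurement this summary describes can be sketched in a few lines: score the same problem set under each equivalent representation and report the largest accuracy gap attributable to representation choice alone. The function and data below are hypothetical illustrations, not GeoRepEval's actual code.

```python
# Hypothetical sketch: measure how much accuracy varies across equivalent
# representations of the same geometry problems.
def accuracy(results):
    """results: list of booleans, one per problem (solved or not)."""
    return sum(results) / len(results)

def representation_gap(results_by_repr):
    """results_by_repr: {representation_name: [bool, ...]} over the SAME
    problems. Returns (max accuracy gap, per-representation accuracies)."""
    accs = {name: accuracy(r) for name, r in results_by_repr.items()}
    return max(accs.values()) - min(accs.values()), accs

# Toy data: four problems, three equivalent representations.
gap, accs = representation_gap({
    "euclidean":  [True, True, True, False],   # accuracy 0.75
    "coordinate": [True, True, False, False],  # accuracy 0.50
    "vector":     [True, False, False, False], # accuracy 0.25
})
```

On this toy data the gap is 0.5 (50 percentage points), with the vector form weakest, mirroring the paper's finding that vector formulations are a consistent failure point.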


Dynamic Sampling that Adapts: Self-Aware Iterative Data Persistent Optimization for Mathematical Reasoning

arXiv cs.CL · 2026-04-20

SAI-DPO introduces a dynamic sampling framework that adapts training data to a model's evolving capabilities during mathematical reasoning tasks, using self-aware difficulty metrics and knowledge semantic alignment to achieve state-of-the-art efficiency with less data on benchmarks like AIME24 and AMC23.
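One way to picture difficulty-aware dynamic sampling of this kind: estimate each problem's difficulty from the current model's empirical pass rate, then preferentially sample problems near the capability frontier (neither trivially solved nor hopeless), re-estimating as the model improves. Everything below, including `frontier_weight`, is a hypothetical illustration of that general idea, not SAI-DPO's actual metric.

```python
import random

def frontier_weight(pass_rate):
    """Highest weight when the model solves roughly half its attempts;
    near zero for problems it always or never solves."""
    return pass_rate * (1.0 - pass_rate)

def sample_batch(problems, pass_rates, k, seed=0):
    """problems: ids; pass_rates: current per-problem solve rates in [0, 1].
    Draws k problems with probability proportional to frontier weight."""
    rng = random.Random(seed)
    weights = [frontier_weight(p) for p in pass_rates]
    return rng.choices(problems, weights=weights, k=k)

problems = ["easy", "frontier", "hard"]
pass_rates = [0.95, 0.50, 0.02]   # weights: 0.0475, 0.25, 0.0196
batch = sample_batch(problems, pass_rates, k=10)
```

Because the weights are recomputed from pass rates as training proceeds, the sampled curriculum tracks the model's evolving capabilities instead of staying fixed.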


Large Language Models for Math Education in Low-Resource Languages: A Study in Sinhala and Tamil

arXiv cs.CL · 2026-04-20

This paper evaluates the mathematical reasoning capabilities of large language models in Sinhala and Tamil, two low-resource South Asian languages, using a parallel dataset of independently authored problems. The study demonstrates that while basic arithmetic transfers well across languages, complex reasoning tasks show significant performance degradation in non-English languages, with implications for deploying AI tutoring tools in multilingual educational contexts.


Curing Miracle Steps in LLM Mathematical Reasoning with Rubric Rewards

arXiv cs.CL · 2026-04-20

This paper identifies and addresses the problem of 'Miracle Steps' in LLM mathematical reasoning (unjustified jumps to correct answers that indicate reward hacking) by proposing the Rubric Reward Model (RRM), a process-oriented reward function that evaluates entire reasoning trajectories. RRM achieves significant improvements on AIME2024 (26.7% to 62.6% Verified Pass@1024) and reduces Miracle Steps by 71%.
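The contrast between outcome-only and process-oriented rewards can be sketched as follows. This is a hypothetical toy model, not the paper's RRM: step rubric scores are assumed given, and a "miracle step" is modeled simply as a step scored below a threshold.

```python
# Hypothetical sketch of a process-oriented reward: score the whole
# trajectory, not just the final answer, and penalize steps whose
# conclusions are not justified by the preceding work.
def rubric_reward(steps, step_scores, final_correct, jump_penalty=0.5):
    """steps: reasoning-step strings; step_scores: rubric scores in [0, 1],
    one per step; final_correct: whether the final answer matches."""
    if len(steps) != len(step_scores):
        raise ValueError("expected one rubric score per step")
    process_score = sum(step_scores) / len(step_scores)  # mean step quality
    miracle_steps = sum(1 for s in step_scores if s < 0.2)  # unjustified jumps
    outcome = 1.0 if final_correct else 0.0
    return outcome * process_score - jump_penalty * miracle_steps

# A correct answer reached through well-justified steps outscores one
# reached via an unjustified jump, even though both answers are right:
good = rubric_reward(["expand", "factor", "solve"], [0.9, 0.8, 1.0], True)
hacked = rubric_reward(["expand", "???", "solve"], [0.9, 0.1, 1.0], True)
```

Under an outcome-only reward both trajectories would score identically, which is exactly the loophole that invites reward hacking.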


Learning to Reason with Insight for Informal Theorem Proving

arXiv cs.CL · 2026-04-20

This paper proposes DeepInsightTheorem, a hierarchical dataset and Progressive Multi-Stage SFT training strategy to improve LLMs' informal theorem proving by teaching them to identify and apply core techniques through insight-aware reasoning.


Disentangling Mathematical Reasoning in LLMs: A Methodological Investigation of Internal Mechanisms

arXiv cs.CL · 2026-04-20

This paper investigates how large language models perform arithmetic operations by analyzing internal mechanisms through early decoding, revealing that proficient models exhibit a clear division of labor between attention and MLP modules in reasoning tasks.


Stratagem: Learning Transferable Reasoning via Trajectory-Modulated Game Self-Play

Hugging Face Daily Papers · 2026-04-20

STRATAGEM is a new framework for improving reasoning transferability in language models by using game self-play with a Reasoning Transferability Coefficient and Reasoning Evolution Reward to reinforce abstract, domain-agnostic reasoning patterns over game-specific heuristics. Experiments show strong improvements on mathematical reasoning, general reasoning, and code generation benchmarks.


MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

Hugging Face Daily Papers · 2026-04-20

MathNet is a large-scale multilingual multimodal benchmark of 30,676 Olympiad-level math problems spanning 47 countries and 17 languages, designed to evaluate mathematical reasoning and retrieval in generative and embedding-based models. Even state-of-the-art models like Gemini and GPT-5 struggle with the benchmark, highlighting significant room for improvement in mathematical AI.


DiPO: Disentangled Perplexity Policy Optimization for Fine-grained Exploration-Exploitation Trade-Off

Hugging Face Daily Papers · 2026-04-15

DiPO introduces a novel reinforcement learning approach for LLMs that uses perplexity-based sample partitioning to disentangle exploration and exploitation subspaces, combined with a bidirectional reward allocation mechanism for more stable policy optimization. The method demonstrates superior performance on mathematical reasoning and function calling tasks.
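The perplexity-based partitioning the summary mentions can be illustrated with a minimal sketch: compute each sampled response's per-token perplexity from its token log-probabilities, then route confident (low-perplexity) samples to an exploitation pool and uncertain ones to an exploration pool. The threshold and function names below are illustrative assumptions, not DiPO's actual formulation.

```python
import math

def perplexity(token_logprobs):
    """Per-token perplexity: exp of the mean negative log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def partition_samples(samples, threshold=5.0):
    """samples: list of (text, token_logprobs) pairs.
    Returns (exploit, explore) lists of texts."""
    exploit, explore = [], []
    for text, logprobs in samples:
        (exploit if perplexity(logprobs) <= threshold else explore).append(text)
    return exploit, explore

samples = [
    ("confident answer", [-0.1, -0.2, -0.1]),  # low perplexity
    ("uncertain answer", [-2.5, -3.0, -2.0]),  # high perplexity
]
exploit, explore = partition_samples(samples)
```

Disentangling the two pools lets the optimizer apply different update pressure to confident and uncertain generations rather than treating all samples uniformly.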


Evaluating AI’s ability to perform scientific research tasks

OpenAI Blog · 2025-12-16

OpenAI introduces FrontierScience, a new benchmark for measuring expert-level AI scientific capabilities across physics, chemistry, and biology, with GPT-5.2 achieving 77% on olympiad-style tasks and 25% on research-style tasks. The paper presents early evidence that GPT-5 meaningfully accelerates real scientific workflows, shortening work from weeks to hours while establishing metrics for tracking progress toward AI-accelerated science.


Advanced Gemini with Deep Think Achieves Gold Medal Standard at International Mathematical Olympiad

Google DeepMind Blog · 2025-10-24

Google DeepMind's advanced Gemini with Deep Think achieved gold-medal standard at the International Mathematical Olympiad 2025, solving 5 of 6 problems for 35 points, a significant advance over last year's silver-medal performance. The system operated end-to-end in natural language within the competition's time limits.
