Injecting Distributional Awareness into MLLMs via Reinforcement Learning for Deep Imbalanced Regression
Summary
This paper introduces a distribution-aware reinforcement learning framework that enhances MLLM performance in long-tailed numerical regression tasks using batch-level comparison-based supervision.
Source: https://huggingface.co/papers/2605.01402 · Published on May 11
Submitted by DUYao (https://huggingface.co/ChanganYao) on May 12
Abstract
A distribution-aware reinforcement learning framework improves multimodal large language models’ numerical regression performance on long-tailed distributions through batch-level comparison-based supervision.
Multimodal large language models (MLLMs) struggle with numerical regression under long-tailed target distributions. Token-level supervised fine-tuning (SFT) and point-wise regression rewards bias learning toward high-density regions, leading to regression-to-the-mean behavior and poor tail performance. We identify the lack of cross-sample relational supervision as a key limitation of existing MLLM training paradigms. To address it, we propose a distribution-aware reinforcement learning framework based on Group Relative Policy Optimization, which introduces batch-level comparison-based supervision via a Concordance Correlation Coefficient-based reward to align predicted and ground-truth distributions in terms of correlation, scale, and mean. The framework is plug-and-play, requiring no architectural modification. Experiments on a unified suite of long-tailed regression benchmarks show consistent improvements over SFT and existing MLLM regression methods, with particularly strong gains in medium- and few-shot regimes.
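The Concordance Correlation Coefficient (CCC) mentioned in the abstract is a standard statistic that penalizes disagreement in correlation, scale, and mean at once, which is why it can serve as a batch-level reward. A minimal sketch of such a reward, assuming NumPy and batch-level predictions (the function name and interface are hypothetical, not the authors' code):

```python
import numpy as np

def ccc_reward(preds, targets):
    """Concordance Correlation Coefficient between a batch of predicted
    and ground-truth values. Equals 1 only when predictions match
    targets in correlation, scale, AND mean; range is [-1, 1]."""
    preds = np.asarray(preds, dtype=float)
    targets = np.asarray(targets, dtype=float)
    mean_p, mean_t = preds.mean(), targets.mean()
    var_p, var_t = preds.var(), targets.var()  # population variance
    cov = ((preds - mean_p) * (targets - mean_t)).mean()
    # CCC = 2*cov / (var_p + var_t + (mean_p - mean_t)^2)
    return 2.0 * cov / (var_p + var_t + (mean_p - mean_t) ** 2)
```

Unlike a point-wise reward such as negative MAE, a constant mean prediction scores near zero here (zero covariance), so collapsing to high-density regions is not rewarded.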
Get this paper in your agent:
hf papers read 2605.01402
Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Similar Articles
Estimating the Black-box LLM Uncertainty with Distribution-Aligned Adversarial Distillation
This paper proposes Distribution-Aligned Adversarial Distillation (DisAAD), a method that uses a lightweight proxy model, only 1% of the original model's size, to estimate uncertainty in black-box LLMs, achieving reliable quantification without requiring internal parameters or multiple sampling.
Beyond Reasoning: Reinforcement Learning Unlocks Parametric Knowledge in LLMs
This paper investigates whether reinforcement learning can improve the direct recall of parametric knowledge in LLMs beyond reasoning tasks. It demonstrates that RL with binary rewards yields significant gains in factual QA benchmarks by redistributing probability mass to unlock latent knowledge rather than acquiring new facts.
ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning
This paper introduces ResRL, a method to boost LLM reasoning by decoupling semantic distributions between positive and negative responses through negative sample projection. It aims to maintain generation diversity while improving performance on various benchmarks.
On Predicting the Post-training Potential of Pre-trained LLMs
This paper introduces RuDE, a framework for predicting the post-training potential of pre-trained LLMs by leveraging response discrimination, addressing the limitations of traditional benchmarks like MMLU.
BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning
The paper introduces BalCapRL, a balanced reinforcement learning framework for multimodal large language models that jointly optimizes correctness, coverage, and linguistic quality in image captioning. It demonstrates improved performance over existing methods by addressing trade-offs between utility and fluency through reward decoupling and length-conditional masking.