Injecting Distributional Awareness into MLLMs via Reinforcement Learning for Deep Imbalanced Regression

Hugging Face Daily Papers

Summary

This paper introduces a distribution-aware reinforcement learning framework that enhances MLLM performance in long-tailed numerical regression tasks using batch-level comparison-based supervision.

Multimodal large language models (MLLMs) struggle with numerical regression under long-tailed target distributions. Token-level supervised fine-tuning (SFT) and point-wise regression rewards bias learning toward high-density regions, leading to regression-to-the-mean behavior and poor tail performance. We identify the lack of cross-sample relational supervision as a key limitation of existing MLLM training paradigms. To address it, we propose a distribution-aware reinforcement learning framework based on Group Relative Policy Optimization, which introduces batch-level comparison-based supervision via the Concordance Correlation Coefficient-based reward to align predicted and ground-truth distributions in terms of correlation, scale, and mean. The framework is plug-and-play, requiring no architectural modification. Experiments on a unified suite of long-tailed regression benchmarks show consistent improvements over SFT and existing MLLM regression methods, with particularly strong gains in medium- and few-shot regimes.
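The CCC reward at the core of the method can be computed directly from a batch of predicted and ground-truth values. Below is a minimal sketch in Python, assuming numeric predictions have already been parsed from the model's text outputs; the function name and the epsilon guard are illustrative, not taken from the paper's code.

import numpy as np

def ccc_reward(preds: np.ndarray, targets: np.ndarray) -> float:
    """Concordance Correlation Coefficient between batch predictions and
    ground-truth values. Ranges over [-1, 1], with 1 = perfect agreement.

    Unlike point-wise losses such as per-sample MSE, CCC jointly penalizes
    mismatch in correlation, scale, and mean across the batch.
    """
    mu_p, mu_t = preds.mean(), targets.mean()
    var_p, var_t = preds.var(), targets.var()
    cov = ((preds - mu_p) * (targets - mu_t)).mean()
    return 2.0 * cov / (var_p + var_t + (mu_p - mu_t) ** 2 + 1e-8)

Because the denominator jointly penalizes scale and mean mismatch, a policy that collapses every prediction to the head-region mean earns a reward near zero (the covariance term vanishes) even when its point-wise error looks acceptable; this is the property that counteracts regression-to-the-mean behavior.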
Original Article

Paper page - Injecting Distributional Awareness into MLLMs via Reinforcement Learning for Deep Imbalanced Regression

Source: https://huggingface.co/papers/2605.01402 · Published on May 11

Submitted by DUYao (https://huggingface.co/ChanganYao) on May 12

Abstract

A distribution-aware reinforcement learning framework improves multimodal large language models’ numerical regression performance on long-tailed distributions through batch-level comparison-based supervision.

Multimodal large language models (MLLMs) struggle with numerical regression under long-tailed target distributions. Token-level supervised fine-tuning (SFT) and point-wise regression rewards bias learning toward high-density regions, leading to regression-to-the-mean behavior and poor tail performance. We identify the lack of cross-sample relational supervision as a key limitation of existing MLLM training paradigms. To address it, we propose a distribution-aware reinforcement learning framework based on Group Relative Policy Optimization, which introduces batch-level comparison-based supervision via the Concordance Correlation Coefficient-based reward to align predicted and ground-truth distributions in terms of correlation, scale, and mean. The framework is plug-and-play, requiring no architectural modification. Experiments on a unified suite of long-tailed regression benchmarks show consistent improvements over SFT and existing MLLM regression methods, with particularly strong gains in medium- and few-shot regimes.
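For orientation, here is how a batch-level reward of this kind would plug into GRPO's group-relative advantage computation. This is a sketch of the standard GRPO formulation (rewards standardized within a group of rollouts sampled for the same input), not the authors' released implementation; rollout_preds and the reuse of ccc_reward from the sketch above are assumptions.

import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # Standard GRPO advantage: standardize each rollout's reward against
    # its peers in the same group, so supervision is comparative rather
    # than an absolute point-wise target.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Hypothetical usage: score G rollouts with the batch-level CCC reward,
# then standardize within the group.
# rewards = np.array([ccc_reward(p, targets) for p in rollout_preds])
# advantages = group_relative_advantages(rewards)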


Get this paper in your agent:

hf papers read 2605.01402

Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash


Similar Articles

Beyond Reasoning: Reinforcement Learning Unlocks Parametric Knowledge in LLMs

arXiv cs.CL

This paper investigates whether reinforcement learning can improve the direct recall of parametric knowledge in LLMs beyond reasoning tasks. It demonstrates that RL with binary rewards yields significant gains in factual QA benchmarks by redistributing probability mass to unlock latent knowledge rather than acquiring new facts.

BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning

Hugging Face Daily Papers

The paper introduces BalCapRL, a balanced reinforcement learning framework for multimodal large language models that jointly optimizes correctness, coverage, and linguistic quality in image captioning. It demonstrates improved performance over existing methods by addressing trade-offs between utility and fluency through reward decoupling and length-conditional masking.