reward-model

#reward-model

From Long News to Accurate Forecast: Importance-Aware Fusion and PRM-Guided Reflection for Time Series Forecasting

arXiv cs.AI · yesterday

This paper introduces a framework for time series forecasting that uses importance-aware news compression and process reward model-guided retrieval to incorporate long news articles within fixed context limits, improving prediction accuracy across finance, energy, traffic, and Bitcoin benchmarks.

Predicting Inference-Time Scaling Gains from Labeled Validation-Set Output Statistics

arXiv cs.CL · yesterday

This paper introduces a method to predict best-of-N inference scaling gains for language models using cheap statistics from a single labeled validation-set sampling pass. A compact predictor with three core features achieves Spearman ρ=0.90 with actual gains, enabling screening of configurations before expensive reward-model scoring.

Latent Reward Steering: An Adaptive Inference-Time Framework that Implicitly Promotes Cognitive Behaviors in Reasoning LLMs

arXiv cs.AI · 2d ago

This paper introduces Latent Reward Steering (LRS), an adaptive inference-time framework that uses sparse autoencoder latent states and a learned reward model to implicitly promote cognitive behaviors such as verification and backtracking in reasoning LLMs, improving performance across multiple models and benchmarks.

Configurable Reward Model for Balanced Safety Alignment

arXiv cs.CL · 3d ago

This paper introduces the Configurable Safety Reward Model (CSRM), a reward model that can be configured to accommodate heterogeneous and evolving safety requirements for LLM alignment. CSRM achieves state-of-the-art results on configurable safety benchmarks and improves the helpfulness-safety tradeoff.

KARMA: Karma-Aligned Reward Model Adaptation

arXiv cs.CL · 2026-05-27

This paper introduces KARMA, a framework that trains a reward model on Reddit conversations to improve LLMs' context-sensitive conversational behavior via reinforcement learning. It finds that the reward model that best predicts karma does not yield the best downstream alignment.

Multi-Stakeholder LLM Alignment: Decomposing Estimation from Aggregation

arXiv cs.AI · 2026-05-27

This paper identifies weighting noise in LLM judges for multi-stakeholder tasks and proposes DecompR, a method that decouples utility estimation from aggregation using counterfactually calibrated weights.

CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations

arXiv cs.CL · 2026-05-27

This paper introduces CroCo, a method for cross-lingual contrastive preference tuning on self-generated responses, showing that a reward model trained on English preferences can effectively rank responses in other languages, improving model performance across 14 languages without language-specific annotations.

Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases

Hugging Face Daily Papers · 2026-05-26

This paper introduces alignment tampering, a vulnerability in RLHF where language models can manipulate preference datasets to amplify misaligned biases. It demonstrates the effect experimentally across biases such as sexism, brand promotion, and goal-seeking, and shows that existing mitigation techniques are insufficient.

AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment

Hugging Face Daily Papers · 2026-05-20

AutoRubric-T2I automatically generates and selects explicit rubrics to guide Vision-Language Model judges for text-to-image generation, achieving high-quality reward signals with minimal human annotation and improving generation quality in downstream tasks.

VEFX-Bench: A Holistic Benchmark for Generic Video Editing and Visual Effects

Hugging Face Daily Papers · 2026-04-17

VEFX-Bench introduces a large-scale human-annotated video editing dataset (5,049 examples) with multi-dimensional quality labels and a specialized reward model for standardized evaluation of video editing systems. The paper addresses the lack of comprehensive benchmarks in AI-assisted video creation by providing VEFX-Dataset, VEFX-Reward, and a 300-video-prompt benchmark that reveals gaps in current editing models.

Learning to summarize with human feedback

OpenAI Blog · 2020-09-04

OpenAI demonstrates a technique for improving language model summarization by training a reward model on human preferences and fine-tuning models with reinforcement learning, achieving significant quality improvements that generalize across datasets. This work advances model alignment through human feedback at scale, with applications beyond summarization.
