robust-optimization

ODRPO: Ordinal Decompositions of Discrete Rewards for Robust Policy Optimization

arXiv cs.LG · yesterday

Introduces ODRPO, a framework that decomposes discrete rewards into ordinal binary indicators to make policy optimization more robust in reinforcement learning from AI feedback (RLAIF) for LLMs, reporting up to 14.8% relative improvement with minimal overhead.

Quantile Geometry Regularization for Distributional Reinforcement Learning

arXiv cs.LG · 3d ago

This paper introduces RQIQN, a robust quantile-based method for distributional reinforcement learning that uses Wasserstein geometry regularization to prevent distribution degeneration and improve performance in risk-sensitive tasks.
