off-policy-prediction


Behavior-Induced Mirror-Prox Temporal-Difference Learning for Faster Off-Policy Prediction

arXiv cs.AI · 5d ago

This paper proposes STHTD-MP, a behavior-induced Mirror-Prox temporal-difference (TD) method for faster off-policy prediction in reinforcement learning. The method replaces the covariance metric with the behavior-policy Bellman matrix, and the paper provides a convergence analysis and experimental comparisons.
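The summary does not spell out the update rule. As a point of reference, here is a minimal sketch of a Euclidean Mirror-Prox (extragradient) step on the expected saddle-point form of linear off-policy gradient TD, in the style of GTD2-MP. It uses A = E[ρφ(φ − γφ′)ᵀ], b = E[ρrφ], and a symmetric metric M, which is the feature covariance C = E[φφᵀ] in the classical covariance-metric version. This is not STHTD-MP. The paper swaps the covariance metric for a behavior-policy Bellman matrix, but its exact construction and step-size conditions are not given here, so M is simply an input. The function names, the deterministic (expected-update) setting, and the toy numbers are assumptions for illustration. A practical method would use sampled estimates of A, b and M.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Return m @ v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def mat_t_vec(m: Matrix, v: Vector) -> Vector:
    """Return m.T @ v without building the transpose."""
    return [sum(m[i][j] * v[i] for i in range(len(m))) for j in range(len(m[0]))]


def axpy(x: Vector, step: float, d: Vector) -> Vector:
    """Return x + step * d."""
    return [xi + step * di for xi, di in zip(x, d)]


def w_direction(A: Matrix, b: Vector, M: Matrix, theta: Vector, w: Vector) -> Vector:
    """Expected GTD2-style ascent direction for w: b - A theta - M w."""
    a_theta, m_w = matvec(A, theta), matvec(M, w)
    return [bi - ai - mi for bi, ai, mi in zip(b, a_theta, m_w)]


def mirror_prox_td(A: Matrix, b: Vector, M: Matrix,
                   alpha: float = 0.1, beta: float = 0.1,
                   iters: int = 2000) -> Tuple[Vector, Vector]:
    """Hypothetical helper: Euclidean Mirror-Prox (extragradient) on the
    expected saddle point  min_theta max_w  w^T (b - A theta) - 0.5 w^T M w
    for a symmetric metric M (M = C gives a GTD2-MP-style update)."""
    theta = [0.0] * len(A[0])
    w = [0.0] * len(b)
    for _ in range(iters):
        # Extrapolation step: gradients evaluated at the current point.
        theta_half = axpy(theta, alpha, mat_t_vec(A, w))
        w_half = axpy(w, beta, w_direction(A, b, M, theta, w))
        # Correction step: restart from (theta, w) but use the gradients
        # evaluated at the extrapolated point (theta_half, w_half).
        theta, w = (axpy(theta, alpha, mat_t_vec(A, w_half)),
                    axpy(w, beta, w_direction(A, b, M, theta_half, w_half)))
    return theta, w


if __name__ == "__main__":
    # Toy 2-feature problem with made-up numbers; M = identity stands in for C.
    A = [[1.0, 0.5], [-0.2, 0.9]]
    b = [1.0, 0.5]
    M = [[1.0, 0.0], [0.0, 1.0]]
    theta, w = mirror_prox_td(A, b, M)
    print(theta)  # approaches A^{-1} b = [0.65, 0.7]
```

At the fixed point, Aᵀw = 0 and b − Aθ − Mw = 0. For nonsingular A this gives w = 0 and θ = A⁻¹b, the off-policy TD fixed point, whatever symmetric positive-definite M is used. The choice of metric therefore affects only the path and speed of convergence, which is the lever the paper's Bellman-matrix substitution targets.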



