off-policy-prediction

#off-policy-prediction

Behavior-Induced Mirror-Prox Temporal-Difference Learning for Faster Off-Policy Prediction

arXiv cs.AI · 5d ago

This paper proposes STHTD-MP, a behavior-induced Mirror-Prox temporal-difference method for faster off-policy prediction in reinforcement learning. The method replaces the covariance metric with the behavior-policy Bellman matrix, and the paper provides a convergence analysis and experimental comparisons.
