gradient-td

#gradient-td

Behavior-Aware Auxiliary Corrections for Off-Policy Temporal-Difference Prediction

arXiv cs.AI · 5d ago

This paper introduces BA-TDC and BA-TDRC, two off-policy temporal-difference prediction algorithms that replace the auxiliary covariance matrix with the behavior Bellman matrix, with the aim of improving stability and convergence. The authors support the methods with theoretical analysis and experiments on standard benchmarks.
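For context, here is a minimal sketch of the standard TDC update with linear function approximation, the baseline that the summary says BA-TDC modifies. It is not the paper's algorithm: the summary gives no update rules for BA-TDC or BA-TDRC, so the behavior Bellman matrix replacement is not shown. The names `tdc_update`, `theta`, `w`, `rho`, `alpha`, `beta`, and `gamma` are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def tdc_update(
    theta: Vector,   # primary weights: value estimate is dot(theta, x)
    w: Vector,       # auxiliary weights used for the gradient correction
    x: Vector,       # features of the current state
    reward: float,
    x_next: Vector,  # features of the next state
    rho: float,      # importance sampling ratio pi(a|s) / b(a|s)
    alpha: float,    # step size for theta
    beta: float,     # step size for w
    gamma: float,    # discount factor
) -> Tuple[Vector, Vector]:
    """One step of standard TDC (baseline only, not BA-TDC)."""
    # TD error under the current primary weights.
    delta = reward + gamma * dot(theta, x_next) - dot(theta, x)
    # Auxiliary prediction; both updates use the old value of w.
    wx = dot(w, x)
    # Primary update: TD step plus the correction term -gamma * (w.x) * x_next.
    new_theta = [
        t + alpha * rho * (delta * xi - gamma * wx * xn)
        for t, xi, xn in zip(theta, x, x_next)
    ]
    # Auxiliary update: least-mean-squares step toward the TD error.
    new_w = [wi + beta * rho * (delta - wx) * xi for wi, xi in zip(w, x)]
    return new_theta, new_w


if __name__ == "__main__":
    theta, w = tdc_update(
        theta=[0.0, 0.0], w=[0.0, 0.0],
        x=[1.0, 0.0], reward=1.0, x_next=[0.0, 1.0],
        rho=1.0, alpha=0.1, beta=0.5, gamma=0.9,
    )
    print(theta, w)
```

In standard TDC the auxiliary weights `w` track a solution that involves the feature covariance matrix; per the summary, that is the quantity the proposed methods swap for the behavior Bellman matrix.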


#gradient-td

Behavior-Induced Mirror-Prox Temporal-Difference Learning for Faster Off-Policy Prediction

arXiv cs.AI · 5d ago

This paper proposes STHTD-MP, a behavior-induced Mirror-Prox temporal-difference method for faster off-policy prediction in reinforcement learning. It replaces the covariance metric with the behavior-policy Bellman matrix and provides convergence analysis and experimental comparisons.

