2020-04-13: Leverage the Average: an Analysis of Regularization in RL.

download link

Introduction

By formalising regularized Markov decision processes, we study the effect of Kullback-Leibler (KL) and entropy regularization in reinforcement learning.

Through an equivalent formulation of the related approximate dynamic programming (ADP) scheme, we show that a KL penalty amounts to averaging q-values.

This equivalence allows drawing connections between a priori disconnected methods from the literature, and proving that KL regularization indeed leads to averaging the errors made at each iteration of the value-function update.
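As a sketch of this equivalence (notation assumed here rather than taken from the note: λ is the KL weight, π₀ the initial policy), the KL-regularized greedy step

$$
\pi_{k+1} = \operatorname*{argmax}_{\pi}\; \langle \pi, q_k \rangle - \lambda\, \mathrm{KL}(\pi \,\|\, \pi_k)
\quad\Longrightarrow\quad
\pi_{k+1}(a \mid s) \propto \pi_k(a \mid s)\, \exp\!\big(q_k(s,a)/\lambda\big),
$$

unrolls by induction to

$$
\pi_{k+1}(a \mid s) \propto \pi_0(a \mid s)\, \exp\!\Big(\tfrac{1}{\lambda} \sum_{j=0}^{k} q_j(s,a)\Big),
$$

i.e. the policy is a softmax of the (scaled) sum of all past q-values, so the errors of individual iterations are averaged rather than compounded.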

With the proposed theoretical analysis, we also study the interplay between KL and entropy regularization.

When the considered ADP scheme is combined with neural-network-based stochastic approximations, the equivalence is lost, which suggests a number of different ways to do regularization.
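In the exact tabular case the two formulations of the KL-regularized greedy step coincide; the point of the sentence above is that this stops being true once q is a neural approximation, which opens up different ways of implementing the regularization. A minimal tabular sketch of the equivalence, with illustrative function and variable names not taken from the paper:

```python
import numpy as np

def kl_policy_direct(q_list, lam):
    """Iterate the KL-regularized greedy step:
    pi_{k+1}(a) ∝ pi_k(a) * exp(q_k(a) / lam), starting from a uniform pi_0."""
    n_actions = len(q_list[0])
    pi = np.full(n_actions, 1.0 / n_actions)
    for q in q_list:
        logits = np.log(pi) + q / lam
        pi = np.exp(logits - logits.max())   # subtract max for numerical stability
        pi /= pi.sum()
    return pi

def kl_policy_averaged(q_list, lam):
    """Equivalent closed form: softmax of the sum of past q-values,
    pi_{k+1}(a) ∝ exp(sum_j q_j(a) / lam) (the uniform pi_0 cancels)."""
    logits = np.sum(q_list, axis=0) / lam
    pi = np.exp(logits - logits.max())
    return pi / pi.sum()

rng = np.random.default_rng(0)
q_list = [rng.normal(size=4) for _ in range(5)]  # 5 iterations, 4 actions
p_direct = kl_policy_direct(q_list, lam=0.5)
p_avg = kl_policy_averaged(q_list, lam=0.5)
print(np.allclose(p_direct, p_avg))  # True in the exact/tabular setting
```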
