
Buffer clearing in policy gradient methods #18

Answered by iffiX
ilyalasy asked this question in Q&A

Because A2C and PPO use stochastic policies trained on-policy, the log probabilities of the stored actions no longer match the current policy once you update the actor. So in theory you cannot even update their actors multiple times within a single update call (controlled by actor_update_times in the init call), but in practice updating a few times still works and learns better, so I kept that design.

DQN does not use a stochastic policy, so it is not subject to that constraint.
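For intuition, here is a minimal, self-contained sketch in plain PyTorch (not the library's actual implementation) of why the stored batch goes stale after the first actor update and why an on-policy buffer is cleared afterwards. The toy network, batch, and clamp range are illustrative assumptions; only actor_update_times mirrors the constructor argument mentioned above.

```python
# Minimal sketch: reuse of an on-policy batch, then buffer clearing.
# Hypothetical example code, not taken from the library.
import torch

torch.manual_seed(0)

actor = torch.nn.Linear(4, 2)           # toy policy: 4-dim state -> 2 action logits
optimizer = torch.optim.Adam(actor.parameters(), lr=1e-2)

# Pretend these transitions were collected with the *current* policy.
states = torch.randn(8, 4)
actions = torch.randint(0, 2, (8,))
advantages = torch.randn(8)

with torch.no_grad():
    old_log_probs = torch.log_softmax(actor(states), dim=-1) \
                         .gather(1, actions.unsqueeze(1)).squeeze(1)

actor_update_times = 4                   # reuse the same batch a few times (PPO-style)
for _ in range(actor_update_times):
    new_log_probs = torch.log_softmax(actor(states), dim=-1) \
                         .gather(1, actions.unsqueeze(1)).squeeze(1)
    # After the first step, new_log_probs != old_log_probs: the stored
    # transitions no longer come from the policy being optimized.
    ratio = torch.exp(new_log_probs - old_log_probs)
    loss = -(torch.clamp(ratio, 0.8, 1.2) * advantages).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The batch is now off-policy, so an on-policy method discards it here;
# an off-policy method like DQN could keep replaying it instead.
states = actions = advantages = None     # i.e. clear the buffer
```

The clipped ratio is what makes the small number of reuses tolerable in practice: it bounds how far the update can exploit the increasingly stale log probabilities.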
