IS weights for prioritized replay #29

Merged
merged 1 commit on Mar 28, 2020

Conversation

xuxiyang1993
Contributor

Hey,

I implemented the IS weights based on my understanding. In replay_buffer.py, I calculate $N$ and $P(i)$ from the total number of samples in the buffer and the game/position sampling probabilities (with the exponent parameter self.PER_beta). After computing the weight_batch, I normalize the weights by $1/\max_i w_i$. During training, weight_batch is multiplied by the loss of each data sample.
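The scheme described above (the standard prioritized-replay correction $w_i = (N \cdot P(i))^{-\beta}$, normalized by its maximum) can be sketched as follows. This is a minimal illustration, not the actual replay_buffer.py code; the function name and signature are hypothetical.

```python
import numpy as np

def compute_is_weights(priorities, beta):
    """Importance-sampling weights for prioritized replay (hypothetical sketch).

    w_i = (N * P(i)) ** -beta, then normalized by 1 / max_i w_i so the
    largest weight is 1 and the weights only scale the loss down.
    """
    priorities = np.asarray(priorities, dtype=np.float64)
    probs = priorities / priorities.sum()   # P(i): sampling probability of item i
    n = len(priorities)                     # N: total samples in the buffer
    weights = (n * probs) ** (-beta)        # raw IS weights
    return weights / weights.max()          # normalize by the max weight
```

With uniform priorities every weight is 1 (no correction); a sample drawn more often than uniform gets a weight below 1, which downscales its contribution to the loss and compensates for the sampling bias.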

@werner-duvaud
Owner

Thank you very much.

It looks good.
I haven't had time to run many tests yet. It doesn't seem to have much influence on the CartPole results, but it has allowed us to reach a very good level on tic-tac-toe, and convergence was more stable.

@werner-duvaud werner-duvaud merged commit a38e2e8 into werner-duvaud:prioritized_replay Mar 28, 2020
egafni pushed a commit to egafni/muzero-general that referenced this pull request Apr 15, 2021
EpicLiem pushed a commit to EpicLiem/muzero-general-chess-archive that referenced this pull request Feb 4, 2023