IS weights for prioritized replay #29

Merged
merged 1 commit on Mar 28, 2020

Conversation

xuxiyang1993
Contributor

Hey,

I implemented the IS weights based on my understanding. In replay_buffer.py, I calculate $N$ and $P(i)$ from the total number of samples in the buffer and the game/position sampling probabilities (with the exponent parameter self.PER_beta). After computing the weight_batch, I normalize the weights by $1/\max_i w_i$. During training, weight_batch is multiplied by the loss of each data sample.
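The scheme described above (the standard prioritized-replay correction $w_i = (N \cdot P(i))^{-\beta}$, normalized by its maximum) can be sketched as follows. This is a minimal illustration, not the actual replay_buffer.py code; the function name and signature are hypothetical.

```python
import numpy as np

def compute_is_weights(priorities, beta):
    """Importance-sampling weights for prioritized replay (hypothetical sketch).

    w_i = (N * P(i)) ** -beta, then normalized by 1 / max_i w_i so the
    largest weight is 1 and the weights only scale the loss down.
    """
    priorities = np.asarray(priorities, dtype=np.float64)
    probs = priorities / priorities.sum()   # P(i): sampling probability of item i
    n = len(priorities)                     # N: total samples in the buffer
    weights = (n * probs) ** (-beta)        # raw IS weights
    return weights / weights.max()          # normalize by the max weight
```

With uniform priorities every weight is 1 (no correction); a sample drawn more often than uniform gets a weight below 1, which downscales its contribution to the loss and compensates for the sampling bias.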

@werner-duvaud
Owner

Thank you very much.

It looks good.
I haven't had time to run many tests yet. It doesn't seem to have much influence on the CartPole results, but it has allowed us to reach a very good level on tic-tac-toe, and convergence was more stable.

@werner-duvaud werner-duvaud merged commit a38e2e8 into werner-duvaud:prioritized_replay Mar 28, 2020
egafni pushed a commit to egafni/muzero-general that referenced this pull request Apr 15, 2021
EpicLiem pushed a commit to EpicLiem/muzero-general-chess-archive that referenced this pull request Feb 4, 2023