[WIP] Sampled Muzero #216

Open: wants to merge 1 commit into master from …

@JosephDenman commented Dec 18, 2022

This is a work-in-progress implementation of Sampled MuZero that I've been working on. I figured I'd store it here in case anyone else is interested in developing it. The agent learns slowly (if at all) and performs significantly worse than the vanilla MuZero agent. Since I'm relatively new to these libraries, I'm out of ideas for how to debug it.

One interesting discrepancy between the regular agent and the sampled agent is the shape of the policy loss. For the sampled agent it initially spikes, drops back to zero, and then follows a logarithmic curve; the regular agent's policy loss shows no such initial spike. I don't know how to interpret this difference. Because the policy loss does converge, automatic differentiation seems to be configured correctly. In that case, the question is why the policy doesn't improve more than it does, which suggests that something in the tree search is misconfigured.

I'm very much interested in feedback! Thanks.

@JosephDenman (author)

Got it working. The problem was that the action passed to the environment's `step` was a tensor `[i]` rather than an integer `i`; I just had to extract the integer from the tensor. It's strange that the environment doesn't throw an error when actions are incorrectly formatted.
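A minimal sketch of that fix follows. It assumes the search returns the action as a one-element container such as a PyTorch tensor or NumPy array (both expose `.item()`). The helper name `to_int_action` is hypothetical and does not come from this PR or any library. It also rejects non-integer actions, so a malformed action fails loudly instead of being silently accepted by the environment.

```python
from numbers import Integral


def to_int_action(action):
    """Hypothetical helper: turn a one-element action like tensor([i]) into a plain int i."""
    # PyTorch tensors, NumPy arrays and NumPy scalars all provide .item(),
    # which returns a native Python scalar for one-element containers.
    if hasattr(action, "item"):
        action = action.item()
    # Also unwrap plain one-element sequences such as [i] or (i,).
    elif isinstance(action, (list, tuple)):
        if len(action) != 1:
            raise ValueError(f"expected a single action, got {action!r}")
        action = action[0]
    # Refuse anything that is not an integer (bool is excluded on purpose),
    # so a badly formatted action raises instead of being silently accepted.
    if isinstance(action, bool) or not isinstance(action, Integral):
        raise TypeError(
            f"action must be an integer, got {type(action).__name__}: {action!r}"
        )
    return int(action)
```

In the self-play loop the call would then look like `env.step(to_int_action(action))`, where `env` and `action` stand in for whatever objects the training code actually uses.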
