The model does not converge for breakout #211

Open
1 task done
yungangwu opened this issue Oct 20, 2022 · 13 comments
Labels
enhancement New feature or request

Comments

@yungangwu

Search before asking

  • I have searched the MuZero issues and found no similar feature requests.

Description

I trained MuZero on Breakout with the hyperparameters given in the code, but after 450,000 steps the reward was still 0 and training showed no sign of convergence. Are the hyperparameters in the code validated? Thank you!

Additional context

No response

@yungangwu yungangwu added the enhancement New feature or request label Oct 20, 2022
@JohnPPP

JohnPPP commented Oct 20, 2022 via email

@yungangwu
Author

yungangwu commented Oct 20, 2022

Have you tried any other parameter settings? For example, if batch_size is set to 1024, does the model converge under certain hyperparameter settings? @JohnPPP

@yungangwu yungangwu reopened this Oct 20, 2022
@JohnPPP

JohnPPP commented Oct 20, 2022 via email

@yungangwu
Author

gg. I ran into the same problem and did a lot of experiments, but nothing changed. I don't know whether there is a mistake in the code. @JohnPPP

@JohnPPP

JohnPPP commented Oct 20, 2022 via email

@dillonmsandhu

Did the reward stay zero the entire time, or did it occasionally get some reward? I have it working on cartpole, but not on Atari. That said, it still gets a reward of 2 or 3 occasionally in breakout, indicating that it is behaving randomly.
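One rough way to check whether an agent is behaving randomly is to compare its episode returns with those of a uniform-random policy on the same environment. The sketch below is only an illustration: the function name `looks_random`, the threshold of 2.0, and the sample returns are all hypothetical and do not come from this repository or from any run reported in this thread.

```python
from math import sqrt
from statistics import mean, stdev


def looks_random(agent_returns, random_returns, z_threshold=2.0):
    """Return True if the agent's mean return is not clearly above the
    random-policy baseline (hypothetical helper, simple two-sample z-score)."""
    diff = mean(agent_returns) - mean(random_returns)
    # Standard error of the difference between the two sample means.
    se = sqrt(
        stdev(agent_returns) ** 2 / len(agent_returns)
        + stdev(random_returns) ** 2 / len(random_returns)
    )
    if se == 0:
        # No variance in either sample: fall back to comparing the means.
        return diff <= 0
    return diff / se < z_threshold


# Made-up episode returns, for illustration only.
random_baseline = [1, 2, 0, 3, 1, 2]
stuck_agent = [0, 2, 3, 1, 0, 2]
learning_agent = [30, 28, 35, 40, 32, 31]

print(looks_random(stuck_agent, random_baseline))
print(looks_random(learning_agent, random_baseline))
```

With more evaluation episodes the comparison becomes more reliable; a handful of episodes, as in the made-up data here, is only enough for a sanity check.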

@zsn2021

zsn2021 commented Dec 31, 2022

I also encountered the same problem. I tuned the hyperparameters for a long time, but could not get good results in my environment.

@yungangwu
Author

yungangwu commented Dec 31, 2022 via email

@zsn2021

zsn2021 commented Dec 31, 2022

Is it possible that having so many networks to learn leads to poor decisions?
If you like, you can share your contact information and we can talk privately.

@yungangwu
Author

yungangwu commented Dec 31, 2022 via email

@zsn2021

zsn2021 commented Dec 31, 2022

You can add me on WeChat:
13162062294

5 participants
@JohnPPP @dillonmsandhu @yungangwu @zsn2021 and others