Reset PPO policy.log_std when loading previously saved model #155
Thanks @Miffyli! So, a couple of things:

(1) I attempted the PyTorch save/load methods, manually resetting:

```python
model.policy.log_std = th.nn.Parameter(th.tensor([-0.5, -0.5, -0.5], device='cuda:0', requires_grad=True))
model.learn(total_timesteps=500000, tb_log_name='ppo')
```

But unfortunately this just locked …

(2) I found the following crude method DID work, since it uses the PPO class's …

```python
model = PPO.load(load_path="log_dir\ppo_1\ppo_model", env=env2)  # load saved model
model.policy.log_std = th.nn.Parameter(th.tensor([-0.5, -0.5, -0.5], device='cuda:0', requires_grad=True))  # reset log_std
model.save("ppo_model_temp")  # save this adjusted model
model = PPO.load(load_path="ppo_model_temp", env=env2)  # load the adjusted model
model.learn(total_timesteps=500000, tb_log_name='ppo')  # learn
```

This approach successfully reset the …
Hello,
We do that because you really need to know what is happening when you change those arguments between saving and loading.
You may need to register that parameter too and also check if it is present in the optimizer (which I assume is not the case given the result).
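The suggestion above can be sketched without stable-baselines3 at all. The snippet below uses a bare `nn.Module` as a stand-in for `model.policy` (SB3's PPO policy exposes `policy.log_std` and `policy.optimizer`, but treat those names as assumptions here): after creating a fresh `log_std`, the old tensor must be swapped out of the optimizer's param groups, otherwise the optimizer keeps updating the orphaned parameter.

```python
import torch as th

# Stand-in for model.policy: a module holding a log_std parameter, plus an
# optimizer over its parameters (SB3 keeps one at policy.optimizer).
policy = th.nn.Module()
policy.log_std = th.nn.Parameter(th.zeros(3))
optimizer = th.optim.Adam(policy.parameters(), lr=3e-4)

# A fresh log_std for the next curriculum stage.
new_log_std = th.nn.Parameter(th.tensor([-0.5, -0.5, -0.5]))

# Swap the old tensor for the new one inside every param group; simply
# assigning policy.log_std would leave the optimizer updating the orphan.
old_log_std = policy.log_std
for group in optimizer.param_groups:
    group["params"] = [new_log_std if p is old_log_std else p
                      for p in group["params"]]
policy.log_std = new_log_std
```

Note that Adam keeps per-parameter state keyed by the tensor object, so the replacement parameter starts with fresh optimizer state, which is usually what you want after a reset.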
Hmm, ok. I think we could remove the parameter altogether in that case. I do not see why you would want to provide the same parameters again when they are already stored, and the only other option is to provide …?
Wouldn't that information (the custom policy pickled) be stored in the saved model as well? Or does it skip saving …
Yes, it is. However, if you want to continue training (with the zoo for instance) and you tried multiple configurations, checking the kwargs allows you to know whether the saved model has the network architecture that you expect.
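The consistency check described above can be illustrated with a toy version. This is a sketch of the idea, not stable-baselines3's actual implementation (the helper name `check_policy_kwargs` and the dictionaries are made up for illustration): loading refuses to proceed when the requested kwargs disagree with the ones stored alongside the model.

```python
# Hypothetical stored/requested kwargs for illustration only.
stored_kwargs = {"net_arch": [64, 64], "log_std_init": 0.0}
requested_kwargs = {"net_arch": [64, 64], "log_std_init": -0.5}

def check_policy_kwargs(stored, requested):
    # An empty request means "reuse whatever was saved"; any mismatch
    # raises the kind of ValueError quoted in this issue.
    if requested and requested != stored:
        raise ValueError(
            "The specified policy kwargs do not equal the stored policy "
            f"kwargs. Stored kwargs: {stored}, specified kwargs: {requested}"
        )
```

Under this scheme, passing no `policy_kwargs` to `load()` always succeeds, while re-passing a changed `log_std_init` (as in the bug report below) fails loudly instead of silently ignoring the new value.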
Describe the bug

When performing curriculum learning, being able to reset the PPO `policy.log_std` between training cycles would be nice. The following code will produce an error:

```
ValueError: The specified policy kwargs do not equal the stored policy kwargs. Stored kwargs: …
```

This error is thrown because `log_std_init` differs between the two training cycles.