
optim.LoadStateDict from existing StateDict doesn't clone tensors #1172

Closed
shaltielshmid opened this issue Dec 6, 2023 · 1 comment · Fixed by #1173

Comments


shaltielshmid commented Dec 6, 2023

When I call optim.LoadStateDict with the state dictionary of an existing optimizer, the tensors are copied by reference rather than cloned, so once the old optimizer is disposed, the new optimizer's state tensors are invalid.

Sample code:

using TorchSharp;

var lin1 = torch.nn.Linear(10, 10);

var optim1 = torch.optim.Adam(lin1.parameters());
var optim2 = torch.optim.Adam(lin1.parameters());
optim2.load_state_dict(optim1.state_dict()); // state tensors are aliased, not cloned
optim1.Dispose();                            // also invalidates the tensors optim2 now holds

torch.nn.functional.mse_loss(lin1.call(torch.rand(10)), torch.rand(10)).backward();
optim2.step(); // touches the disposed state tensors

Throws:

System.InvalidOperationException: 'Tensor invalid -- empty handle.'
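
Until the fix in #1173 lands, one possible interim workaround is to round-trip the state through a file, so the loaded tensors are freshly materialized rather than aliased. This is only a sketch, assuming the file-based save_state_dict/load_state_dict overloads copy tensor data on load:

using System.IO;
using TorchSharp;

var lin1 = torch.nn.Linear(10, 10);
var optim1 = torch.optim.Adam(lin1.parameters());
var optim2 = torch.optim.Adam(lin1.parameters());

var path = Path.GetTempFileName();
optim1.save_state_dict(path); // serialize optim1's state to disk
optim2.load_state_dict(path); // tensors are read back from the file, not borrowed
optim1.Dispose();             // optim2's state remains valid

torch.nn.functional.mse_loss(lin1.call(torch.rand(10)), torch.rand(10)).backward();
optim2.step(); // runs without the empty-handle exception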
shaltielshmid (author) commented:

This also causes device issues if you copy a state dict from an optimizer whose state lives on a different device.

For example:

using System;
using TorchSharp;
using TorchSharp.Modules; // Adam (and Adam.State) live here

var lin1 = torch.nn.Linear(10, 10);
var optim1 = torch.optim.Adam(lin1.parameters());
var sd = optim1.state_dict(); // captured while the state is on the CPU

lin1.cuda(); // move the module to the GPU
var optim2 = torch.optim.Adam(lin1.parameters()); // fresh optimizer, state on CUDA
Console.WriteLine((optim2.state_dict().State[0] as Adam.State).exp_avg.device.type); // CUDA
optim2.load_state_dict(sd); // silently swaps in the old CPU tensors
Console.WriteLine((optim2.state_dict().State[0] as Adam.State).exp_avg.device.type); // CPU
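
In the meantime, one possible workaround for the device mismatch (a sketch only, assuming the Adam.State fields exp_avg and exp_avg_sq are publicly assignable, as the read access above suggests) is to move the loaded state tensors back to the parameters' device by hand:

foreach (var state in optim2.state_dict().State) {
    if (state is Adam.State s) {
        s.exp_avg = s.exp_avg.to(torch.CUDA);       // first-moment buffer
        s.exp_avg_sq = s.exp_avg_sq.to(torch.CUDA); // second-moment buffer
    }
}
Console.WriteLine((optim2.state_dict().State[0] as Adam.State).exp_avg.device.type); // CUDA again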
