
Usage of Pytorch Native AMP in place of apex (Pytorch 1.6) in Trainer #6115

Closed
prajjwal1 opened this issue Jul 29, 2020 · 3 comments · Fixed by #6151

Comments

prajjwal1 (Contributor) commented Jul 29, 2020

🚀 Feature request

It would be nice to remove the apex dependency for fp16 training and use PyTorch's native AMP methods in the Trainer. PyTorch recommends that apex users switch to the native implementation, and apex itself recommends the same. Moreover, it would eliminate the need for users to build apex themselves.

Your contribution

I am happy to submit a PR if you think it would be a good addition. Please let me know.

prajjwal1 (Author, Contributor) commented Jul 29, 2020

I had a query: the PyTorch examples show the loss being computed as follows:

# scaler = torch.cuda.amp.GradScaler(), created once before the training loop
with autocast():
    output = model(input)
    loss = loss_fn(output, target)
scaler.scale(loss).backward()

But in all SequenceClassification and other models, the loss is computed inside the forward pass. We could use the @autocast decorator on forward, as the docs suggest, but that would introduce a lot of changes for a single feature. Maybe there's a workaround. Does computing the loss inside an autocast scope affect the loss itself when backward is called on it?
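To make the decorator approach concrete, here is a minimal sketch (my own, not code from the repository) of a toy classifier that computes its loss inside forward, with autocast applied as a decorator the way the PyTorch docs describe. The class and parameter names are illustrative stand-ins for the actual transformers models.

```python
import torch
from torch import nn
from torch.cuda.amp import autocast

class ClassifierWithLoss(nn.Module):
    """Toy stand-in for a *ForSequenceClassification model that returns its loss."""

    def __init__(self, hidden_size=8, num_labels=2):
        super().__init__()
        self.head = nn.Linear(hidden_size, num_labels)
        self.loss_fn = nn.CrossEntropyLoss()

    @autocast()  # everything in forward, including the loss, runs under autocast
    def forward(self, features, labels=None):
        logits = self.head(features)
        loss = self.loss_fn(logits, labels) if labels is not None else None
        return (loss, logits)
```

On a machine without CUDA the decorator is simply a no-op, so the same forward works in both settings.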

sgugger (Collaborator) commented Jul 29, 2020

Hi there,

Note that we won't pin the minimum PyTorch version to 1.6, so the use of native mixed precision will have to be gated on the PyTorch version (basically, use native mixed precision when the version allows it and apex otherwise).

Otherwise, I don't think the loss being computed inside the model should be a problem; the line would probably be

with autocast():
    outputs = model(**inputs)
    loss = outputs[0]

inside Trainer but I haven't run tests yet.

You're welcome to try to work on a PR with this, otherwise it is on my TODO for when I have time (hopefully in the next few weeks).
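Putting the two comments above together, a full Trainer-side training step could look roughly like the sketch below. This is an assumption about the eventual shape of the change, not the actual PR; `amp_training_step` is a hypothetical name, and the only facts taken from the thread are that the model computes its loss internally and returns it first.

```python
import torch

def amp_training_step(model, optimizer, scaler, inputs):
    """One mixed-precision step; the model returns the loss first when labels are passed."""
    optimizer.zero_grad()
    # enabled=False turns both wrappers into no-ops, so the same code runs on CPU
    with torch.cuda.amp.autocast(enabled=torch.cuda.is_available()):
        outputs = model(**inputs)
        loss = outputs[0]
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales gradients, then steps
    scaler.update()                 # adjusts the scale for the next iteration
    return loss.detach()
```

Note that the loss tensor produced inside autocast can be used for backward as usual; the scaler only rescales it to avoid fp16 gradient underflow.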

prajjwal1 (Author, Contributor) commented

Hi Sylvain,

I've opened up a PR. I know the version won't be pinned, so I've handled this the same way we handle scheduler.get_lr().
