
Is it expected that the training speed of torch.cuda.amp is better than apex.amp? #1412

Closed
kehuanfeng opened this issue Jun 23, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@kehuanfeng

The environment is as below:
torch: 1.11.0+cu113
apex: 0.1
cuda: 11.3.109
mmdet: 2.24.1
mmcv-full: 1.5.1
model: configs/yolox/yolox_l_8x8_300e_coco.py

I compared the training speed of torch.cuda.amp (autocast + GradScaler) with apex.amp, and found that the native torch.cuda.amp is faster.

# the numbers indicate the total training time for two epochs
torch.cuda.amp: 2130 sec
apex.amp: 2309 sec

I'd like to understand whether this is expected and why it happens.
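For reference, a minimal sketch of the native torch.cuda.amp recipe (autocast + GradScaler) used in such a comparison; the tiny model and random data below are placeholders, not the YOLOX setup from the report, and the CPU fallback (enabled=False) is only there so the snippet runs anywhere:

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"  # amp here targets CUDA; disabled elsewhere

model = nn.Linear(16, 4).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# GradScaler is a no-op when enabled=False, so this also runs on CPU.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

data = torch.randn(8, 16, device=device)
target = torch.randn(8, 4, device=device)

for _ in range(3):
    optimizer.zero_grad(set_to_none=True)
    # autocast runs eligible forward ops in float16 on CUDA.
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = nn.functional.mse_loss(model(data), target)
    scaler.scale(loss).backward()  # scale loss to avoid fp16 grad underflow
    scaler.step(optimizer)         # unscales grads, then optimizer.step()
    scaler.update()                # adjusts the scale factor for next step

print(float(loss))
```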

@kehuanfeng added the bug label Jun 23, 2022
@kehuanfeng
Author

kehuanfeng commented Jun 23, 2022

After breaking down the step time of torch.cuda.amp and apex.amp, it seems that apex requires more data copies.
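The issue does not show how the step time was broken down; one hypothetical way to get such a breakdown is torch.profiler, where copy operations (e.g. `aten::copy_`, or Memcpy kernels when profiling CUDA) would stand out in the operator table if one implementation moves more data than the other:

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

# Placeholder model and input; substitute the actual training step here.
model = nn.Linear(32, 32)
x = torch.randn(4, 32)

# Add ProfilerActivity.CUDA to the list when profiling GPU training.
with profile(activities=[ProfilerActivity.CPU]) as prof:
    model(x).sum().backward()

# Sorting by total time surfaces the most expensive ops, including copies.
table = prof.key_averages().table(sort_by="cpu_time_total", row_limit=10)
print(table)
```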

@ptrblck
Contributor

ptrblck commented Aug 3, 2022

apex.amp is deprecated, and you should use the native implementation via torch.cuda.amp as described here.
Closing.

@ptrblck ptrblck closed this as completed Aug 3, 2022