
Fix different devices bug when moving model from GPU to CPU #5110

Merged · 6 commits merged into ultralytics:master on Oct 10, 2021

Conversation

@jebastin-nadar (Contributor) commented Oct 9, 2021

Discovered by @glenn-jocher in #4833 (comment).

Simple reproducer:

```python
import torch

img = 'https://ultralytics.com/images/zidane.jpg'
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', device='cuda:0')
results = model(img)
results.print()

model.to('cpu')
results = model(img)  # fails here
results.print()
```

The second inference raises:

```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
```

Problem:

  • In the Detect() module, stride and grid are not moved to the correct device. Only parameters and tensors registered with register_buffer() are moved automatically.

Possible solutions:

  • Declare stride and grid as an nn.Parameter() and an nn.ParameterList() respectively.
  • Save stride and grid using register_buffer(). But grid is a list of tensors, and register_buffer() only accepts a single tensor, so this is not an ideal solution.
  • Override the to() function of the Model and AutoShape classes to move stride and grid manually. This seems to be the easiest and most straightforward solution.
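The register_buffer() limitation mentioned in the second option is easy to confirm with a quick standalone check (toy code, not from the PR):

```python
import torch
import torch.nn as nn

m = nn.Module()
m.register_buffer('stride', torch.tensor([8., 16., 32.]))  # fine: a single tensor

try:
    m.register_buffer('grid', [torch.zeros(1)])  # a list of tensors is rejected
except TypeError as e:
    print(e)  # cannot assign to buffer: torch Tensor or None required
```

This is why grid, being a per-layer list of grid tensors, cannot simply be registered as a buffer.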

Similar discussion - https://discuss.pytorch.org/t/why-model-to-device-wouldnt-put-tensors-on-a-custom-layer-to-the-same-device/17964
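A minimal sketch of the fix as it was eventually refined in this thread (overriding `_apply()` rather than `to()`). The `Detect` and `Model` classes below are toy stand-ins for the real YOLOv5 modules, not the actual implementation in models/yolo.py:

```python
import torch
import torch.nn as nn

class Detect(nn.Module):
    # Toy stand-in for YOLOv5's Detect head: stride and grid are plain
    # tensors/lists, so nn.Module.to()/cpu()/half() will NOT touch them.
    def __init__(self):
        super().__init__()
        self.stride = torch.tensor([8., 16., 32.])
        self.grid = [torch.zeros(1) for _ in range(3)]

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.detect = Detect()

    def _apply(self, fn):
        # Let nn.Module handle parameters and registered buffers first,
        # then apply the same conversion to the unregistered tensors.
        self = super()._apply(fn)
        m = self.detect
        m.stride = fn(m.stride)
        m.grid = list(map(fn, m.grid))
        return self
```

Because `to()`, `cpu()`, `cuda()`, and `half()` all funnel through `_apply()`, this single override covers every device and dtype move, e.g. `Model().half().detect.stride.dtype` is `torch.float16`.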

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhanced YOLOv5 model flexibility with improved tensor operation support.

📊 Key Changes

  • Added _apply method in models/common.py and models/yolo.py.
  • The method accommodates model tensor operations such as .to(), .cpu(), .cuda(), and .half().
  • The adjustments apply specifically to the final Detect() layer of the model, so that its stride and grid tensors are handled correctly.

🎯 Purpose & Impact

  • These changes ensure that when you move the model between devices and data types (e.g., CPU to GPU, or full precision to half precision), it carries over seamlessly.
  • This functionality is crucial for deployment flexibility, allowing users to easily optimize models for different hardware environments 🔄.
  • End-users might experience more stable performance and a smoother experience when switching their models across different computational setups 🖥️⚡.
  • Developers gain a simplified codebase for managing device and data type transitions, making future maintenance and upgrades smoother 💻🔧.

@glenn-jocher (Member) commented:

@SamFC10 thanks for the PR! Yes, this is probably the best of the three solutions (thanks for the detailed analysis of the options).

Does this PR pass your reproduce test?

@jebastin-nadar (Contributor, author) commented:

> Does this PR pass your reproduce test?

Yes it does. To verify:

```diff
- model = torch.hub.load('ultralytics/yolov5', 'yolov5s', device='cuda:0')
+ model = torch.hub.load('SamFC10/yolov5:fix-devices', 'yolov5s', device='cuda:0')
```

But there is still an issue: while model.to('cpu') works, model.cpu() does not and still raises the same error.

Looking at the source code, functions like to(), cpu(), cuda(), half(), etc. all call the _apply() function, which modifies the parameters and registered buffers. It makes sense to extend _apply() to also handle stride and grid, rather than overriding each of these functions individually.
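The point that these user-facing methods all route through `_apply()` can be checked with a toy counter module (illustrative only, not YOLOv5 code; the `Counter` class and `apply_calls` attribute are my own names):

```python
import torch.nn as nn

class Counter(nn.Module):
    # Counts how often _apply() runs on this module: to(), cpu(),
    # half(), and float() each route through it exactly once.
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 2)
        self.apply_calls = 0

    def _apply(self, fn):
        self.apply_calls += 1
        return super()._apply(fn)

net = Counter()
net.to('cpu')
net.cpu()
net.half()
net.float()
print(net.apply_calls)  # 4
```

So one `_apply()` override covers all four conversion methods at once, which is exactly why it is the right extension point here.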

@jebastin-nadar (Contributor, author) commented:

The latest commit fixes the errors: model.to(device), model.cpu(), and model.cuda() all run without errors.

@glenn-jocher (Member) commented:

> But there is still an issue: while model.to('cpu') works, model.cpu() does not and still raises the same error.
>
> Looking at the source code, functions like to(), cpu(), cuda(), half(), etc. all call the _apply() function, which modifies the parameters and registered buffers. It makes sense to extend _apply() to also handle stride and grid, rather than overriding each of these functions individually.

Very smart, this is a good idea. Good use of map() also. This is a great PR overall!

@glenn-jocher glenn-jocher changed the title fix different devices bug when moving model from GPU to CPU Fix different devices bug when moving model from GPU to CPU Oct 10, 2021
@glenn-jocher glenn-jocher merged commit a0e1504 into ultralytics:master Oct 10, 2021
@glenn-jocher (Member) commented:

@SamFC10 PR is merged. Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐

@glenn-jocher glenn-jocher linked an issue Oct 11, 2021 that may be closed by this pull request
BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this pull request Aug 26, 2022
…ics#5110)

* fix different devices bug

* extend _apply() instead of to() for a general fix

* Only apply if Detect() is last layer

Co-authored-by: Jebastin Nadar <njebastin10@gmail.com>

* Indent fix

* Add comment to yolo.py

* Add comment to common.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Successfully merging this pull request may close these issues:

  • ONNX export with GPU (--device opt) is not working