
Fix YoloNAS on cuda #1444

Merged
3 commits merged on Oct 12, 2023

Conversation

Louis-Dupont (Contributor)

Reproducing the bug

import torch
from super_gradients.common.object_names import Models
from super_gradients.training import models

# Note that currently only YoloX, PPYoloE and YOLO-NAS are supported.
model = models.get(Models.YOLO_NAS_L, pretrained_weights="coco")

# We want to use cuda if available to speed up inference.
model = model.to("cuda" if torch.cuda.is_available() else "cpu")

IMAGES = [
    "../../../../documentation/source/images/examples/countryside.jpg",
    "../../../../documentation/source/images/examples/street_busy.jpg",
    "https://cdn-attachments.timesofmalta.com/cc1eceadde40d2940bc5dd20692901371622153217-1301777007-4d978a6f-620x348.jpg",
]

predictions = model.predict(IMAGES)
predictions.show()
predictions.save(output_folder="")  # Save in working directory

Exception

    shift_x = torch.arange(end=w, dtype=dtype) + self.grid_cell_offset
RuntimeError: "arange_cpu" not implemented for 'Half'

pytorch == '1.12.0+cu102'
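The failure can be reproduced independently of the model. A minimal sketch (an illustration, not code from the repository): on affected PyTorch builds the CPU arange kernel has no Half implementation, so requesting float16 on the CPU raises a RuntimeError, while newer builds may succeed, so the sketch probes rather than asserts.

```python
import torch

# On affected PyTorch versions (e.g. 1.12), torch.arange has no CPU
# kernel for Half, so requesting float16 on the CPU raises RuntimeError.
# Newer builds may support it, so we probe instead of assuming.
supported = True
try:
    shift_x = torch.arange(end=4, dtype=torch.float16)
except RuntimeError as err:
    supported = False
    print(f"RuntimeError: {err}")

print("Half arange on CPU supported:", supported)
```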

Solutions

  1. Remove dtype.
  2. Check the PyTorch version and pass dtype only if the user has a version that supports it. I'm not sure of the best way to determine which versions do, other than trying them one by one.

Should we go with option 1, or do we still want to pass dtype when possible? What was the motivation for adding it? @BloodAxe
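A third option (a sketch under assumed behavior, not the merged fix, and `anchor_points_x` is a hypothetical name) would keep the target dtype in the output while sidestepping the missing CPU Half kernel: build the range in float32, which arange supports everywhere, then cast.

```python
import torch

def anchor_points_x(w: int, grid_cell_offset: float, dtype: torch.dtype) -> torch.Tensor:
    # Hypothetical helper: create the range in float32, which arange
    # supports on every backend, then cast to the requested dtype so
    # downstream consumers (e.g. the ONNX graph) still see that type.
    shift_x = torch.arange(end=w, dtype=torch.float32) + grid_cell_offset
    return shift_x.to(dtype)

print(anchor_points_x(4, 0.5, torch.float16))
```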

@Louis-Dupont Louis-Dupont marked this pull request as draft September 3, 2023 09:33
@Louis-Dupont Louis-Dupont changed the title fix Fix YoloNAS on cuda Sep 3, 2023
@BloodAxe (Collaborator) commented Sep 3, 2023

Dtype is needed because otherwise fp64 types appear in the ONNX graph.
But I'm not sure where the fp16 issue is coming from. I'll take a look; thanks for raising this issue.

@Pbatch commented Oct 4, 2023

Is it possible to add just the device to torch.arange instead?

I.e.

shift_x = torch.arange(end=w, device=device, dtype=dtype) + self.grid_cell_offset

Creating a torch.float16 tensor is fine if it's made on the GPU.
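The suggestion amounts to allocating the tensor directly on the model's device, where the float16 arange kernel exists. A hedged sketch of the idea (`make_anchor_shift` is a hypothetical name, not the repository's function), with a CPU fallback for builds that lack a CPU Half kernel:

```python
import torch

def make_anchor_shift(w: int, grid_cell_offset: float,
                      device: torch.device, dtype: torch.dtype) -> torch.Tensor:
    # Passing device= makes arange allocate on the GPU, where the
    # float16 kernel exists. On CPU, fall back to float32 and cast,
    # since older builds lack a CPU Half arange kernel.
    if device.type == "cpu" and dtype == torch.float16:
        rng = torch.arange(end=w, dtype=torch.float32).to(dtype)
    else:
        rng = torch.arange(end=w, device=device, dtype=dtype)
    return rng + grid_cell_offset

print(make_anchor_shift(4, 0.5, torch.device("cpu"), torch.float32))
```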

@BloodAxe BloodAxe marked this pull request as ready for review October 11, 2023 14:02
@Louis-Dupont (Contributor, Author) left a comment

LGTM

@BloodAxe (Collaborator) left a comment

LGTM

@BloodAxe BloodAxe merged commit ecdec5e into master Oct 12, 2023
7 checks passed
@BloodAxe BloodAxe deleted the hotfix/SG-000-fix_model_predict_on_cuda_due_to_dtype branch October 12, 2023 06:35
3 participants