YOLOv5 and ScoreCAM #242

Open
Harry-Rogers opened this issue Apr 29, 2022 · 10 comments

@Harry-Rogers

Harry-Rogers commented Apr 29, 2022

Hi, I have pretrained a YOLOv5 model on a custom dataset and tried to use the tutorial code with ScoreCAM, but I get the error below.

ValueError: only one element tensors can be converted to Python scalars

It points to line 59 in score_cam.py (below).

outputs = [target(o).cpu().item() for o in self.model(batch)]

I'm unsure how to fix this, as the batch is a tensor with the same shape as in my other implementation, which uses ScoreCAM with a Faster RCNN network.

Any help would be greatly appreciated.
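The failing line calls .item() on whatever the target returns, and .item() only works on a tensor holding exactly one value. The sketch below reproduces that failure in isolation. It assumes a classification-style target that selects one class column (the function name and the 100 x 85 detector output shape are illustrative assumptions, not taken from the library), and it catches both exception types because the type raised can differ between PyTorch versions.

```python
import torch


def classification_style_target(output, category=0):
    # Hypothetical stand-in for a classification target: pick one class score.
    if output.dim() == 1:
        return output[category]
    return output[:, category]


# A classifier yields one score vector per image, so the target is a scalar.
classifier_output = torch.rand(80)
scalar_score = classification_style_target(classifier_output).cpu().item()

# A detector yields many candidate boxes per image (assumed shape: 100 x 85),
# so the same target returns 100 values and .item() cannot convert them.
detector_output = torch.rand(100, 85)
try:
    classification_style_target(detector_output).cpu().item()
    conversion_failed = False
except (ValueError, RuntimeError) as err:
    # The exception type depends on the installed PyTorch version.
    conversion_failed = True
    print(type(err).__name__, err)
```

If this is what happens inside ScoreCAM, the fix is a target that reduces the detector output to a single score per image.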

@jacobgil
Owner

Hi,
I need more details.
Which target are you using?
Is it exactly like FasterRCNNBoxScoreTarget from the notebook, or something else?

@Harry-Rogers
Author

I'm using the YOLOv5 model, so I'm just using the code below from the tutorial. I got ScoreCAM working for a Faster RCNN with the same dataset, so I don't think it's that.

target_layers = [model.model.model.model[-2]]

@jacobgil
Owner

jacobgil commented May 5, 2022

Thanks, sorry for the delay in the response.
Are you using FasterRCNNBoxScoreTarget as the target (not target_layers)?
I suspect there is a problem there, so that's why I'm asking.
In case you modified the target (the function that outputs a score), can you please paste the code here?

@Harry-Rogers
Author

Hi, I have been using a YOLOv5s model and have adapted the YOLOv5 notebook as shown below. I still get the same error mentioned above.

import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')
import torch    
import cv2
import numpy as np
import requests
import torchvision.transforms as transforms
from pytorch_grad_cam import ScoreCAM
from pytorch_grad_cam.utils.image import show_cam_on_image, scale_cam_image
from PIL import Image

COLORS = np.random.uniform(0, 255, size=(80, 3))

def parse_detections(results):
    detections = results.pandas().xyxy[0]
    detections = detections.to_dict()
    boxes, colors, names = [], [], []

    for i in range(len(detections["xmin"])):
        confidence = detections["confidence"][i]
        if confidence < 0.2:
            continue
        xmin = int(detections["xmin"][i])
        ymin = int(detections["ymin"][i])
        xmax = int(detections["xmax"][i])
        ymax = int(detections["ymax"][i])
        name = detections["name"][i]
        category = int(detections["class"][i])
        color = COLORS[category]

        boxes.append((xmin, ymin, xmax, ymax))
        colors.append(color)
        names.append(name)
    return boxes, colors, names


def draw_detections(boxes, colors, names, img):
    for box, color, name in zip(boxes, colors, names):
        xmin, ymin, xmax, ymax = box
        cv2.rectangle(
            img,
            (xmin, ymin),
            (xmax, ymax),
            color, 
            2)

        cv2.putText(img, name, (xmin, ymin - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, color, 2,
                    lineType=cv2.LINE_AA)
    return img


image_url = "https://upload.wikimedia.org/wikipedia/commons/f/f1/Puppies_%284984818141%29.jpg"
img = np.array(Image.open("Puppies_(4984818141).jpg"))
img = cv2.resize(img, (640, 640))
rgb_img = img.copy()
img = np.float32(img) / 255
transform = transforms.ToTensor()
tensor = transform(img).unsqueeze(0)

model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
model.eval()
model.cpu()
target_layers = [model.model.model.model[-2]]

results = model([rgb_img])
boxes, colors, names = parse_detections(results)
detections = draw_detections(boxes, colors, names, rgb_img.copy())
Image.fromarray(detections)

cam = ScoreCAM(model, target_layers, use_cuda=False)
grayscale_cam = cam(tensor)[0, :, :]
cam_image = show_cam_on_image(img, grayscale_cam, use_rgb=True)
Image.fromarray(cam_image)

@jacobgil
Owner

jacobgil commented May 5, 2022

Oh ok, now I got it.

The example in the YOLO notebook uses EigenCAM, a method that doesn't require a "target".
The target is a function that scores the model output, and that score guides the method in selecting which channels are important.
In the FasterRCNN notebook there is a target function, used for AblationCAM, that checks how well the boxes predicted for the modified image match the original boxes in IoU and category.
EigenCAM doesn't need this, but ScoreCAM does.

So FasterRCNNBoxScoreTarget will need to be rewritten for YOLO (since the model outputs the boxes in a different format).
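As a rough illustration of what such a rewrite could look like, here is a minimal sketch of a box score target in the spirit of FasterRCNNBoxScoreTarget. The class YoloBoxScoreTarget and both helper functions are hypothetical and not part of pytorch-grad-cam. The sketch assumes the target receives one per-image prediction tensor of shape (N, 5 + num_classes) with rows laid out as [cx, cy, w, h, objectness, class scores...]; whether the hub model actually hands the CAM code a tensor in that layout depends on the YOLOv5 version and should be checked first.

```python
import torch


def xywh_to_xyxy(boxes):
    # Convert (center x, center y, width, height) rows to corner format.
    cx, cy, w, h = boxes.unbind(dim=1)
    return torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=1)


def box_iou(box, boxes):
    # IoU between one box (4,) and many boxes (N, 4), all in corner format.
    x1 = torch.maximum(box[0], boxes[:, 0])
    y1 = torch.maximum(box[1], boxes[:, 1])
    x2 = torch.minimum(box[2], boxes[:, 2])
    y2 = torch.minimum(box[3], boxes[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)


class YoloBoxScoreTarget:
    """Hypothetical target: rewards predictions that match reference boxes."""

    def __init__(self, labels, bounding_boxes, iou_threshold=0.5):
        self.labels = labels
        self.bounding_boxes = torch.as_tensor(bounding_boxes, dtype=torch.float32)
        self.iou_threshold = iou_threshold

    def __call__(self, prediction):
        # Assumed layout: [cx, cy, w, h, objectness, class scores...] per row.
        boxes = xywh_to_xyxy(prediction[:, :4])
        class_scores = prediction[:, 5:] * prediction[:, 4:5]
        score = prediction.new_zeros(())
        for box, label in zip(self.bounding_boxes, self.labels):
            ious = box_iou(box.to(prediction.device), boxes)
            index = ious.argmax()
            same_class = class_scores[index].argmax() == label
            if ious[index] > self.iou_threshold and same_class:
                score = score + ious[index] + class_scores[index, label]
        # A zero-dimensional tensor, so .cpu().item() succeeds.
        return score


# Synthetic prediction with two classes; only row 0 is a real detection.
prediction = torch.zeros(3, 7)
prediction[0] = torch.tensor([50.0, 50.0, 20.0, 20.0, 0.9, 0.1, 0.8])
matching = YoloBoxScoreTarget([1], [(40, 40, 60, 60)])(prediction).cpu().item()
wrong_label = YoloBoxScoreTarget([0], [(40, 40, 60, 60)])(prediction).cpu().item()
```

With a target like this, the labels and boxes from the original detections would be passed in once, and the instance handed to the CAM call through its targets argument, for example cam(tensor, targets=[YoloBoxScoreTarget(labels, boxes)]).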

@jacobgil
Owner

jacobgil commented May 5, 2022

I can try doing that

@Harry-Rogers
Author

Oh ok thank you for clearing that up.

If that's possible that would be great.

@beneon

beneon commented May 22, 2022

First, I want to thank jacobgil for the brilliant work, especially the tutorials. They are very helpful, even more useful than the tutorials provided by captum.ai.

Anyway, I've been trying to get pytorch-grad-cam to output CAM images for specific labels and wrote a ScoreTarget class for YOLO. I tried to get AblationCAM working for YOLOv5, but after some tinkering I got stuck.

My understanding is that AblationCAM replaces the target layer I provide (like target_layers = [model.model.model.model[-2]]) with the ablation layer. But after this replacement, YOLOv5 reports this error:

AttributeError: 'AblationLayerYolo' object has no attribute 'f'

So my question is: do I need to implement this f attribute myself? From what I saw, an ablation layer should have .set_next_batch and __call__, and f seems to be something native to YOLOv5, but since the layer is replaced, I also need to handle it.

By the way, maybe ScoreCAM can be adapted for YOLOv5 more easily? From what I see, there is no layer replacement there.
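For context on the missing f attribute: YOLOv5's forward loop reads routing attributes such as f (which earlier layer to take input from) and i (the layer's index) from every module, so a replacement layer that lacks them fails as soon as the loop reaches it. The toy sketch below reproduces that situation without YOLOv5. ToyYoloStyleModel and PassThroughWrapper are hypothetical names, and copying the attributes from the replaced layer is only one possible workaround, not a confirmed fix for AblationCAM.

```python
import torch
from torch import nn


class ToyYoloStyleModel(nn.Module):
    # Toy stand-in for a forward loop that reads routing attributes per layer.
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
        for index, layer in enumerate(self.model):
            layer.i = index  # position of the layer in the model
            layer.f = -1     # -1 means "take input from the previous layer"

    def forward(self, x):
        outputs = []
        for layer in self.model:
            if layer.f != -1:
                x = outputs[layer.f]
            x = layer(x)
            outputs.append(x)
        return x


class PassThroughWrapper(nn.Module):
    # Hypothetical replacement layer that can inherit routing attributes.
    def __init__(self, wrapped, copy_routing=True):
        super().__init__()
        self.wrapped = wrapped
        if copy_routing:
            for name in ("f", "i"):
                if hasattr(wrapped, name):
                    setattr(self, name, getattr(wrapped, name))

    def forward(self, x):
        return self.wrapped(x)


net = ToyYoloStyleModel()
x = torch.rand(1, 4)
original_layer = net.model[1]

# Replacing a layer without copying f makes the forward loop fail.
net.model[1] = PassThroughWrapper(original_layer, copy_routing=False)
try:
    net(x)
    failed_without_routing = False
except AttributeError as err:
    failed_without_routing = True
    print(err)

# Copying the routing attributes lets the forward pass run again.
net.model[1] = PassThroughWrapper(original_layer, copy_routing=True)
out = net(x)
```

In a real model the same idea would mean copying the attributes from the original target layer onto the ablation layer before the swap.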

@noreenanwar

I can try doing that

Were you able to implement that?

@bryanbocao

Similar error here!
