
Call of model.render() modifies the predictions #11810

Closed · 1 task done
TimotheeWrightFicha opened this issue Jul 4, 2023 · 9 comments

Labels: question (Further information is requested), Stale

Comments

@TimotheeWrightFicha

TimotheeWrightFicha commented Jul 4, 2023

Search before asking

Question

Hello @glenn-jocher,

I have a small question regarding the model.render() function.

Let's first define this class:

import torch

class BBoxDetector:
    def __init__(self, ...):  # init arguments elided; they set yolo_path, onnx_path and self.img_width
        self.model = torch.hub.load(yolo_path, 'custom', path=onnx_path, source='local')

    def get_model_inference(self, image):
        # Returns a YOLOv5 Detections object for the given image
        return self.model(image, size=self.img_width)

And call it:

model_inference = self.bbox_detector.get_model_inference(processed_image)
image = model_inference.render()[0]
predictions1 = model_inference.pandas().xywh[0]

model_inference = self.bbox_detector.get_model_inference(processed_image)
predictions = model_inference.pandas().xywh[0]

print("predictions1 : ", predictions1.values)
print("predictions : ", predictions.values)

Output:

predictions1 :  [[325.44366455078125 550.25048828125 470.6707763671875 179.49908447265625 0.9610268473625183 1 'class1']
 [418.23046875 38.195106506347656 154.7244873046875 75.89674377441406 0.7089141011238098 0 'class2']]
 
predictions :  [[323.0789794921875 547.15625 464.1681823730469 183.42236328125 0.9449762105941772 1 'class1']]

As you can see, the prediction values are slightly different, but more importantly, one call returns two objects while the other returns only one.

Can you explain why adding image = model_inference.render()[0] changes the predictions?

It seems that most of the time the outputs are the same, but this happens consistently for one particular image.

Thank you !

Additional

I sometimes want to call model_inference.render()[0] to have the bounding boxes drawn on the image for debugging purposes.


@TimotheeWrightFicha added the question (Further information is requested) label on Jul 4, 2023
@TimotheeWrightFicha
Author

Actually, it seems that if I don't call model_inference.render()[0], the predictions are not correct.

But in this example you don't seem to call it:
#12 (comment)

So I'm a bit confused.

@glenn-jocher
Member

@TimotheeWrightFicha the model_inference.render() function does not modify the predictions of the model. Instead, it visualizes the bounding boxes on the image for debugging purposes. The discrepancy you are observing in the predictions is likely due to a different image being passed as input to get_model_inference() in the two calls.

In the example you provided, it appears that model_inference.render() is not called. Therefore, the predictions are not visualized on the image. This does not mean that the predictions themselves are incorrect. The pandas().xywh function is used to extract the bounding box coordinates, and it should give you the accurate predictions regardless of whether the image is rendered or not.
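For illustration, a minimal sketch of reading the raw predictions without calling render() (the hub model and image path are just examples; the column names follow the DataFrame returned by pandas().xywh):

import torch

# Minimal sketch: load a stock YOLOv5 model from the hub (downloads on first use).
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

results = model('sample.jpg')   # 'sample.jpg' is a hypothetical example image
df = results.pandas().xywh[0]   # columns: xcenter, ycenter, width, height, confidence, class, name
print(df)                       # raw predictions, available without rendering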

If you are experiencing inconsistent predictions for a specific image, it might be worth investigating any potential variations in the input image (e.g., resizing, normalization, cropping, etc.) between the two calls. Additionally, ensure that the get_model_inference() function is consistently returning the same model configuration and weights.

Please let me know if you have any further questions or concerns. We are here to help!

@TimotheeWrightFicha
Author

Thank you for the answer.

In the following code, I'm testing whether the two images differ after each operation:

import cv2
import numpy as np

def images_match(img1, img2):
    # True if the two images are pixel-identical
    return not np.any(cv2.subtract(img1, img2))

processed_image1 = processed_image.copy()
processed_image2 = processed_image.copy()

print("0. Pictures are the same" if images_match(processed_image1, processed_image2)
      else "0. Pictures are different")

model_inference = self.bbox_detector.get_model_inference(processed_image1)

print("1. Pictures are the same" if images_match(processed_image1, processed_image2)
      else "1. Pictures are different")

image = model_inference.render()[0]

print("2. Pictures are the same" if images_match(processed_image1, processed_image2)
      else "2. Pictures are different")

predictions1 = model_inference.pandas().xywh[0]

print("3. Pictures are the same" if images_match(processed_image1, processed_image2)
      else "3. Pictures are different")

OUTPUT:

0. Pictures are the same
1. Pictures are the same
2. Pictures are different
3. Pictures are different

It is clear that model_inference.render()[0] modifies processed_image1 in place, which is a problem.

I've tried to look into the YOLOv5 code a bit, but I don't really understand what's going on.
I can now fix my issue by simply passing a copy of the image, but it would be nice to raise awareness about this, if you consider it to be an issue.
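For reference, a minimal sketch of the copy workaround, using the same bbox_detector setup as above:

# Workaround sketch: hand the detector a copy so render() draws on the
# copy's pixels and the original frame stays untouched.
safe_input = processed_image.copy()
model_inference = self.bbox_detector.get_model_inference(safe_input)

debug_image = model_inference.render()[0]        # boxes drawn on the copy
predictions = model_inference.pandas().xywh[0]   # predictions as usual
# processed_image is left unmodified.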

@glenn-jocher
Member

@TimotheeWrightFicha thank you for bringing this to our attention. We appreciate your effort in providing the code and the corresponding output to help us understand the issue.

Based on your code and the observed output, it appears that model_inference.render()[0] is indeed modifying the processed_image1 in-place, resulting in a difference between processed_image1 and processed_image2 after the call to render().
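To illustrate the kind of aliasing that could produce this, here is a simplified stand-in (hypothetical names; not the actual YOLOv5 source):

import numpy as np

class FakeDetections:
    # Simplified stand-in for a results object that keeps references to the
    # caller's arrays instead of copies (hypothetical, for illustration only).
    def __init__(self, ims):
        self.ims = ims

    def render(self):
        for im in self.ims:
            im[:10, :10] = 255  # "drawing" mutates the caller's array in place
        return self.ims

frame = np.zeros((64, 64, 3), dtype=np.uint8)
FakeDetections([frame]).render()
print(frame[0, 0])  # [255 255 255]: the original frame was modified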

Allow me to investigate this behavior further to provide a more accurate explanation. I will review the relevant code in YOLOv5 and consult with the team to understand if this is an intended behavior or a potential issue.

I will get back to you as soon as possible with more information and a proposed solution. Thank you for your patience, and we apologize for any inconvenience this may have caused.

Please let me know if you have any additional details or questions related to this issue. We appreciate your contribution to the project.

@TimotheeWrightFicha
Author

Your fast and precise support is always appreciated, @glenn-jocher! If you need more context to debug this, I'm available :)

@glenn-jocher
Member

Hello @TimotheeWrightFicha,

Thank you for bringing this issue to our attention and providing the code and output to help us understand the problem. We apologize for any inconvenience this may have caused.

Based on the code you provided, it seems that model_inference.render()[0] is modifying the processed_image1 in-place, resulting in a difference between processed_image1 and processed_image2 after the call to render(). We are currently investigating this behavior to determine if it is an intended feature or a potential issue.

We appreciate your willingness to provide further context or assistance in debugging this issue. Your contribution is valuable in helping us improve the YOLOv5 project.

We will thoroughly investigate this behavior and provide a proper solution or clarification as soon as possible. We apologize for any delays and appreciate your patience.

Please don't hesitate to reach out if you have any further questions or concerns. We are here to help!

Thank you for your continued support.

-Glenn Jocher

@TimotheeWrightFicha
Author

@glenn-jocher I'd like to add a thought.

If I get the predictions with
predictions = model_inference.pandas().xywh[0]
I obtain [320, 487, 587, 300, x, x, 'x'].

This is correct for my 720x1280 image, but the predictions will not match the 640x640 image that we get from model_inference.render()[0].

Is that an issue or is it expected?

@glenn-jocher
Member

@TimotheeWrightFicha the predictions obtained using model_inference.pandas().xywh[0] are based on the original image dimensions (720x1280). On the other hand, the image returned by model_inference.render()[0] is resized to 640x640, which means the predictions will be in the context of the resized image.

This behavior is expected because model_inference.render() resizes the image for visualization purposes, while the raw predictions are in relation to the original image dimensions.

To obtain predictions that match the 640x640 resized image, you can resize the predictions to match the dimensions of the resized image using appropriate scaling factors.
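As a rough illustration, the rescaling could look like this (a sketch that assumes a plain resize with no letterbox padding; the helper name is hypothetical):

import pandas as pd

def scale_xywh(df: pd.DataFrame, orig_wh, rendered_wh) -> pd.DataFrame:
    # Map xywh predictions from the original frame to the rendered frame.
    # Sketch only: assumes a plain resize, so x/width scale with the width
    # ratio and y/height with the height ratio.
    sx = rendered_wh[0] / orig_wh[0]
    sy = rendered_wh[1] / orig_wh[1]
    out = df.copy()
    out[['xcenter', 'width']] *= sx
    out[['ycenter', 'height']] *= sy
    return out

# e.g. scale_xywh(predictions, orig_wh=(1280, 720), rendered_wh=(640, 640))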

Let me know if you have any further questions or concerns. We're here to help!

-Glenn Jocher

@github-actions
Contributor

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐
