Incorrect result scaling #753
Comments
stderr of the processing.
The SAM model internally scales the input image to fit inside a 1024x1024 resolution and uses padding to fill out the missing space, which would be 'to the right' of your image in this case (to fill in the narrow side of the image). The mask decoder is supposed to remove this padding, which requires knowing the size of the original image (through the

In this case, it looks like the cropping has been flipped: the mask looks cropped at the bottom (judging by the misalignment of the mask with the search bar part of the image) instead of removing the padding on the right (which is why it looks like there's a gap on the right).

As for fixing it, I'm not very familiar with the onnx side of things, but maybe the
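To make the scaling and padding described above concrete, here is a minimal sketch of the arithmetic. It is only an illustration, not the library's own code: `padded_layout` is a made-up helper name, the 1024 side length comes from the explanation above, and the 720x1280 (width x height) image size is an assumption based on the numbers used elsewhere in this thread.

```python
def padded_layout(orig_h, orig_w, model_side=1024):
    """Return (new_h, new_w, pad_bottom, pad_right) for a longest-side resize.

    Hypothetical helper: scales the image so its longer side equals
    model_side, then reports how much padding fills the rest of the square.
    """
    scale = model_side / max(orig_h, orig_w)
    new_h = int(orig_h * scale + 0.5)  # round to nearest pixel
    new_w = int(orig_w * scale + 0.5)
    return new_h, new_w, model_side - new_h, model_side - new_w


# Portrait image, 1280 tall and 720 wide: all padding ends up on the right.
print(padded_layout(1280, 720))  # (1024, 576, 0, 448)

# Same numbers with height and width swapped: the padding (and therefore
# the crop) moves to the bottom instead.
print(padded_layout(720, 1280))  # (576, 1024, 448, 0)
```

If the height and width are passed in the wrong order, the crop is taken from the bottom rather than the right, which would match the kind of misalignment described above.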
@heyoeyo After switching orig_im_size, it seems that SAM truncates the longer side by the same ratio. I've also tried passing [h, h] or [w, w], which doesn't work either.
Weird! It definitely seems like there's something wrong with the cropping and/or scaling of the mask result to remove the input padding. Swapping the width & height at least seems to fix the removal of the right-side padding (judging from the fact that the mask is horizontally aligned correctly), but it's clearly messing up the scaling still. Though looking at the onnx version of the model, the post-processing code looks ok to me... As a sanity check, it might be worth manually handling the scaling/padding removal (using the
Assuming the masks come out as an np.array, I think something like this should work:

import cv2
import numpy as np

# Show low-res mask result after upscaling
result_uint8 = np.uint8((low_res_logits.squeeze() > 0) * 255)
scaled_uint8 = cv2.resize(result_uint8, dsize=(1024,1024))
cv2.imshow("Scaled low-res result", scaled_uint8)
cv2.waitKey(250)
# Show result after removing padding
cropped_uint8 = scaled_uint8[0:1024, 0:576]
cv2.imshow("Cropped result", cropped_uint8)
cv2.waitKey(250)
# Show final mask scaled back to original size
final_uint8 = cv2.resize(cropped_uint8, dsize=(720,1280))
cv2.imshow("Final result", final_uint8)
# Show windows until a keypress occurs, then close them all
cv2.waitKey(0)
cv2.destroyAllWindows()

This should pop up a bunch of windows to show the intermediate results. The mask will look worse, since the thresholding (>0 check) is happening before scaling, but it should at least give a sense of whether the mask is being cropped/scaled properly, or if something is wrong with the sizings.
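The same upscale, crop and resize steps can also be checked without opening any windows. Below is a rough sketch, not the model's actual post-processing: `resize_nearest` and `remove_padding_and_rescale` are made-up helper names, nearest-neighbour indexing stands in for cv2.resize so that only numpy is needed, and the 1024 model side, the 256x256 low-res mask and the 720x1280 (width x height) image are assumptions.

```python
import numpy as np


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2D array (stand-in for cv2.resize)."""
    rows = (np.arange(out_h) * img.shape[0] / out_h).astype(int)
    cols = (np.arange(out_w) * img.shape[1] / out_w).astype(int)
    return img[rows][:, cols]


def remove_padding_and_rescale(low_res_logits, orig_h, orig_w, model_side=1024):
    """Threshold, upscale to the padded square, crop the padding, resize back."""
    mask = np.squeeze(low_res_logits) > 0
    scaled = resize_nearest(mask, model_side, model_side)
    # Size of the image content inside the padded square (longest side = model_side)
    scale = model_side / max(orig_h, orig_w)
    new_h = int(orig_h * scale + 0.5)
    new_w = int(orig_w * scale + 0.5)
    cropped = scaled[:new_h, :new_w]
    return resize_nearest(cropped, orig_h, orig_w)


# Fake logits: positive over the image area, negative over the right-side padding.
fake_logits = -np.ones((1, 1, 256, 256), dtype=np.float32)
fake_logits[..., :, :144] = 1.0  # 144 = 576 / 4 columns of real image content

final_mask = remove_padding_and_rescale(fake_logits, orig_h=1280, orig_w=720)
print(final_mask.shape, final_mask.all())
```

With the sizes in the right order, the fake mask should cover the whole output. Swapping orig_h and orig_w in the call is a quick way to see what a flipped crop looks like in the numbers.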
Currently, my program works perfectly on the demo pictures (e.g. images/truck.jpg). But when I switch to my own PNGs, the result seems to be scaled incorrectly.
For instance, the displayed size of the image below is 432x770, while the resulting mask seems to be only 245x770. The image being processed