
Random Crop yields incorrect value for bounding box #903

Closed
bw4sz opened this issue May 21, 2021 · 14 comments

Comments


bw4sz commented May 21, 2021

🐛 Bug

The bbox_random_crop function does not produce a reasonable result. Consider the following snippet of code.

To Reproduce

from albumentations.augmentations import functional as F

bbox = (0.129, 0.7846, 0.1626, 0.818)
cropped_bbox = F.bbox_random_crop(bbox=bbox, crop_height=100, crop_width=100, h_start=0.7846, w_start=0.12933, rows=1500, cols=1500)
print(cropped_bbox)
# (0.125, 0.788999, 0.629, 1.29)
# Notice y2 is outside of the crop range.

# But the following assert passes
assert bbox[3] < (100 / 1500) + 0.7846

# This one fails
assert all(0 <= y <= 1 for y in cropped_bbox)

Expected behavior

An augmented bbox that lies fully within the image crop. The start of the crop plus the crop height is larger than the y2 of the bounding box, so the box should fit inside the crop, yet the 1.29 coordinate in the cropped box suggests it is outside the crop area.

Environment

  • Albumentations version (e.g., 0.1.8): 0.5.2
  • Python version (e.g., 3.7): 3.7
  • OS (e.g., Linux): OSX
  • How you installed albumentations (conda, pip, source): pip
  • Any other relevant information:

Additional context

I am writing a custom augmentation to zoom in on a given bounding box: a safe crop, but one that does not need to preserve all boxes. Is there syntax I'm misunderstanding? It doesn't feel like that could be the case. A dtype issue?


bw4sz commented May 22, 2021

I wanted to come back and give a simpler example to zero in on the syntax that I must not be understanding.

This passes

If you have a box at 100/1000 and make a 200 pixel crop starting at 0, the final bbox coordinate should be 0.5

bbox = (0, 0, 0.1, 0.1)
crop_height = 200
rows = 1000
h_start = 0
w_start = 0
cropped_bbox = F.bbox_random_crop(bbox=bbox, crop_height=crop_height, crop_width=crop_height, h_start=h_start, w_start=w_start, rows=rows, cols=rows)
print(cropped_bbox)

assert cropped_bbox[3] == bbox[3] / (crop_height / rows)
assert all(0 <= y <= 1 for y in cropped_bbox)
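The expected value in the first assert can be checked by hand: the box bottom sits at 0.1 * 1000 = 100 px in the original image, and a 200 px crop starting at row 0 renormalizes that to 100 / 200 = 0.5. A minimal sketch of that arithmetic in plain Python (no albumentations required):

```python
# Expected new coordinate for a bbox edge after cropping from row 0:
# the absolute pixel position divided by the crop size.
bbox_y2 = 0.1        # normalized bottom edge in the original image
rows = 1000          # original image height in pixels
crop_height = 200    # crop size in pixels

abs_y2 = bbox_y2 * rows              # 100 px in the original image
cropped_y2 = abs_y2 / crop_height    # renormalize to the crop
print(cropped_y2)                    # 0.5
```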

If you shift everything over 100 pixels, the logic holds.

bbox = (0.1, 0.1, 0.2, 0.2)
crop_height = 200
rows = 1000
h_start = 0
w_start = 0
cropped_bbox = F.bbox_random_crop(bbox=bbox, crop_height=crop_height, crop_width=crop_height, h_start=h_start, w_start=w_start, rows=rows, cols=rows)
print(cropped_bbox)

assert cropped_bbox[3] == bbox[3] / (crop_height / rows)
assert all(0 <= y <= 1 for y in cropped_bbox)

but if you begin the crop at any position other than 0, you get a strange result:

bbox = (0.1, 0.1, 0.2, 0.2)
crop_height = 200
rows = 1000
h_start = 0.1
w_start = 0.1
cropped_bbox = F.bbox_random_crop(bbox=bbox, crop_height=crop_height, crop_width=crop_height, h_start=h_start, w_start=w_start, rows=rows, cols=rows)
print(cropped_bbox)
# (0.1, 0.1, 0.6, 0.6)

The size of the box is correct, but the final bbox start coordinate should be (0, 0), since h_start == bbox[0] and w_start == bbox[1]?

Digging into the function, you can see that get_random_crop_coords produces an unintuitive result:

F.get_random_crop_coords(height=1000, width=1000, crop_height=200, crop_width=200, h_start=0.1, w_start=0.1)
(80, 80, 280, 280)

I would expect a 200 pixel crop starting at 0.1 with an original image size of 1000 to have crop coordinates of 100,300, not 80,280. Where is the 20 offset coming from?

Going one layer deeper to

def get_random_crop_coords(height: int, width: int, crop_height: int, crop_width: int, h_start: float, w_start: float):
    y1 = int((height - crop_height) * h_start)

Why is this not

    y1 = int(height * h_start)

The size of the crop does not affect the location of the min coordinate, only the location of the max coordinate. Any crop starting at 0.1 of a 1000-row image should have a y1 coordinate of 100, regardless of crop size?
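A plain-Python sketch of the two candidate formulas makes the 20 px offset concrete (y1_actual mirrors the line quoted from get_random_crop_coords above; y1_expected is the formula the comment suggests):

```python
height, crop_height, h_start = 1000, 200, 0.1

# Formula used by albumentations: h_start is treated as a fraction of the
# *valid start range* (height - crop_height), so every crop stays in bounds.
y1_actual = int((height - crop_height) * h_start)

# Formula the comment expected: h_start as a fraction of the whole image.
y1_expected = int(height * h_start)

print(y1_actual, y1_actual + crop_height)        # 80 280
print(y1_expected, y1_expected + crop_height)    # 100 300
```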


Dipet commented May 23, 2021

Because if we used y1 = int(height * h_start) we might get a crop smaller than the requested crop size, for example when h_start=0.9 and crop_height=200 with rows=1000.
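A quick plain-Python check of those numbers, assuming the same int truncation as get_random_crop_coords:

```python
height, crop_height, h_start = 1000, 200, 0.9

# Naive formula: the crop would extend past the bottom of the image,
# so it would have to be clipped to fewer than 200 rows.
y1_naive = int(height * h_start)                   # 900
out_of_bounds = y1_naive + crop_height > height    # 1100 > 1000 -> True

# Implemented formula: scaling by the valid start range guarantees
# a full-size crop for any h_start in [0, 1].
y1_impl = int((height - crop_height) * h_start)    # 720
fits = y1_impl + crop_height <= height             # 920 <= 1000 -> True
```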


bw4sz commented May 23, 2021

Sure, that's definitely true. But maybe just raise a ValueError there instead? It creates a disconnect between the function's goal (return the coordinates of a crop of a given size and starting position) and its output. For example, imagine a user needs a crop at a specific location; this function makes it difficult to get the correct position. I had to debug for a few hours to get to this. As a compromise, can we at least comment the code in bbox_random_crop to say explicitly that the output coordinates are relative to height - crop_height? I don't think anyone could have anticipated this, and I'm guessing there are other pieces of logic in the codebase with a similar structure. I forked and changed that line, but it yielded strange boxes.
[image: seabirds_rgb_279]

I can open a PR on the wider goal of creating a ZoomSafe augmentation. I think it will be of use to a wide audience. It's somewhere between RandomResizeCropSafe and Random_Crop. The safe aspect is important because torchvision detection models do not accept blank annotations.


Dipet commented May 23, 2021

But maybe just raise a value error there instead?

Bad idea, because we have transforms that use the current logic, and this change would break their functionality.

As a compromise, can we at least comment the code in bbox_random_crop to say explicitly that the output coordinates are relative to height - crop_height?

I agree, it would be great to add an explanation of how this function works.

I can open a PR on the wider goal of creating a ZoomSafe augmentation. I think it will be of use to a wide audience. It's somewhere between RandomResizeCropSafe and Random_Crop. The safe aspect is important because torchvision detection models do not accept blank annotations.

It would be great. We will wait for your PR.


Dimfred commented May 26, 2021

Btw, padding an image also produces wrong bounding boxes. All bboxes are shifted to the left and to the top on the padded image.
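For reference, the expected bookkeeping when padding is that every bbox shifts right/down by exactly the left/top pad amounts. A minimal sketch with a hypothetical helper (pad_bbox is illustrative, not an albumentations function):

```python
def pad_bbox(bbox, pad_left, pad_top):
    """Shift an absolute-pixel (x_min, y_min, x_max, y_max) box into a padded image."""
    x_min, y_min, x_max, y_max = bbox
    return (x_min + pad_left, y_min + pad_top, x_max + pad_left, y_max + pad_top)

# Centered padding of a 590x590 image up to 600x600 adds 5 px on each side,
# so a box at (100, 100, 200, 200) should land at (105, 105, 205, 205).
print(pad_bbox((100, 100, 200, 200), pad_left=5, pad_top=5))
```

Any deviation from this pure translation (such as the few-pixel shift reported below) points at a coordinate-conversion bug rather than the padding itself.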


bw4sz commented May 26, 2021 via email


Dimfred commented May 26, 2021

Sure. The first image is from the padded case, and the second from an unpadded one. The green bounding box is the ground truth. Just for clarification, the red box is a prediction; ignore it. The difference between the two green boxes is what matters.

This is really bad for me: I have to redo 10 days of grid search for my thesis.

With padding
[image: padded]

Without padding
[image: unpadded]

There is a difference of about 3-4 pixels.


Dimfred commented May 26, 2021

So far as I can see in bbox_utils, my yolo coordinates first get denormalized to absolute coordinates and then renormalized to the albumentations format; in that process some int trimming happens, which seems to cause these issues. I could simply normalize without denormalizing, by using the yolo format as it is and converting in the float domain. It seems like an extra step anyway.
I also noticed that SafeRotate seems to be broken as well.
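A schematic illustration of the kind of round-trip error being described; the int truncation here is illustrative, not the exact bbox_utils code:

```python
cols = 999    # image width in pixels
x = 0.5       # normalized yolo x-center

# relative -> absolute with int truncation -> back to relative
x_abs = int(x * cols)    # int(499.5) == 499, the .5 px is discarded
x_back = x_abs / cols    # 0.49949..., no longer 0.5

# The coordinate has drifted by roughly half a pixel; applied per edge and
# per conversion, errors like this can accumulate to a visible shift.
print(x, x_back)
```

Staying in the float domain, as proposed, avoids the truncation entirely.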


bw4sz commented May 26, 2021

I can see similar behavior

[image: Rzepecki Islands_south_2016_Chinstrap_penguins_96]

transform = A.Compose([
    A.PadIfNeeded(min_height=600, min_width=600, border_mode=cv2.BORDER_CONSTANT, value=0),
    A.pytorch.ToTensorV2()
])

@Dipet I will work on a reproducible example tomorrow. I believe all of these issues are related to the normalization. I am using pascal_voc, not yolo. I will rerun with:

        transform = A.Compose([
            A.PadIfNeeded(min_height=600,min_width=600, border_mode=cv2.BORDER_CONSTANT, value=0),            
            A.pytorch.ToTensorV2()
        ], A.BboxParams(format='pascal_voc'))


Dimfred commented May 27, 2021

Are you also transforming your bounding boxes, or only the image? Your result looks like you are not transforming the bounding boxes, or you don't put the class label at the end. Your results are really broken; look at mine, where the bounding box is shifted by only a few pixels.


Dimfred commented May 27, 2021

So I fixed the errors on my side by changing the conversion in bbox_utils for the yolo case. Since yolo coordinates are already relative, a conversion to absolute format and back to relative will yield errors.

So instead of converting to absolute, I removed that step and now work directly with the coordinates I get from yolo:

x, y, w, h = yolo_bbox

w_half, h_half = w / 2, h / 2
x_min, x_max = x - w_half, x + w_half
y_min, y_max = y - h_half, y + h_half

and vice versa in the from_albumentation case.
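Putting the snippet above together with its inverse, the whole conversion can stay in the relative/float domain. A sketch with hypothetical helper names (not the actual bbox_utils API):

```python
def yolo_to_albumentations(yolo_bbox):
    """(x_center, y_center, w, h), all relative -> (x_min, y_min, x_max, y_max), relative."""
    x, y, w, h = yolo_bbox
    w_half, h_half = w / 2, h / 2
    return (x - w_half, y - h_half, x + w_half, y + h_half)

def albumentations_to_yolo(bbox):
    """Inverse conversion, also without touching absolute pixel coordinates."""
    x_min, y_min, x_max, y_max = bbox
    w, h = x_max - x_min, y_max - y_min
    return (x_min + w / 2, y_min + h / 2, w, h)

# Round-tripping recovers the original box up to ordinary float rounding,
# with no dependence on the image size and no int truncation anywhere.
box = (0.5, 0.5, 0.2, 0.1)
print(albumentations_to_yolo(yolo_to_albumentations(box)))
```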


Dipet commented Jul 7, 2021

Should be fixed by #924

Dipet closed this as completed Jul 7, 2021

veve90 commented Jun 2, 2022

So I fixed the errors on my side by changing the conversion in bbox_utils for the yolo case. Since yolo coordinates are already relative, a conversion to absolute format and back to relative will yield errors.

So instead of converting to absolute, I removed that step and now work directly with the coordinates I get from yolo:

x, y, w, h = yolo_bbox

w_half, h_half = w / 2, h / 2
x_min, x_max = x - w_half, x + w_half
y_min, y_max = y - h_half, y + h_half

and vice versa in the from_albumentation case.

Could you please tell us in what function you propose this change? Thank you!
