
Benefit of providing images larger than the training size? #12912

Closed
agentmorris opened this issue Apr 12, 2024 · 5 comments
Labels
question Further information is requested

Comments

@agentmorris

Question

I am training a YOLOv5x6 model using --imgsz 1280. My original images are larger than 1280 pixels on the long side, and I'm trying to assess whether there's any benefit to providing images that are larger than 1280px during training.

In particular, if the "scale" hyperparameter is non-zero, it's not clear to me whether positive scaling can create a scenario where it's useful to have "extra pixels". E.g. if the "scale" hyperparameter is set to 0.9 (as it is in hyp.scratch-high.yaml), does that mean it's possible that the image will be scaled up by 90%, then either cropped to 1280px or shown to the model as a large image with 2432px on the long side? If that's the case, there would be additional benefit to having as many as 1280*1.9=2432 pixels on the long side, but I think I'm misunderstanding how scaling works, since basically every other thread recommends resizing to 1280px.

More generally, is there any scenario where accuracy can be higher if training images are larger than the value of --imgsz?

Thanks!

-Dan


@agentmorris agentmorris added the question Further information is requested label Apr 12, 2024
@glenn-jocher
Member

@agentmorris hi Dan! 😊

Thanks for your detailed question! Let's dive into the crux of your query.

When training YOLOv5 models, the --imgsz parameter sets the image size the model sees during training. If your original images are larger than this size, they are resized down to that dimension as the first step of the data loading pipeline. The "scale" hyperparameter then adds variation by randomly scaling the already-resized image during augmentation. So if scale is set to 0.9, it doesn't mean the model gets access to pixels beyond --imgsz; the random scaling operates on the image after it has been brought down to --imgsz, introducing variability and robustness by simulating different resolutions.
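To make the order of operations concrete, here's a minimal sketch of resize-first, scale-second. This is not the actual YOLOv5 dataloader code; the function name and the simplified scaling step are illustrative assumptions:

import random

import cv2  # OpenCV, which the YOLOv5 dataloader also uses

def load_train_image(path, imgsz=1280, scale=0.9):
    # Sketch only: the real pipeline lives in YOLOv5's dataloader and
    # augmentation code; this just illustrates the order of operations.
    img = cv2.imread(path)
    h, w = img.shape[:2]

    # Step 1: resize so the long side equals imgsz. Any pixels beyond
    # this resolution are discarded here, regardless of source size.
    r = imgsz / max(h, w)
    img = cv2.resize(img, (int(round(w * r)), int(round(h * r))))

    # Step 2: random scale augmentation on the ALREADY-resized image.
    # With scale=0.9 the factor is drawn from [0.1, 1.9]; scaling up
    # interpolates from imgsz-resolution pixels and cannot recover
    # detail from the original high-resolution file.
    s = random.uniform(1 - scale, 1 + scale)
    h, w = img.shape[:2]
    img = cv2.resize(img, (int(round(w * s)), int(round(h * s))))
    return img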

In short, while having larger input images can provide more detail, during the training phase, images will be resized to the dimension specified by --imgsz. There's typically little to no benefit in supplying images larger than your set training dimensions from an accuracy perspective, due to this resizing. However, providing higher resolution images might be beneficial in scenarios where you're experimenting with different input sizes or seeking the best quality before the resizing operation takes place, helping the model learn finer details during training.

Always happy to help with your YOLOv5 queries! 🚀

For more in-depth explanations, feel free to consult our documentation: https://docs.ultralytics.com/yolov5/

@agentmorris
Author

Thanks... I almost followed that. It's really helpful to know that scaling is just simulating different resolutions, rather than zooming (i.e., scaling then cropping, which is how I had (incorrectly) interpreted it). That suggests that there should be zero benefit in starting with larger images. But I'm not sure what this part means:

providing higher resolution images might be beneficial ...[where you're]... seeking the best quality before the resizing operation takes place, helping the model learn finer details during training.

The rest of your answer suggests that image sizes larger than the value of --imgsz (1280 in this case) won't help the model learn finer details during training. Can you clarify the scenario where larger image sizes can make a difference? If resizing is the first thing that happens, shouldn't larger images make no difference at all?

I understand that variability in the precise resizing implementation might matter a little, so I won't literally get exactly the same results if I resize in advance vs. providing higher-resolution images and letting the training code resize. In that sense even providing images of size 1281 vs. 1280 could lead to very slightly different results. But I'm trying to assess whether there's any more deterministic reason that larger images could give the model access to finer details during training. If resizing happens first, that shouldn't be the case, but your answer opened the door just enough that I'd like to confirm.

Thanks!

@glenn-jocher
Member

@agentmorris, glad you reached out for clarification! Let's make it crystal clear. 🌟

When I mentioned the potential benefits of higher resolution images, I was referring to a nuanced aspect of the pre-processing step. Even though resizing happens first, the quality and detail of the original image can influence the final resized image's quality. This is because resizing algorithms (like bilinear, bicubic, etc.) interpolate pixel values when scaling down, and starting with a high-resolution image can lead to a slightly better-quality resized image due to more available detail for the algorithm to work with.
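If you want to see how much the resampling filter alone changes the final training input, here's a small hypothetical check (the image path is a placeholder) comparing a cheap and a high-quality downsample of the same source:

from PIL import Image
import numpy as np

# Hypothetical comparison: downsample one high-resolution source image
# with two different filters and measure how far apart the results are.
src = Image.open('path/to/high_res_image.jpg')  # placeholder path
w, h = src.size
r = 1280 / max(w, h)  # long side -> 1280, preserving aspect ratio
size = (int(w * r), int(h * r))

cheap = np.asarray(src.resize(size, Image.Resampling.NEAREST), dtype=np.float32)
good = np.asarray(src.resize(size, Image.Resampling.LANCZOS), dtype=np.float32)

# A nonzero mean difference shows it's the interpolation method, not
# extra source pixels beyond the target size, that alters the final input.
print('mean abs pixel difference:', np.abs(cheap - good).mean())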

However, you're correct in your understanding that, in practical terms, the benefit is marginal. Since the resizing step is the first operation that occurs, the direct impact on learning finer details is not as significant as one might hope, especially for images drastically larger than your training size (--imgsz 1280). The differences are subtle, and in most scenarios the computational efficiency gained by resizing your images to the training dimension before training will outweigh the potential marginal gains in image quality.

So, to sum up: while higher-resolution originals could theoretically yield a slightly better-quality resized input, the practical impact on model performance, especially for significantly larger images, is likely to be minimal. The key takeaway is indeed that resizing to your target training size is generally a sound and efficient practice.

I hope this clears up any confusion! Always here to help. 😊

@agentmorris
Author

Thanks, I'm clear now. I will plan to resize in advance to something still slightly larger than the training size, but I won't short-change the resizing; it's easy enough to spend essentially arbitrary compute resources before training if it saves time during training. For example, Image.Resampling.LANCZOS seems to be the most expensive option that PIL offers, so I will resize from the original size to, e.g., 1600px on the long side using Image.Resampling.LANCZOS prior to training.

My experience has been that providing 1600px images during training is much faster than providing, e.g., 3000px images, and while it's still slower than providing 1280px images, I'll sleep better knowing I'm 99.9999999% sure that I'm paying zero accuracy price.

If any of that sounds bananas, let me know, otherwise I think I'm good. Thanks!

@glenn-jocher
Member

@agentmorris, your approach sounds pretty solid! 🚀 Resizing to a slightly larger size than your training dimension, using a high-quality resampling method like Image.Resampling.LANCZOS, is a thoughtful strategy. It balances image quality against practical training speed. This method preserves more detail than simpler resampling filters, potentially contributing positively to model accuracy without the significant computational cost of training directly on very high-resolution images.

Here's a quick example of how you could do the resizing in Python using PIL, just in case:

from PIL import Image

def resize_image(input_path, output_path, target_size=1600):
    """Resize an image so its long side equals target_size, preserving the
    aspect ratio, using high-quality Lanczos resampling."""
    with Image.open(input_path) as img:
        scale = target_size / max(img.size)  # ratio based on the long side
        # Note: images smaller than target_size will be upscaled here.
        new_size = (int(img.width * scale), int(img.height * scale))
        resized_img = img.resize(new_size, Image.Resampling.LANCZOS)
        resized_img.save(output_path)

resize_image('path/to/your/original/image.jpg', 'path/to/your/resized/image.jpg')

It's great to see such dedication to optimizing your workflow! If you have any more questions or need further assistance, don't hesitate to ask. Happy training! 😊
