Benefit of providing images larger than the training size? #12912
@agentmorris hi Dan! 😊 Thanks for your detailed question! Let's dive into the crux of your query. When training YOLOv5 models, the `--imgsz` argument sets the resolution to which images are resized before they reach the network. In short, while larger input images can contain more detail, during the training phase every image will be resized to the dimension specified by `--imgsz`. The `scale` augmentation hyperparameter then randomly rescales that already-resized image to simulate objects appearing at different resolutions; it is not a zoom-and-crop operation. Always happy to help with your YOLOv5 queries! 🚀 For more in-depth explanations, feel free to consult our documentation: https://docs.ultralytics.com/yolov5/
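To make the resize concrete, here's a minimal sketch of how an image's long side is mapped to `--imgsz` while the aspect ratio is preserved. This is an illustration only, not YOLOv5's actual letterbox code, and the 3000×2000 example dimensions are hypothetical:

```python
def fit_to_imgsz(width, height, imgsz=1280):
    # Scale so the long side equals imgsz, preserving aspect ratio.
    scale = imgsz / max(width, height)
    return round(width * scale), round(height * scale)

# A 3000x2000 original is shrunk so its long side is 1280.
print(fit_to_imgsz(3000, 2000))  # (1280, 853)
```

Note that the same mapping applies no matter how large the original is, which is why extra pixels beyond the training size are discarded during the resize.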
Thanks... I almost followed that. It's really helpful to know that scaling is just simulating different resolutions, rather than zooming (i.e., scaling then cropping, which is how I had (incorrectly) interpreted it). That suggests that there should be zero benefit in starting with larger images. But I'm not sure what this part means:
The rest of your answer suggests that image sizes larger than the value of --imgsz (1280 in this case) won't help the model learn finer details during training. Can you clarify what the scenario is where larger image sizes can make a difference? If resizing is the first thing that happens, shouldn't it never make a difference? I understand that variability in the precise resizing implementation might matter a little, so I won't literally get exactly the same results if I resize in advance vs. providing higher-resolution images and letting the training code resize. In that sense even providing images of size 1281 vs. 1280 could lead to very slightly different results. But I'm trying to assess whether there's any more deterministic reason that larger images could give the model access to finer details during training. If resizing happens first, that shouldn't be the case, but your answer opened the door just enough that I'd like to confirm. Thanks!
@agentmorris, glad you reached out for clarification! Let's make it crystal clear. 🌟

When I mentioned the potential benefits of higher-resolution images, I was referring to a nuanced aspect of the pre-processing step. Even though resizing happens first, the quality and detail of the original image can influence the quality of the final resized image. This is because resizing algorithms (bilinear, bicubic, etc.) interpolate pixel values when scaling down, and starting from a high-resolution image gives the algorithm more detail to work with, which can yield a slightly better-quality resized image.

However, you're correct that in practical terms the benefit is marginal. Since the resizing step is the first operation that occurs, the direct impact on learning finer details is unlikely to be significant, especially for images drastically larger than your training size (1280 in this case).

So, to sum up: while higher-resolution originals could in theory provide slightly better input quality due to more detailed resizing, the practical impact on model performance, especially for significantly larger images, would likely be minimal. The key takeaway should indeed be that resizing to your target training size is generally a sound and efficient practice. I hope this clears up any confusion! Always here to help. 😊
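As a toy illustration of why the resize path can matter at all — purely a sketch, not PIL's actual resampling code — consider box-averaging a one-dimensional strip of pixels, where each resize step must quantize back to integer pixel values. Going through an intermediate size can compound the rounding:

```python
def box_downsample(pixels, factor):
    """Average each block of `factor` pixels, truncating to int
    (8-bit image pixels must be stored as integers)."""
    return [int(sum(pixels[i:i + factor]) / factor)
            for i in range(0, len(pixels), factor)]

row = [1, 0, 1, 2]  # a 4-pixel strip of one image row

direct   = box_downsample(row, 4)                     # one 4x resize
two_step = box_downsample(box_downsample(row, 2), 2)  # two 2x resizes

print(direct, two_step)  # [1] [0]
```

Real resampling filters like Lanczos are far better behaved than this truncating box filter, which is why the practical difference is tiny — but it is not exactly zero.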
Thanks, I'm clear now. I will plan to resize in advance to something still slightly larger than the training size, but I won't short-change the resizing; it's easy enough to spend essentially arbitrary compute resources before training if it saves time during training. For example, Image.Resampling.LANCZOS seems to be the most expensive option that PIL offers, so I will resize from the original size to, e.g., 1600px on a side using Image.Resampling.LANCZOS prior to training. My experience has been that providing 1600px images during training is much faster than providing, e.g., 3000px images, and while it's still slower than providing 1280px images, I'll sleep better knowing I'm 99.9999999% sure that I'm paying zero accuracy price. If any of that sounds bananas, let me know, otherwise I think I'm good. Thanks!
@agentmorris, your approach sounds pretty solid! 🚀 Resizing to a slightly larger size than your training dimension, using a high-quality resampling method like `Image.Resampling.LANCZOS`, is a sensible way to preserve as much detail as possible while keeping training fast. Here's a quick example of how you could do the resizing in Python using PIL, just in case:

```python
from PIL import Image

def resize_image(input_path, output_path, target_size=1600):
    with Image.open(input_path) as img:
        # Scale so the long side equals target_size, preserving aspect ratio
        scale = target_size / max(img.size)
        new_size = (int(img.width * scale), int(img.height * scale))
        resized_img = img.resize(new_size, Image.Resampling.LANCZOS)
        resized_img.save(output_path)

resize_image('path/to/your/original/image.jpg', 'path/to/your/resized/image.jpg')
```

It's great to see such dedication to optimizing your workflow! If you have any more questions or need further assistance, don't hesitate to ask. Happy training! 😊
Search before asking
Question
I am training a YOLOv5x6 model using --imgsz 1280. My original images are larger than 1280 pixels on the long side, and I'm trying to assess whether there's any benefit to providing images that are larger than 1280px during training.
In particular, if the "scale" hyperparameter is non-zero, it's not clear to me whether positive scaling can create a scenario where it's useful to have "extra pixels". E.g. if the "scale" hyperparameter is set to 0.9 (as it is in hyp.scratch-high.yaml), does that mean it's possible that the image will be scaled up by 90%, then either cropped to 1280px or shown to the model as a large image with 2432px on the long side? If that's the case, there would be additional benefit to having as many as 1280*1.9=2432 pixels on the long side, but I think I'm misunderstanding how scaling works, since basically every other thread recommends resizing to 1280px.
More generally, is there any scenario where accuracy can be higher if training images are larger than the value of --imgsz?
Thanks!
-Dan
Additional
No response