
Go through all_images on Google Storage and resize them to take up less space #69

Open
mrdbourke opened this issue Feb 8, 2023 · 1 comment

@mrdbourke (Owner)

JPEG images don't need to be stored at 100% quality.

The models only train on images at sizes like 224, 299, 384, etc.

So we could potentially resize all images to be at most 1000x1000?

Go through images and downsize if:

  • Image is over 1MB (if it's already under, skip?)
  • Resize image to something like 50% of original size (still above typical training sizes, but cheaper to store)

Benefits:

  • Less storage in general (but storage is cheap)
  • Less processing time per image in the loading pipeline (smaller images are faster to load, which could mean faster training)
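A minimal sketch of the downsizing pass described above, assuming Pillow and raw JPEG bytes (iterating over the actual all_images bucket on Google Storage is omitted); the ~1 MB skip threshold and 1000 px cap come from the list above, and the 90% re-encode quality is an assumption:

```python
from io import BytesIO

from PIL import Image

MAX_SIDE = 1000             # cap the longest side at 1000 px
SIZE_THRESHOLD = 1_000_000  # ~1 MB; files already under this are skipped


def downsize_jpeg(data: bytes, max_side: int = MAX_SIDE) -> bytes:
    """Re-encode JPEG bytes with the longest side capped at max_side."""
    img = Image.open(BytesIO(data))
    img.thumbnail((max_side, max_side))  # in-place, preserves aspect ratio
    out = BytesIO()
    img.save(out, format="JPEG", quality=90)  # <100% quality saves space
    return out.getvalue()


def process(data: bytes) -> bytes:
    """Downsize only if the file is over the size threshold."""
    if len(data) < SIZE_THRESHOLD:
        return data  # already small enough, skip
    return downsize_jpeg(data)
```

In practice this would be wrapped in a loop over `google-cloud-storage` blobs (download bytes, `process`, re-upload); that plumbing is left out here.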

mrdbourke commented Feb 14, 2023

Doesn't look like I need to do this...

Experiments show that resizing doesn't change load times much.

Without resizing:

Min load time: 6.175041198730469e-05
Max load time: 0.0051898956298828125
Mean load time: 7.927996739415816e-05
Median load time: 7.2479248046875e-05

With resizing (all images to (600, 600)):

Min load time: 6.628036499023438e-05
Max load time: 0.0004761219024658203
Mean load time: 7.431405741298758e-05
Median load time: 7.224082946777344e-05

Results:

Original images mean load time: 7.927996739415816e-05
Resized images mean load time: 7.431405741298758e-05
Resized images load time is 1.0668232923089287 times faster than original images
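A benchmark along these lines could be sketched as follows (assuming Pillow; note that `Image.open` on its own defers pixel decoding, so timing it mostly measures header parsing, which may explain the near-identical numbers above; calling `.load()` times the full decode):

```python
import statistics
import time
from io import BytesIO

from PIL import Image


def benchmark_load(jpeg_bytes: bytes, n: int = 100) -> dict:
    """Time n repeated image loads and summarise the results in seconds."""
    times = []
    for _ in range(n):
        start = time.perf_counter()
        img = Image.open(BytesIO(jpeg_bytes))
        img.load()  # force the actual pixel decode, not just header parsing
        times.append(time.perf_counter() - start)
    return {
        "min": min(times),
        "max": max(times),
        "mean": statistics.mean(times),
        "median": statistics.median(times),
    }


# Self-contained demo: benchmark a synthetic 600x600 JPEG in memory
buf = BytesIO()
Image.new("RGB", (600, 600), "gray").save(buf, format="JPEG")
stats = benchmark_load(buf.getvalue(), n=50)
```

Running the same function over original and resized bytes would reproduce the min/max/mean/median comparison above.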
