
Updated cache v0.2 with hashlib #3350

Merged: 2 commits, May 26, 2021


@glenn-jocher (Member) commented May 26, 2021

Possible fix for #3349

This PR increments the cache file version to 0.2 and uses a new hashlib-based hash that detects changes in both dataset contents and dataset location, triggering a recache whenever either changes.
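A minimal Python sketch of the hashing idea described above, assuming `paths` is a list of file or directory path strings (such as the image and label file lists). The cumulative size catches content changes that alter file sizes, and the joined path strings catch renames and moves. This mirrors the description in this PR and is not necessarily the exact code in datasets.py.

```python
import hashlib
import os
from typing import Sequence


def get_hash(paths: Sequence[str]) -> str:
    """Return one hex digest covering the total size and the locations of `paths`.

    Sketch only: assumes `paths` are strings (files or directories).
    """
    # Total bytes of every path that currently exists; missing paths are skipped
    size = sum(os.path.getsize(p) for p in paths if os.path.exists(p))
    h = hashlib.md5(str(size).encode())  # hash the cumulative size
    h.update("".join(paths).encode())  # mix in the path strings themselves
    return h.hexdigest()  # 32-character hex string
```

Because the path strings are part of the digest, renaming or moving the dataset changes the hash even when every file is byte-identical, which is the situation in the linked issue. Note that an edit leaving every file size unchanged is not detected, since contents are represented only by their total size. MD5 serves here as a fast fingerprint, not as a security measure.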

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Improved dataset hashing for better cache validation.

📊 Key Changes

  • Modified the get_hash function to compute hashes based on paths (files or directories) instead of just file sizes.
  • Hash now includes both the cumulative size and the actual paths of the dataset for a unique identifier.
  • Removed check for 'version' in cache; now solely relies on the new hashing method to validate the cache.
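The cache check that consumes this hash can be sketched as follows. The function name `load_or_rebuild`, the `build` callback, and the use of `pickle` for storage are hypothetical choices for illustration. Following the list above, the sketch decides validity by comparing only the stored hash, and it records the 0.2 version number alongside it.

```python
import pickle
from pathlib import Path
from typing import Callable, Tuple

CACHE_VERSION = 0.2  # version number taken from the PR description


def load_or_rebuild(cache_path: str, fingerprint: str,
                    build: Callable[[], dict]) -> Tuple[dict, bool]:
    """Return (cache, hit). Reuse the cache only if its stored hash matches `fingerprint`.

    Hypothetical helper: `build` recomputes the cache contents from scratch.
    """
    path = Path(cache_path)
    if path.is_file():
        with path.open("rb") as f:
            cache = pickle.load(f)
        if cache.get("hash") == fingerprint:
            return cache, True  # dataset unchanged and not moved: reuse
    cache = build()  # dataset changed, moved, or no cache yet
    cache["hash"] = fingerprint
    cache["version"] = CACHE_VERSION
    with path.open("wb") as f:
        pickle.dump(cache, f)
    return cache, False
```

A caller would pass the `get_hash` digest of the image and label file lists as `fingerprint`. If an older cache stored a plain integer size total, it can never equal a hex digest string, so such caches would be rebuilt automatically on first use.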

🎯 Purpose & Impact

  • Enhanced Accuracy: The new hashing method provides a more robust way to detect changes in the dataset, reducing the risk of using outdated cache entries.
  • Increased Reliability: Combining size and paths in the hash prevents false cache hits, such as a moved dataset whose total size is unchanged, so the cache is rebuilt whenever the data it describes has changed or moved.
  • User Experience: Users no longer train or validate against a stale cache that still points to the dataset's old location.

@glenn-jocher changed the title from "Update cache v0.2 to include parent hash" to "Updated cache v0.2 with hashlib" on May 26, 2021
@glenn-jocher (Member, Author) commented:

Verified dataset renaming/moving incurs hash change and recaching op:

[Screenshot: 2021-05-26 at 14:25:03]

@glenn-jocher merged commit c6b5bfc into master on May 26, 2021
@glenn-jocher deleted the glenn-jocher-patch-3 branch May 26, 2021 12:26
Lechtr pushed a commit to Lechtr/yolov5 that referenced this pull request Jul 20, 2021
* Update cache v0.2 to include parent hash

Possible fix for ultralytics#3349

* Update datasets.py

(cherry picked from commit c6b5bfc)
BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this pull request Aug 26, 2022
Development: successfully merging this pull request may close the issue "After changing path of dataset, validation still searching image on old location".