Removing duplicates will help make the model more robust and prevent data from leaking from the train set into the test set (which would otherwise give falsely optimistic metrics).
Created a small notebook for this (07_remove_duplicates.ipynb) and it seems to work very well: it found ~500 duplicates out of ~24,500 images in a few minutes, and after a series of quick random plots, very few of the flagged samples turned out not to be duplicates.
Could integrate this workflow to run over all the images every so often (or whenever new data is added to the dataset).
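The notebook itself isn't shown here, but the core idea can be sketched as hashing each file's bytes and grouping identical digests. This is an assumption about the approach (the function name and signature below are hypothetical); it only catches exact byte-level duplicates — resized or re-encoded copies would need a perceptual hash such as the `imagehash` library:

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicate_images(image_dir, pattern="*.jpg"):
    """Group image files by content hash; return groups of duplicate paths.

    Note: detects exact byte-level duplicates only. Near-duplicates
    (resized/re-encoded images) require a perceptual hash instead.
    """
    groups = defaultdict(list)
    for path in sorted(Path(image_dir).rglob(pattern)):
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        groups[digest].append(path)
    # Keep only hashes shared by more than one file
    return [paths for paths in groups.values() if len(paths) > 1]
```

For the periodic re-run mentioned above, a check like this could be wired into the data-ingestion step so every new batch is deduplicated against the existing dataset before the train/test split is made.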