
After changing path of dataset, validation still searching image on old location #3349

Closed
GiorgioSgl opened this issue May 26, 2021 · 5 comments · Fixed by #3350
Labels
bug Something isn't working

@GiorgioSgl

🐛 Bug

The bug affects the validation split of the dataset. I'm working with Google's Open Images dataset. The problem is simple: I switched the OS I train on to Windows, so all the directory paths changed and I couldn't put the dataset in the same location. I therefore edited data.yaml to point to the new dataset directory. Training works fine, but validation does not.

In the first epoch, training on the train set runs without any problem, but when validation starts it searches for images in the old location, and it fails with an error saying it can't find the first image of the validation set.

To Reproduce

Just run train.py after moving the dataset to a new location and updating data.yaml accordingly.

Output

Traceback (most recent call last): 
  File "train.py", line 543, in <module> 
    train(hyp, opt, device, tb_writer) 
  File "train.py", line 354, in train 
    results, maps, times = test.test(data_dict, 
  File "C:\Users\gofor\anaconda3\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context 
    return func(*args, **kwargs) 
  File "C:\Users\gofor\Desktop\yolov5-master\test.py", line 102, in test 
    for batch_i, (img, targets, paths, shapes) in enumerate(tqdm(dataloader, desc=s)): 
  File "C:\Users\gofor\anaconda3\lib\site-packages\tqdm\std.py", line 1165, in __iter__
    for obj in iterable:
  File "C:\Users\gofor\Desktop\yolov5-master\utils\datasets.py", line 104, in __iter__
    yield next(self.iterator)
  File "C:\Users\gofor\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 517, in __next__
    data = self._next_data() 
  File "C:\Users\gofor\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 1199, in _next_data 
    return self._process_data(data) 
  File "C:\Users\gofor\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 1225, in _process_data 
    data.reraise() 
  File "C:\Users\gofor\anaconda3\lib\site-packages\torch\_utils.py", line 429, in reraise 
    raise self.exc_type(msg) 
AssertionError: Caught AssertionError in DataLoader worker process 0. 
Original Traceback (most recent call last): 
  File "C:\Users\gofor\anaconda3\lib\site-packages\torch\utils\data\_utils\worker.py", line 202, in _worker_loop 
    data = fetcher.fetch(index) 
  File "C:\Users\gofor\anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch 
    data = [self.dataset[idx] for idx in possibly_batched_index] 
  File "C:\Users\gofor\anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp> 
    data = [self.dataset[idx] for idx in possibly_batched_index] 
  File "C:\Users\gofor\Desktop\yolov5-master\utils\datasets.py", line 540, in __getitem__
    img, (h0, w0), (h, w) = load_image(self, index) 
  File "C:\Users\gofor\Desktop\yolov5-master\utils\datasets.py", line 638, in load_image 
    assert img is not None, 'Image Not Found ' + path 
AssertionError: Image Not Found /home/goforespain/Dataset/images/valid/d0776b45a6256287.jpg

Expected behavior

Validation should use the new location defined in data.yaml, not the old one.

Environment

  • OS: Windows
  • GPU: GeForce GTX 1070
  • CUDA: 11.1
GiorgioSgl added the bug (Something isn't working) label May 26, 2021
@GiorgioSgl
Author

I just noticed that the valid.cache file was not reinitialized when I changed the dataset location, while the train cache was. So I removed it to see if that changes anything.
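A minimal sketch of that manual workaround, assuming the label caches are files ending in `.cache` somewhere under the dataset root (such as `valid.cache`) and that the loader rebuilds any cache it cannot find; `remove_stale_caches` is a hypothetical helper, not part of YOLOv5:

```python
from pathlib import Path


def remove_stale_caches(dataset_root):
    """Delete every *.cache file under dataset_root.

    Hypothetical helper: assumes label caches use the .cache suffix and
    that the data loader rescans labels when no cache file is present.
    Returns the sorted list of deleted paths.
    """
    removed = []
    for cache_file in sorted(Path(dataset_root).rglob("*.cache")):
        if cache_file.is_file():
            cache_file.unlink()  # delete the stale cache; label files are untouched
            removed.append(cache_file)
    return removed
```

For example, `remove_stale_caches("C:/Users/gofor/Dataset")` (path assumed) would clear both `train.cache` and `valid.cache`, so neither can point at the old Linux paths.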

@glenn-jocher
Member

@GiorgioSgl thanks for the bug report! Yes, this is happening because the cache file saved the old directories. I should update the cache hash to recognize changes in dataset location in addition to dataset contents.
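The sketch below illustrates the point about the cache hash. It assumes, for illustration only, a key built from file sizes alone versus one that also includes the file paths; both function names are hypothetical. Copying the dataset to a new directory leaves the total size unchanged, so only the second key notices the move:

```python
import hashlib
import os


def size_only_hash(paths):
    """Hypothetical key from total file size: identical after the dataset moves."""
    total = sum(os.path.getsize(p) for p in paths if os.path.exists(p))
    return hashlib.md5(str(total).encode()).hexdigest()


def size_and_path_hash(paths):
    """Hypothetical key from total file size plus the paths themselves."""
    total = sum(os.path.getsize(p) for p in paths if os.path.exists(p))
    h = hashlib.md5(str(total).encode())
    # Newline-separate the paths so different path lists cannot concatenate
    # to the same string; any change of location changes the digest.
    h.update("\n".join(paths).encode())
    return h.hexdigest()
```

With a cache keyed by `size_only_hash`, moving `/home/.../Dataset` to a Windows folder would look like "no change" and the old image paths would be reused; `size_and_path_hash` forces a recache instead.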

glenn-jocher added a commit that referenced this issue May 26, 2021
glenn-jocher linked a pull request May 26, 2021 that will close this issue
@GiorgioSgl
Author

Yeah, that would be amazing! Thanks for the fast answer.

glenn-jocher added a commit that referenced this issue May 26, 2021
* Update cache v0.2 to include parent hash

Possible fix for #3349

* Update datasets.py
@glenn-jocher
Member

glenn-jocher commented May 26, 2021

@GiorgioSgl good news 😃! Your original issue may now be fixed ✅ in PR #3350. This PR implements a new hashlib-based solution for detecting changes to dataset contents or location, recaching as necessary when either is detected. This new system will force all YOLOv5 users to recache their existing datasets once, but this should occur automatically one time only and is not a breaking change. To receive this update:

  • Git – git pull from within your yolov5/ directory, or git clone https://github.com/ultralytics/yolov5 again
  • PyTorch Hub – Force-reload with model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
  • Notebooks – View the updated notebooks in Colab or Kaggle
  • Docker – sudo docker pull ultralytics/yolov5:latest to update your image
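The check-then-recache flow described in the PR announcement could look roughly like the sketch below. It assumes a JSON cache holding a `hash` and a `labels` entry, which is not YOLOv5's actual cache format, and `load_or_rebuild`, `dataset_key`, and `scan_labels` are hypothetical names:

```python
import hashlib
import json
import os


def dataset_key(paths):
    """Hypothetical key: changes if total file size or any path changes."""
    total = sum(os.path.getsize(p) for p in paths if os.path.exists(p))
    h = hashlib.md5(str(total).encode())
    h.update("\n".join(paths).encode())
    return h.hexdigest()


def load_or_rebuild(cache_path, paths, scan_labels):
    """Return (labels, rebuilt). Reuse the cache only if its stored key matches.

    scan_labels is a caller-supplied function mapping a list of paths to a
    JSON-serializable labels dict (assumption for this sketch).
    """
    key = dataset_key(paths)
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
        if cache.get("hash") == key:
            return cache["labels"], False  # contents and location unchanged
    labels = scan_labels(paths)  # dataset changed or moved: rescan
    with open(cache_path, "w") as f:
        json.dump({"hash": key, "labels": labels}, f)
    return labels, True
```

Because the key covers the paths, pointing data.yaml at a moved dataset produces a different key and triggers exactly one rebuild; later runs reuse the new cache.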

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

@GiorgioSgl
Author

Thanks, man! Truly the best!

Lechtr pushed a commit to Lechtr/yolov5 that referenced this issue Jul 20, 2021
* Update cache v0.2 to include parent hash

Possible fix for ultralytics#3349

* Update datasets.py

(cherry picked from commit c6b5bfc)
BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this issue Aug 26, 2022
* Update cache v0.2 to include parent hash

Possible fix for ultralytics#3349

* Update datasets.py
SecretStar112 added a commit to SecretStar112/yolov5 that referenced this issue May 24, 2023
* Update cache v0.2 to include parent hash

Possible fix for ultralytics/yolov5#3349

* Update datasets.py