Dataset upload fails with large number of images #42

Closed
1 task done
barney2074 opened this issue Jun 3, 2022 · 20 comments
Labels
question A HUB question that does not involve a bug

Comments

@barney2074

Search before asking

Question

Hello,

I am having a problem uploading a dataset:

  • I've followed the instructions, and I'm pretty certain I've got the folder structure and naming correct, because a test set with 20 images/1 class works fine.
  • However, exactly the same data structure, YAML file, etc., but with 4000 images/4 classes, fails with a 'processing error',
    i.e. in the first screenshot below, constructaiv6 & constructaiv7 are identical except for the number of images and labels.
  • The same dataset works fine with a local YOLOv5 (Docker).
  • I've been through the file naming, counts, class IDs, etc., and I can't see anything wrong.

I'm loving YOLOv5 and finding it very easy to use, but would like to try the HUB option.
(A separate question, but I was wondering if there is any way to use a locally trained model on the phone app?)

I'd be very grateful for any help with this

image
image
image

Andrew

Additional

No response

@barney2074 barney2074 added the question A HUB question that does not involve a bug label Jun 3, 2022
@github-actions

github-actions bot commented Jun 3, 2022

👋 Hello @barney2074, thank you for raising an issue about Ultralytics HUB 🚀! Please visit https://ultralytics.com/hub to learn more, and see our ⭐️ HUB Guidelines to quickly get started uploading datasets and training YOLOv5 models.

If this is a 🐛 Bug Report, please provide screenshots and steps to recreate your problem to help us get started working on a fix.

If this is a ❓ Question, please provide as much information as possible, including dataset, model, environment details etc. so that we might provide the most helpful response.

We try to respond to all issues as promptly as possible. Thank you for your patience!

@glenn-jocher
Member

@barney2074 thanks for the bug report! We will take a look here and see what might be wrong.

@glenn-jocher
Member

@barney2074 to answer your other question: currently, only models trained on HUB can be used with the app.

@barney2074
Author

barney2074 commented Jun 3, 2022

thanks Glenn

Since posting, I've been trying a few options to narrow down the problem. Initially I thought it might be invalid labels (my images are synthetic and some labels are full-frame, depending on randomization of camera position), but it seems this is not the case.

My filenames have extra periods in them. I also thought this could be a problem, but the small test dataset that works also has this.

I can send you a small-ish dataset that fails if it helps

Andrew

@glenn-jocher
Member

@barney2074 we've found an issue on our side related to large dataset uploads as you said. We will keep you updated when a fix has been implemented.

@kalenmike
Member

@barney2074 we are still working on preventing the same issue from happening again. For now I have manually forced processing of the dataset in your screenshot which you can access with your account at this link:
https://hub.ultralytics.com/datasets/5tbWXGmJnF0XgFcILguk

@barney2074
Author

thanks @kalenmike
The link just takes me back to the dataset listing page and all are shown as failed (apart from my 20 image test)

I also can't see the dataset in the 'train a model' page

But that is fine- I really appreciate your quick response and it seems to be a bug, rather than me doing something wrong- so no hurry.

Andrew

@kalenmike
Member

kalenmike commented Jun 6, 2022

@barney2074 Perhaps you are using multiple accounts with the HUB? As we are lucky enough to have only two datasets with the name 'constructaiv7', I was able to track them down to the same user: one is now active and visible, while the other is still in a failed state. Unfortunately, I am not able to provide more info than that.

I will keep you updated and once we resolve the issue that caused the bug you can also try a new upload.

@kalenmike kalenmike self-assigned this Jun 6, 2022
@barney2074
Author

Hi @kalenmike

Yes, sorry, my error- I had inadvertently created 2 accounts

Andrew

@kalenmike
Member

@barney2074 Perfect! We have addressed the bug that you found and it seems to be resolved. Please let us know if you still have issues with uploading the larger dataset.

@barney2074
Author

thanks @kalenmike - I'll give it a try

@barney2074
Author

Hi @kalenmike

Sorry to trouble you- but I'm still having the same problem

It seems to sit at the pulsing yellow 'processing' button for a couple of hours, then says it has failed.
image

The v1.1 dataset only has 40 images, and it has been processing for at least an hour.

Andrew

@kalenmike
Member

kalenmike commented Jun 9, 2022

@barney2074 Ok thanks. I will look into the logs and see what is going wrong.

@kalenmike kalenmike reopened this Jun 9, 2022
@kalenmike
Member

@barney2074 I am seeing that your new datasets have been failing because the YAML is incorrectly formatted. It looks like you have duplicated the names key.

This can help to ensure that the YAML is correctly formatted:
https://codebeautify.org/yaml-parser-online

@barney2074
Author

Hi @kalenmike

Sorry...! I must have checked that 10 times, but just couldn't see the obvious error

That said- it would be great to have either some detailed error reporting, or the ability to load/validate YAML, images & labels separately (kinda like Roboflow, where stuff is uploaded separately, then gets matched up)

Andrew
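
The "matching up" step suggested above can be sketched in a few lines. This is only an illustration, assuming the YOLOv5 convention that an image and its label file share the same stem, with the label ending in .txt. The helper name `pair_images_and_labels` and the extension set are hypothetical and say nothing about how HUB actually validates uploads. Note that YOLOv5 accepts images without labels as background images, so the "unlabelled" list is informational rather than an error.

```python
from pathlib import PurePosixPath

# Illustrative subset of image extensions (an assumption, not HUB's list).
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def pair_images_and_labels(paths):
    """Return (matched stems, unlabelled image stems, orphan label stems).

    Hypothetical helper for illustration only.
    """
    images, labels = set(), set()
    for p in map(PurePosixPath, paths):
        # .stem strips only the final suffix, so extra periods in a
        # filename (e.g. "site.01.cam2.jpg") are preserved in the stem.
        if p.suffix.lower() in IMAGE_EXTS:
            images.add(p.stem)
        elif p.suffix.lower() == ".txt":
            labels.add(p.stem)
    return sorted(images & labels), sorted(images - labels), sorted(labels - images)


paths = [
    "images/train/site.01.cam2.jpg",
    "labels/train/site.01.cam2.txt",
    "images/train/site.02.cam2.jpg",
    "labels/train/site.03.cam2.txt",
]
matched, unlabelled, orphans = pair_images_and_labels(paths)
print(matched)     # ['site.01.cam2']
print(unlabelled)  # ['site.02.cam2']
print(orphans)     # ['site.03.cam2']
```

The path list could come from `zipfile.ZipFile.namelist()`, which would allow a check like this to run on the archive before any longer processing starts.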

@kalenmike
Member

@barney2074 Thanks for the feedback. I agree that, as humans, we are prone to error. We are planning to improve the error reporting and to auto-repair detected errors, to avoid the need to re-upload.

@glenn-jocher
Member

@barney2074 @kalenmike duplicate names key is incorrect YAML but it doesn't seem to error with the YOLOv5 YAML loaders as far as I can tell. I tested training and dataset_stats() with duplicate names fields (identical and different names) and both still work correctly for me.

Regardless though it helps to examine your YAMLs with an IDE that highlights errors like PyCharm (note red underlines):
Screenshot 2022-06-11 at 19 35 21
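
The behaviour described above (a repeated key loading without complaint) can be reproduced with PyYAML, which YOLOv5 uses to read dataset YAMLs. The sketch below also shows one way to reject repeated keys by subclassing `yaml.SafeLoader`. `UniqueKeyLoader` is a hypothetical name introduced here for illustration; it is not part of PyYAML, YOLOv5, or HUB, and it only checks plain scalar keys.

```python
import yaml  # PyYAML


class UniqueKeyLoader(yaml.SafeLoader):
    """SafeLoader variant that rejects repeated scalar keys in a mapping.

    Hypothetical class for illustration only.
    """

    def construct_mapping(self, node, deep=False):
        if isinstance(node, yaml.MappingNode):
            seen = set()
            for key_node, _value_node in node.value:
                # Only plain scalar keys are compared, by their raw text.
                if isinstance(key_node, yaml.ScalarNode):
                    if key_node.value in seen:
                        raise yaml.constructor.ConstructorError(
                            "while constructing a mapping", node.start_mark,
                            f"found duplicate key {key_node.value!r}",
                            key_node.start_mark)
                    seen.add(key_node.value)
        return super().construct_mapping(node, deep=deep)


text = "names: ['class_1']\nnames: ['class_2']\n"

# The stock safe loader accepts the input and keeps the later value.
print(yaml.safe_load(text))

try:
    yaml.load(text, Loader=UniqueKeyLoader)
except yaml.YAMLError as exc:
    print("rejected:", exc.problem)
```

A stricter loader like this trades leniency for earlier feedback, which is why it is shown as an opt-in class rather than a replacement for `yaml.safe_load`.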

@kalenmike
Member

@glenn-jocher The error occurred because the names key was entered twice on the same line, like this:
names: names: ['class_1']
This was causing the parser to fail.
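
For anyone wanting to check a dataset YAML locally before uploading, here is a minimal sketch assuming PyYAML. Whether HUB uses the same parser is an assumption. `check_yaml` is a hypothetical helper name, not part of YOLOv5 or HUB.

```python
import yaml  # PyYAML


def check_yaml(text):
    """Return None if text parses as YAML, else a short error description.

    Hypothetical helper for illustration only.
    """
    try:
        yaml.safe_load(text)
    except yaml.YAMLError as exc:
        # Scanner and parser errors carry a mark with 0-based line/column.
        mark = getattr(exc, "problem_mark", None)
        where = ""
        if mark is not None:
            where = f" (line {mark.line + 1}, column {mark.column + 1})"
        return f"{type(exc).__name__}{where}"
    return None


bad = "names: names: ['class_1']\n"  # the malformed line from this thread
good = "names: ['class_1']\n"

print(check_yaml(bad))   # a parse error description instead of loaded data
print(check_yaml(good))  # None
```

Running a check like this on the YAML before zipping the dataset surfaces the problem in seconds rather than after a long 'processing' wait.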

@barney2074
Author

Hi @kalenmike

sorry- this was caused by a dumb copy/paste error on my part...!

I'm wondering if part of the solution might be to complete YAML parsing/validation as soon as the ZIP dataset is transferred, since it seems to take a long time on 'processing' before it times out.

I imagine my earlier suggestion, i.e. uploading in batches and then processing it all together, would complicate the process considerably.

Andrew

@glenn-jocher
Member

@kalenmike got it. I've opened ultralytics/yolov5#8192 to better report YAML load errors like the one the input above would probably cause. Please review.


3 participants