
Dataset Upload Failed with Multiple YAML's found #89

Closed
1 task done
PanterSoft opened this issue Aug 28, 2022 · 24 comments
Labels
bug Something isn't working

Comments

@PanterSoft

Search before asking

  • I have searched the HUB issues and found no similar bug report.

HUB Component

No response

Bug

I am trying to upload a dataset which has exactly the same structure as the tutorial suggests. The YAML file has the correct format and the labels consist only of numbers.

Screenshot 2022-08-28 at 09 27 27

When that did not work, I tried the example dataset coco6 and got the same result, shown in the image below.

Screenshot 2022-08-28 at 09 20 54

Environment

I am using a 2021 MacBook Pro 14 inch M1 Pro running Safari.

Minimal Reproducible Example

  1. Login
  2. Create Dataset
  3. Name Dataset and upload dataset.zip
  4. Let it upload and fail while processing

Additional

No response

@PanterSoft PanterSoft added the bug Something isn't working label Aug 28, 2022
@github-actions

👋 Hello @PanterSoft, thank you for raising an issue about Ultralytics HUB 🚀! Please visit https://ultralytics.com/hub to learn more, and see our ⭐️ HUB Guidelines to quickly get started uploading datasets and training YOLOv5 models.

If this is a 🐛 Bug Report, please provide screenshots and steps to recreate your problem to help us get started working on a fix.

If this is a ❓ Question, please provide as much information as possible, including dataset, model, environment details etc. so that we might provide the most helpful response.

We try to respond to all issues as promptly as possible. Thank you for your patience!

@glenn-jocher
Member

@PanterSoft hi I saw this issue was resolved on your side? Should we close the issue?

@ghost

ghost commented Aug 29, 2022

@PanterSoft hi I saw this issue was resolved on your side? Should we close the issue?

Sorry, that was a mistake; I still have the same problem.

@ghost

ghost commented Aug 29, 2022

I found the issue: when you create a zip file on macOS, it adds a hidden folder to the archive that contains the same files, which then triggers the "Multiple YAML's found" error. I solved it by uploading the dataset from Windows.

@kalenmike
Member

@nicomattes Thanks for letting us know. We will look into improving this for zips created with macOS.

@kalenmike kalenmike self-assigned this Sep 1, 2022
@mehlkelm

I have the same problem.
Workaround: zip the folder from the command line like this:
https://apple.stackexchange.com/a/239587
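As a cross-platform alternative to zipping from the command line, the archive can be built with Python's standard library while skipping macOS metadata. This is a minimal sketch, not an Ultralytics tool: the function name `make_clean_zip` and the rule "skip anything whose name starts with a dot, plus any `__MACOSX` folder" are assumptions for illustration.

```python
import os
import zipfile


def make_clean_zip(src_dir, zip_path):
    """Zip src_dir with its folder name as the top-level entry (hypothetical helper),
    skipping hidden files/dirs (.DS_Store, ._* AppleDouble files) and __MACOSX."""
    src_dir = os.path.abspath(src_dir)
    parent = os.path.dirname(src_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(src_dir):
            # Prune hidden and __MACOSX directories in place so os.walk never enters them
            dirs[:] = sorted(d for d in dirs if not d.startswith(".") and d != "__MACOSX")
            for name in sorted(files):
                if name.startswith("."):
                    continue  # skip .DS_Store, ._AppleDouble companions, etc.
                full = os.path.join(root, name)
                # Store paths relative to the parent so the dataset folder is the top level
                zf.write(full, os.path.relpath(full, parent))
    return zip_path


# Example: make_clean_zip("macos_zip_problem_demo", "macos_zip_problem_demo.zip")
```

Pruning `dirs` in place is what keeps `os.walk` from descending into excluded folders, so their contents never reach the archive.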

@glenn-jocher
Member

@mehlkelm hi! Could you explain in more detail how the issue arises or show us screenshots of what your directory structure looks like that is causing problems?

@mehlkelm

The issue arises (I think) because macOS creates a directory "__MACOSX" inside the zip file, hidden from Mac users, that contains additional information about all the files. To a non-Mac OS it looks like everything is duplicated.

Maybe the dataset upload tool should just ignore the __MACOSX folder in there.

@glenn-jocher
Member

@mehlkelm yeah good point. We definitely want a macOS-robust tool. Could you upload a small zip that's crashing HUB here or email it to glenn.jocher@ultralytics.com? Thanks!

@mehlkelm

This is a completely made up example but it shows the problem.
(It doesn't crash HUB, it just shows the "Multiple YAML's found" problem)

macos_zip_problem_demo.zip

@glenn-jocher
Member

@mehlkelm thanks! We'll use the zip to debug and improve the dataset upload process.

@glenn-jocher
Member

glenn-jocher commented Oct 18, 2022

When unzipping I see the __MACOSX/ directory containing a duplicate YAML:

(venv39) glennjocher@Glenns-MacBook-Air Downloads % unzip macos_zip_problem_demo.zip
Archive:  macos_zip_problem_demo.zip
   creating: macos_zip_problem_demo/
  inflating: macos_zip_problem_demo/.DS_Store  
  inflating: __MACOSX/macos_zip_problem_demo/._.DS_Store  
   creating: macos_zip_problem_demo/images/
  inflating: macos_zip_problem_demo/macos_zip_problem_demo.yaml  
  inflating: __MACOSX/macos_zip_problem_demo/._macos_zip_problem_demo.yaml  
   creating: macos_zip_problem_demo/labels/
  inflating: macos_zip_problem_demo/images/.DS_Store  
  inflating: __MACOSX/macos_zip_problem_demo/images/._.DS_Store  
   creating: macos_zip_problem_demo/images/train/
   creating: macos_zip_problem_demo/images/val/
  inflating: macos_zip_problem_demo/labels/.DS_Store  
  inflating: __MACOSX/macos_zip_problem_demo/labels/._.DS_Store  
   creating: macos_zip_problem_demo/labels/train/
   creating: macos_zip_problem_demo/labels/val/
  inflating: macos_zip_problem_demo/images/train/IMG_1931.jpeg  
  inflating: __MACOSX/macos_zip_problem_demo/images/train/._IMG_1931.jpeg  
  inflating: macos_zip_problem_demo/images/train/IMG_1925.jpeg  
  inflating: __MACOSX/macos_zip_problem_demo/images/train/._IMG_1925.jpeg  
  inflating: macos_zip_problem_demo/images/train/IMG_1923.jpeg  
  inflating: __MACOSX/macos_zip_problem_demo/images/train/._IMG_1923.jpeg  
  inflating: macos_zip_problem_demo/images/train/IMG_1915.jpeg  
  inflating: __MACOSX/macos_zip_problem_demo/images/train/._IMG_1915.jpeg  
  inflating: macos_zip_problem_demo/images/val/IMG_1927.jpeg  
  inflating: __MACOSX/macos_zip_problem_demo/images/val/._IMG_1927.jpeg  
  inflating: macos_zip_problem_demo/images/val/IMG_1914.jpeg  
  inflating: __MACOSX/macos_zip_problem_demo/images/val/._IMG_1914.jpeg  
  inflating: macos_zip_problem_demo/labels/train/IMG_1915.txt  
  inflating: __MACOSX/macos_zip_problem_demo/labels/train/._IMG_1915.txt  
  inflating: macos_zip_problem_demo/labels/train/IMG_1923.txt  
  inflating: __MACOSX/macos_zip_problem_demo/labels/train/._IMG_1923.txt  
  inflating: macos_zip_problem_demo/labels/train/IMG_1925.txt  
  inflating: __MACOSX/macos_zip_problem_demo/labels/train/._IMG_1925.txt  
  inflating: macos_zip_problem_demo/labels/train/IMG_1931.txt  
  inflating: __MACOSX/macos_zip_problem_demo/labels/train/._IMG_1931.txt  
  inflating: macos_zip_problem_demo/labels/val/IMG_1914.txt  
  inflating: __MACOSX/macos_zip_problem_demo/labels/val/._IMG_1914.txt  
  inflating: macos_zip_problem_demo/labels/val/IMG_1927.txt  
  inflating: __MACOSX/macos_zip_problem_demo/labels/val/._IMG_1927.txt  

We need to update to be robust to this. I'll look into it.
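The listing above shows where the second YAML comes from: the AppleDouble companion `__MACOSX/macos_zip_problem_demo/._macos_zip_problem_demo.yaml` also ends in `.yaml`. The sketch below assumes a validator that simply counts archive members ending in `.yaml` (HUB's actual server-side check is not shown in this thread) and reproduces the count of two on a small in-memory zip that reuses member names from the listing with placeholder contents.

```python
import io
import zipfile


def build_demo_zip():
    """Build an in-memory zip mirroring part of the listing above (contents are placeholders)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in [
            "macos_zip_problem_demo/macos_zip_problem_demo.yaml",
            "__MACOSX/macos_zip_problem_demo/._macos_zip_problem_demo.yaml",
            "macos_zip_problem_demo/.DS_Store",
            "__MACOSX/macos_zip_problem_demo/._.DS_Store",
            "macos_zip_problem_demo/images/train/IMG_1931.jpeg",
            "__MACOSX/macos_zip_problem_demo/images/train/._IMG_1931.jpeg",
        ]:
            zf.writestr(name, b"placeholder")
    buf.seek(0)
    return buf


def naive_yaml_members(zip_file):
    """Return every member whose name ends in .yaml/.yml, with no filtering at all."""
    with zipfile.ZipFile(zip_file) as zf:
        return [n for n in zf.namelist() if n.endswith((".yaml", ".yml"))]


members = naive_yaml_members(build_demo_zip())
print(len(members))  # 2: the real YAML plus its ._ AppleDouble twin
```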

@mehlkelm

All the files in __MACOSX are hidden, as are the .DS_Store files, which are also some sort of OS specific info files. Maybe ignoring hidden files in general would be enough…

@mehlkelm

(according to the unix way of treating files with names starting with a dot as hidden)
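The suggestion above can be sketched as one predicate over zip member names: skip a member if any component of its path starts with a dot, and also skip anything under `__MACOSX/`. The folder name itself does not start with a dot, although every file inside it does, so checking it explicitly also drops bare directory entries. The helper names `is_ignored` and `visible_members` are hypothetical.

```python
from pathlib import PurePosixPath


def is_ignored(member_name):
    """True for zip members to skip: hidden files/dirs (Unix dot rule) and __MACOSX content."""
    parts = PurePosixPath(member_name).parts  # zip member names always use '/'
    return any(p.startswith(".") or p == "__MACOSX" for p in parts)


def visible_members(names):
    """Filter a zip name list (e.g. ZipFile.namelist()) down to real dataset files."""
    return [n for n in names if not is_ignored(n)]


names = [
    "macos_zip_problem_demo/macos_zip_problem_demo.yaml",
    "__MACOSX/macos_zip_problem_demo/._macos_zip_problem_demo.yaml",
    "macos_zip_problem_demo/labels/.DS_Store",
    "macos_zip_problem_demo/labels/train/IMG_1915.txt",
]
print(visible_members(names))
# ['macos_zip_problem_demo/macos_zip_problem_demo.yaml', 'macos_zip_problem_demo/labels/train/IMG_1915.txt']
```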

@glenn-jocher
Member

@mehlkelm actually this is really strange: on my Mac, if I unzip using the GUI/mouse commands, everything unzips correctly into 1 directory. If I unzip using the terminal, I get two top-level directories:

(venv39) glennjocher@Glenns-MacBook-Air sandbox % unzip macos_zip_problem_demo.zip

Screenshot 2022-10-18 at 14 42 08

@mehlkelm

I guess the regular Mac unzip processes the information in __MACOSX (technically they are resource forks) and doesn't interpret them as files, while all the non-Mac tools/OSes (even unzip in the terminal on a Mac?) think they are files.
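The `._` entries under `__MACOSX/` are AppleDouble files, a container format that carries a file's resource fork and extended attributes. An AppleDouble file begins with the 4-byte big-endian magic number `0x00051607`; tools that do not understand the format just see an ordinary file. A content-based check like the sketch below is an alternative to filtering by name. The function names are hypothetical, and the header in the usage lines is a minimal synthetic one, not a real macOS file.

```python
import io
import struct
import zipfile

APPLEDOUBLE_MAGIC = 0x00051607  # AppleDouble header magic number (big-endian)


def is_appledouble(head: bytes) -> bool:
    """True if the first 4 bytes are the AppleDouble magic number."""
    return len(head) >= 4 and struct.unpack(">I", head[:4])[0] == APPLEDOUBLE_MAGIC


def appledouble_members(zip_file):
    """List zip members whose content starts with an AppleDouble header."""
    found = []
    with zipfile.ZipFile(zip_file) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            with zf.open(info) as f:
                if is_appledouble(f.read(4)):  # only the header is inspected
                    found.append(info.filename)
    return found


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("ds/ds.yaml", "names: ['a']\n")
    # Synthetic AppleDouble header: magic, version 2, then 16 filler bytes
    zf.writestr("__MACOSX/ds/._ds.yaml", struct.pack(">II", APPLEDOUBLE_MAGIC, 0x00020000) + bytes(16))
print(appledouble_members(buf))  # ['__MACOSX/ds/._ds.yaml']
```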

@mehlkelm

but yes, it's silly that the mac puts them there

@glenn-jocher
Member

It's OK, we can't change macOS but we can change HUB. I'll let you know when this is resolved.

@glenn-jocher
Member

glenn-jocher commented Oct 18, 2022

@kalenmike I'm not able to produce any errors with HUBDatasetStats() on this zip. To reproduce I downloaded https://github.com/ultralytics/hub/files/9808500/macos_zip_problem_demo.zip, placed in a sandbox/ dir, and ran the code below.

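        # Run from the YOLOv5 repository root so that utils/ is importable
        from utils.dataloaders import HUBDatasetStats
        stats = HUBDatasetStats('/Users/glennjocher/Downloads/sandbox/macos_zip_problem_demo.zip')  # unzips and checks the dataset
        stats.get_json()  # compute dataset statistics
        stats.process_images()  # compress images for HUB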

Screenshot 2022-10-18 at 15 17 00

Initially the directory contained only the zip:
Screenshot 2022-10-18 at 15 15 52

After unzipping, it contained the unzipped dir plus an extra __MACOSX dir with duplicated data:
Screenshot 2022-10-18 at 15 18 10

But the actual dataset processing seemed to work without any duplicate YAML errors. Could you check HUB to see if the error is produced downstream of these steps?

@glenn-jocher
Member

@kalenmike @mehlkelm I've opened ultralytics/yolov5#9843 in YOLOv5 to better handle unzipping while rejecting files in an exclude list, i.e. .DS_Store and __MACOSX instances in file paths.
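The exclude-list idea described above can be sketched with the standard library as follows. This is an illustration, not the code from ultralytics/yolov5#9843; the function name `extract_zip_excluding` and the substring-match rule are assumptions.

```python
import zipfile
from pathlib import Path


def extract_zip_excluding(file, path=None, exclude=(".DS_Store", "__MACOSX")):
    """Extract a zip file to `path` (default: the zip's own folder), skipping any member
    whose name contains one of the strings in `exclude`. Returns the extracted names."""
    path = Path(file).parent if path is None else Path(path)
    extracted = []
    with zipfile.ZipFile(file) as zf:
        for name in zf.namelist():
            if any(x in name for x in exclude):
                continue  # macOS metadata: never written to disk
            zf.extract(name, path=path)
            extracted.append(name)
    return extracted
```

Substring matching keeps the rule simple, but it would also skip a legitimate file whose path merely contains one of the excluded strings, and `._` files outside `__MACOSX/` are still extracted unless `"._"` is added to the list.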

@kalenmike
Member

@glenn-jocher There is independent validation on the server that, by the looks of it, is not excluding the __MACOSX folder.

@glenn-jocher
Member

@kalenmike ok got it, I'll let you try to debug on the server side.

@glenn-jocher glenn-jocher added the todo Further action is needed by Ultralytics label Oct 18, 2022
@kalenmike
Member

This is now resolved: zip files created on a Mac can be processed successfully without special zipping.

@glenn-jocher
Member

@kalenmike thanks for the fix!!

@mehlkelm tested today and everything works now. Removing TODO. Let us know if you find any other issues or think of features you'd like to see!
Screenshot 2022-10-20 at 17 02 13

@glenn-jocher glenn-jocher removed the todo Further action is needed by Ultralytics label Oct 20, 2022