Unable to load CelebA dataset. File is not zip file error. #4

Open
Hackathorn opened this issue Aug 7, 2021 · 6 comments

@Hackathorn

More of an FYI. I tried to reproduce the L17 4_VAE_celeba-inspect notebook. When loading the dataset, I got the error "BadZipFile: File is not a zip file". I found TorchVision issue #2262, which identified the problem as exceeding the daily maximum download quota on Google Drive; the TorchVision maintainers punted the issue back to the dataset authors and closed it. A future version of TorchVision should give a more descriptive error message.
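One way to confirm this diagnosis is to test the downloaded file before the dataset class tries to open it. The following is a minimal sketch using only the Python standard library; the helper names `looks_like_zip` and `peek` are illustrative assumptions, not part of TorchVision.

```python
import zipfile
from pathlib import Path


def looks_like_zip(path):
    """Return True if `path` exists and is a readable zip archive."""
    p = Path(path)
    return p.is_file() and zipfile.is_zipfile(p)


def peek(path, n=200):
    """Return the first `n` bytes of a file, decoded leniently for inspection."""
    with open(path, "rb") as f:
        return f.read(n).decode("utf-8", errors="replace")
```

If `looks_like_zip` returns False and `peek` shows HTML markup, the download was probably an error page (for example a quota notice) saved under the archive's name rather than the archive itself.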

So, FYI to your students. Work-around is to...

@rasbt
Owner

rasbt commented Aug 7, 2021

Thanks for the note, Richard, and I agree, this is definitely frustrating. I was recently teaching a GAN tutorial and had similar issues. Downloading the dataset from the original website can be a bit tedious because it involves several steps. So, for this tutorial, I gathered the relevant files and uploaded them as a zip file to my Google Drive.

In case it's useful, it's 1.7 GB, and you only need to unzip it in the current notebook directory (or rather the directory the dataset/dataloader points to): https://drive.google.com/file/d/1m8-EBPgi5MRubrm6iQjafK2QMHDBMSfJ/view?usp=sharing
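For anyone who prefers to do the unzip step from Python, here is a minimal sketch. The function name `extract_archive` and the example paths in the usage note are assumptions for illustration; the target should be whatever directory the dataset's `root` argument points to.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, target_dir):
    """Extract `zip_path` into `target_dir` and return the top-level entry names."""
    target = Path(target_dir)
    # Create the target directory if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return sorted(p.name for p in target.iterdir())
```

A call such as `extract_archive("celeba.zip", ".")` (hypothetical file name) would unpack the archive into the current notebook directory and list what ended up there.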

@Hackathorn
Author

Downloading from your Google Drive and extracting into the L17/data folder (replacing the existing files) was simple and worked great.

@AntixK

AntixK commented Dec 22, 2021

I have the same issue, but even after downloading from your link, I get an error from the _check_integrity() function: "Dataset not found or corrupted. You can use download=True to download it."

@rasbt
Owner

rasbt commented Dec 22, 2021

Have you checked that none of the files are 0 KB? With download=True, it may try to overwrite existing files so that they end up empty. If I have the files as shown below, it seems to work (I tried it the other day; see https://github.com/rasbt/machine-learning-book/blob/main/ch12/ch12_part1.ipynb)
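The check for empty files can also be scripted. This is a minimal sketch under the assumption that the dataset lives in one directory tree; `find_empty_files` is a hypothetical helper name, not a TorchVision function.

```python
from pathlib import Path


def find_empty_files(root):
    """Return all zero-byte files found anywhere under `root`, sorted by path."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_size == 0
    )
```

An empty result means no file under the directory was truncated to zero bytes; any paths returned are candidates for re-downloading.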

Unknown

@AntixK

AntixK commented Dec 23, 2021

I did set download=False after downloading the files manually, and I checked their sizes as well. I figured out that the problem was in the _check_integrity function, which returns False.

So, I wrote a simple workaround to resolve it:

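from torchvision.datasets import CelebA


class MyCelebA(CelebA):
    """
    A work-around to address issues with PyTorch's CelebA dataset class.

    Download and Extract
    URL : https://drive.google.com/file/d/1m8-EBPgi5MRubrm6iQjafK2QMHDBMSfJ/view?usp=sharing
    """

    def _check_integrity(self) -> bool:
        # Always report the dataset as intact, bypassing torchvision's
        # own verification of the manually downloaded files.
        return True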
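Note that returning True unconditionally skips verification entirely, so a truncated or replaced file would go unnoticed. As far as I understand, TorchVision's check compares MD5 checksums of the dataset files against expected values, so hashing the files by hand can show which one differs. A minimal sketch with the standard library; `md5_of_file` is a hypothetical helper, and the expected checksums would have to be looked up in the TorchVision source.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the digest of each downloaded file with the value the dataset class expects should reveal whether a file is corrupted or simply a different version.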

@rasbt
Owner

rasbt commented Dec 27, 2021

Thanks for sharing!
