Problem with loading example dataset #59245

Closed
rexys777 opened this issue Jan 12, 2023 · 10 comments · May be fixed by googlecodelabs/odml-pathways#28
Labels: awaiting PR merge, comp:lite (TF Lite related issues), type:docs-bug (Document issues)

Comments

rexys777 commented Jan 12, 2023

Issue Type

Bug

Have you reproduced the bug with TF nightly?

No

Source

source

Tensorflow Version

tf2.9.2

Custom Code

No

OS Platform and Distribution

Google Colab

Mobile device

No response

Python version

No response

Bazel version

No response

GCC/Compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current Behaviour?

Good afternoon!
I am working through the Colab notebook example «Transfer Learning for the Audio Domain with TensorFlow Lite Model Maker» (https://www.tensorflow.org/lite/models/modify/model_maker/audio_classification).
In this example, the Birds dataset cannot be loaded. The link (https://storage.googleapis.com/laurencemoroney-blog.appspot.com/birds_dataset.zip) is invalid.
How can I get this dataset for this example?

Standalone code to reproduce the issue

Example text from: https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lite/g3doc/models/modify/model_maker/audio_classification.ipynb#scrollTo=upNRfilkNSmr

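import tensorflow as tf

# Download and extract the Birds dataset; this is the call that fails with HTTP 404.
birds_dataset_folder = tf.keras.utils.get_file(
    'birds_dataset.zip',
    'https://storage.googleapis.com/laurencemoroney-blog.appspot.com/birds_dataset.zip',
    cache_dir='./',
    cache_subdir='dataset',
    extract=True)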

This link (https://storage.googleapis.com/laurencemoroney-blog.appspot.com/birds_dataset.zip) is invalid.

Relevant log output

Downloading data from https://storage.googleapis.com/laurencemoroney-blog.appspot.com/birds_dataset.zip
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
/usr/local/lib/python3.8/dist-packages/keras/utils/data_utils.py in get_file(fname, origin, untar, md5_hash, file_hash, cache_subdir, hash_algorithm, extract, archive_format, cache_dir)
    276       try:
--> 277         urlretrieve(origin, fpath, dl_progress)
    278       except urllib.error.HTTPError as e:

8 frames
HTTPError: HTTP Error 404: Not Found

During handling of the above exception, another exception occurred:

Exception                                 Traceback (most recent call last)
/usr/local/lib/python3.8/dist-packages/keras/utils/data_utils.py in get_file(fname, origin, untar, md5_hash, file_hash, cache_subdir, hash_algorithm, extract, archive_format, cache_dir)
    277         urlretrieve(origin, fpath, dl_progress)
    278       except urllib.error.HTTPError as e:
--> 279         raise Exception(error_msg.format(origin, e.code, e.msg))
    280       except urllib.error.URLError as e:
    281         raise Exception(error_msg.format(origin, e.errno, e.reason))

Exception: URL fetch failure on https://storage.googleapis.com/laurencemoroney-blog.appspot.com/birds_dataset.zip: 404 -- Not Found
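
A minimal sketch, not taken from the notebook: one way to confirm that a dataset URL is dead before passing it to `tf.keras.utils.get_file` is to send a HEAD request with the standard library. The helper name `probe_url` and its `opener` parameter are hypothetical and exist only for illustration; nothing here is TensorFlow API.

```python
import urllib.error
import urllib.request


def probe_url(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code of a HEAD request to `url`.

    `probe_url` and `opener` are illustrative names, not part of TensorFlow.
    `opener` defaults to urllib's urlopen and can be swapped out for testing.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx answers; the status is on the exception.
        return err.code


# Usage (requires network access, so it is left commented out):
# probe_url("https://storage.googleapis.com/"
#           "laurencemoroney-blog.appspot.com/birds_dataset.zip")
```

A status other than 200 means the download would fail in the same way as the traceback above; connection-level failures (`urllib.error.URLError`) are deliberately not caught here and will still propagate.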

@rexys777 (Author)

What should be the data structure for this example?

synandi added the type:docs-bug label and removed the type:bug label on Jan 13, 2023

synandi (Contributor) commented Jan 13, 2023

@tilakrayal, I was able to replicate the issue on Colab using TF v2.11. Please find the gist here. Thank you!

synandi assigned tilakrayal and unassigned synandi on Jan 13, 2023
tilakrayal added the comp:lite label on Jan 16, 2023

tilakrayal (Contributor) commented Jan 17, 2023

@rexys777,
Thank you for reporting the issue. This is already a known problem and the developer team is attempting to resolve it. We are tracking this internally and will update once it gets resolved. Thank you!

tilakrayal added the awaiting PR merge label on Jan 19, 2023

root50643 commented Jan 26, 2023

In reply to "What should be the data structure for this example?":

I had the same problem, and this is the folder structure I arrived at:

.
└── dataset
    └── small_birds_dataset
        ├── test
        │   ├── azaspi1
        │   ├── chcant2
        │   ├── houspa
        │   ├── redcro
        │   └── wbwwre1
        └── train
            ├── azaspi1
            ├── chcant2
            ├── houspa
            ├── redcro
            └── wbwwre1
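
A minimal sketch for checking a locally built dataset against the tree above. It assumes that `train` and `test` must contain the same class folders, as they do in that tree; the function names `class_folders` and `check_layout` are hypothetical helpers, not part of Model Maker.

```python
from pathlib import Path


def class_folders(split_dir):
    """Return the sorted names of the class subfolders in one split directory."""
    return sorted(p.name for p in Path(split_dir).iterdir() if p.is_dir())


def check_layout(dataset_root):
    """Check that `train` and `test` exist and hold the same class folders.

    Returns the sorted list of class names, or raises ValueError on a mismatch.
    """
    root = Path(dataset_root)
    splits = {}
    for split in ("train", "test"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise ValueError(f"missing split folder: {split_dir}")
        splits[split] = class_folders(split_dir)
    if splits["train"] != splits["test"]:
        raise ValueError(f"class folders differ between splits: {splits}")
    return splits["train"]


# Usage, with the path from the tree above:
# check_layout("./dataset/small_birds_dataset")
```

Run against the layout shown, the call would return the five species codes; a missing or extra folder in either split raises `ValueError` instead.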

@aasthavar

Maybe use this website to build your own dataset: https://xeno-canto.org/

bitstobreath commented Mar 4, 2023

Hello @rexys777 @synandi @tilakrayal @root50643 @aasthavar

Problem solved!

Scripts

IPYNB GitHub Example

The datasets below may not work because of Google Drive rate limiting, so use the scripts above to generate your own data.

Colab Example

Google Drive ~5GB Zip of dataset

All audio files were downloaded from xeno-canto.org as of Mar 4, 2023, ~3 AM EST.
All audio files are in the correct format: 16 kHz mono WAV (Microsoft 16-bit PCM).
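
A minimal sketch for verifying the format of generated files with Python's standard `wave` module. The target values (16,000 Hz, one channel, 16 bits) are an assumption about the format described above and are exposed as parameters; `wav_format` and `is_expected_format` are hypothetical helper names.

```python
import wave


def wav_format(path):
    """Return (sample_rate_hz, channels, bits_per_sample) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth() * 8


def is_expected_format(path, sample_rate=16000, channels=1, bits=16):
    """True if the file matches the assumed target: 16 kHz, mono, 16-bit PCM."""
    return wav_format(path) == (sample_rate, channels, bits)
```

Looping `is_expected_format` over every `.wav` file under `train` and `test` before training would flag files that still need resampling or downmixing.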

If somebody has a better way to host this via Google services, since this is a Google product and service, please do.

Colab really should develop its own predefined storage backend to save them the download time for example projects.
It's ridiculous that they are paying for the bandwidth; it costs end users too if they have an expensive ISP.

rexys777 (Author) commented Mar 4, 2023

Thanks @bitstobreath, great job.
I think the issue can now be considered closed.

rexys777 closed this as completed on Mar 4, 2023

bitstobreath commented Mar 4, 2023

@tilakrayal I have solved that issue and provided some links; please confirm whether this helps you.
Thanks.
Note that the scripts are not exactly a beautiful solution, although they get the job done for now. I have no time to optimize or organize the code.

@tilakrayal (Contributor)

@bitstobreath & @rexys777,
Yes, the dataset loaded, and I tested the code; it works as expected. Also, as mentioned here, the internal CL has been closed, and the data on tensorflow.org will be updated soon. Kindly find the gist here.
[Screenshot, 2023-03-04 10:47:46 PM]

Thank you!

pjpratik added a commit to pjpratik/odml-pathways that referenced this issue Mar 8, 2023
The dataset url is broken. Updated the url with the active link to download the dataset. [#59245](tensorflow/tensorflow#59245).

6 participants