Can not export to edgetpu model #8842

Closed
2 tasks done
walterwangimagr opened this issue Aug 3, 2022 · 12 comments · Fixed by #8902
Labels
bug Something isn't working

Comments

@walterwangimagr

walterwangimagr commented Aug 3, 2022

Search before asking

  • I have searched the YOLOv5 issues and found no similar bug report.

YOLOv5 Component

Export

Bug

I was following the tutorial to train a model and export it to an Edge TPU model.
With coco128.yaml as the dataset, everything works: I can train and export.
But with a custom dataset that has only one class, the export fails at the TFLite -> Edge TPU step.
I also tried the similar GlobalWheat2020.yaml dataset and hit the same issue. Is there an extra step I need to do, or is this a bug?
I use the provided Docker image and installed TensorFlow 2.9.1 for export.
Training script
python train.py --img 640 --batch 16 --epochs 3 --data coco128.yaml --weights yolov5s.pt --name coco128
Export script
python export.py --img 640 --data data/coco128.yaml --weights runs/train/coco128/weights/best.pt --include edgetpu
This works fine.

Training script
python train.py --img 640 --batch 16 --epochs 3 --data GlobalWheat2020.yaml --weights yolov5s.pt
Export script
python export.py --img 640 --data data/GlobalWheat2020.yaml --weights runs/train/exp19/weights/best.pt --include edgetpu
This runs into an error (screenshot in the original issue).

Environment

- Docker image provided (ultralytics/yolov5)
- TensorFlow 2.9.1 installed

Minimal Reproducible Example

# start docker 
sudo docker run -it --gpus '"device=1,2,3"' -v `pwd`:/workspace --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 ultralytics/yolov5 /bin/bash

# install tensorflow 
pip install tensorflow 

# train 
python train.py --img 640 --batch 16 --epochs 3 --data data/GlobalWheat2020.yaml --weights yolov5s.pt

# export 
python export.py --img 640 --data data/GlobalWheat2020.yaml --weights runs/train/exp17/weights/best.pt --include edgetpu

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
walterwangimagr added the bug label Aug 3, 2022
@github-actions
Contributor

github-actions bot commented Aug 3, 2022

👋 Hello @walterwangimagr, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher
Member

@walterwangimagr thanks for the bug report! I'll try to reproduce using your commands.

@glenn-jocher
Member

@walterwangimagr yes I'm able to reproduce. Very strange, not sure what the problem might be.

!python train.py --img 640 --batch 16 --epochs 3 --data GlobalWheat2020.yaml --weights yolov5s.pt
!python export.py --img 640 --weights runs/train/exp2/weights/best.pt --include edgetpu

VOC (20cls) also fails. Edge TPU seems to be failing for all non-80 class datasets. Perhaps 80 is hardcoded somewhere in the conversion process. I'll investigate tomorrow.

!python train.py --img 640 --batch 16 --epochs 3 --data VOC.yaml --weights yolov5s.pt
!python export.py --img 640 --data VOC.yaml --weights runs/train/exp3/weights/best.pt --include edgetpu
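
One concrete way the class count reaches the exported graph is through the shape of the detection head's output tensors. The following is a minimal sketch, assuming the usual YOLOv5 head layout of 3 anchors per scale with 4 box values, 1 objectness score and one score per class for each anchor; the function name head_channels is hypothetical. It only shows that the tensors differ between these datasets and does not identify the cause of the compiler failure.

```python
# Sketch: output channels per detection scale for a YOLOv5-style head.
# Assumption: 3 anchors per scale, each predicting
# (x, y, w, h, objectness) plus one score per class.

def head_channels(num_classes: int, anchors_per_scale: int = 3) -> int:
    """Return the number of output channels of one detection scale."""
    outputs_per_anchor = num_classes + 5  # 4 box values + 1 objectness + classes
    return anchors_per_scale * outputs_per_anchor


# Class counts as stated in this thread.
for name, nc in [("coco128", 80), ("VOC", 20), ("GlobalWheat2020", 1)]:
    print(f"{name}: {nc} classes -> {head_channels(nc)} channels per scale")
```

Under these assumptions the 80-class model ends in 255-channel tensors, while the 20-class and 1-class models end in 75 and 18 channels, so the compiler sees differently shaped final layers for each dataset.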

@glenn-jocher
Member

@zldrobit I might need some advice here. Edge TPU export is failing for non-COCO models. I'm not sure what the cause is. Export works for coco128 but fails for VOC and GlobalWheat2020 trained models (see above).

TFLite int-8 export works correctly for both prior to Edge TPU failure. What do you think?

Screen Shot 2022-08-04 at 2 32 37 AM

@zldrobit
Contributor

zldrobit commented Aug 8, 2022

@glenn-jocher I can confirm that the default settings (edgetpu_compiler -s -o) cannot export an Edge TPU model for VOC (20 classes) or GlobalWheat (1 class). I searched through the export architecture with edgetpu_compiler -s -d (as suggested in google-coral/edgetpu#450 (comment)), and the best model I could get has almost 130 ops running on the CPU:
image
You can reproduce this result immediately with edgetpu_compiler -s -i "model/tf_detect/Reshape_1,model/tf_detect/Reshape_3,model/tf_conv_33/mul_1,model/tf_conv_51/mul_1" globalwheat-int8.tflite, or search the export architecture from scratch with edgetpu_compiler -s -d globalwheat-int8.tflite. The VOC model can be exported to Edge TPU format in the same way.
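
The two compiler invocations above can be assembled programmatically. This is a minimal sketch, not the code used by export.py: the helper name edgetpu_compile_cmd and its parameters are hypothetical, and only the flags mentioned in this thread (-s, -d, -i, -o) are used. The helper deliberately treats the -d and -i strategies as alternatives, which is a choice of this sketch rather than a documented compiler rule.

```python
import shlex
from typing import List, Optional, Sequence


def edgetpu_compile_cmd(
    tflite_path: str,
    search_delegate: bool = False,
    intermediate_tensors: Optional[Sequence[str]] = None,
    out_dir: Optional[str] = None,
) -> List[str]:
    """Build an edgetpu_compiler argument list (hypothetical helper)."""
    if search_delegate and intermediate_tensors:
        # Sketch-level restriction: pick one strategy per invocation.
        raise ValueError("choose either search_delegate or intermediate_tensors")
    cmd = ["edgetpu_compiler", "-s"]  # -s: show operations
    if search_delegate:
        cmd.append("-d")  # let the compiler search for a stopping point
    if intermediate_tensors:
        # Tensor names are model-specific; these must come from the model.
        cmd += ["-i", ",".join(intermediate_tensors)]
    if out_dir:
        cmd += ["-o", out_dir]
    cmd.append(tflite_path)
    return cmd


if __name__ == "__main__":
    # Print the search-based variant from the comment above.
    print(shlex.join(edgetpu_compile_cmd("globalwheat-int8.tflite", search_delegate=True)))
```

On a machine with the compiler installed, the returned list can be passed to subprocess.run(cmd, check=True). Building a list rather than a shell string avoids quoting problems with the comma-separated tensor names.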

@walterwangimagr
Author

I had a look at some posts, google-coral/edgetpu#449 and google-coral/edgetpu#405, and it looks like the failure could be caused by a resolution limitation. The weird thing is that coco128 exports fine at --img 640 with 80 classes, while any other number of classes fails. I will try training some models with a smaller image size to confirm.
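
The image-size experiment described above could be scripted roughly as follows. This is a sketch under stated assumptions: the helper size_sweep, the run-name prefix and the sizes 320/480/640 are hypothetical choices, while the train.py and export.py flags are the ones already used in this issue. It also assumes each --name is unused, so the weights land in runs/train/<name>/weights/best.pt.

```python
import shlex
from typing import Iterable, List, Tuple

Command = List[str]


def size_sweep(
    data_yaml: str, sizes: Iterable[int], prefix: str = "sweep"
) -> List[Tuple[Command, Command]]:
    """Return (train, export) command pairs, one pair per image size."""
    pairs = []
    for size in sizes:
        run_name = f"{prefix}_{size}"  # hypothetical run name
        train = [
            "python", "train.py", "--img", str(size), "--batch", "16",
            "--epochs", "3", "--data", data_yaml,
            "--weights", "yolov5s.pt", "--name", run_name,
        ]
        export = [
            "python", "export.py", "--img", str(size), "--data", data_yaml,
            "--weights", f"runs/train/{run_name}/weights/best.pt",
            "--include", "edgetpu",
        ]
        pairs.append((train, export))
    return pairs


if __name__ == "__main__":
    # Sizes are illustrative; multiples of 32 are assumed to be safe choices.
    for train, export in size_sweep("data/GlobalWheat2020.yaml", (320, 480, 640)):
        print(shlex.join(train))
        print(shlex.join(export))
```

Running the printed pairs in order shows at which image size, if any, the Edge TPU step starts to fail for a given dataset.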

@walterwangimagr
Author

I have run into another weird behaviour. If I change the class name in coco128.yaml:
image
then after training and exporting to edgetpu.tflite, running detect.py with the Edge TPU model still predicts 'person':
image

@glenn-jocher
Member

glenn-jocher commented Aug 8, 2022

@walterwangimagr class names are embedded as model attributes after training finishes, i.e. model.names.

@zldrobit ok thanks for the results! Do you think we should update Edge TPU export to edgetpu_compiler -s -i or edgetpu_compiler -s -d? Let me know what the best option is and I will create a PR to resolve this.

@glenn-jocher
Member

@zldrobit I don't think we can use the -i argument, as we don't know the output layer names ahead of time for the different sized models. I tested edgetpu_compiler -s -d with COCO128 and VOC. For COCO128 the results are the same and take the same time. For VOC the export takes a lot longer, but the important thing is that it works, so I'll create a PR to add the -d argument to all Edge TPU exports.

@glenn-jocher
Member

@walterwangimagr good news 😃! Your original issue may now be fixed ✅ in PR #8902 by adding a --search-delegate argument to Edge TPU model compilation per @zldrobit's solution above.

To receive this update:

  • Git – git pull from within your yolov5/ directory or git clone https://github.com/ultralytics/yolov5 again
  • PyTorch Hub – Force-reload model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
  • Notebooks – View updated notebooks on Colab or Kaggle
  • Docker – sudo docker pull ultralytics/yolov5:latest to update your image

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

glenn-jocher added a commit that referenced this issue Aug 8, 2022
glenn-jocher removed the TODO label Aug 8, 2022
@walterwangimagr
Author

Thank you very much

ctjanuhowski pushed a commit to ctjanuhowski/yolov5 that referenced this issue Sep 8, 2022
@glenn-jocher
Member

@walterwangimagr you're welcome! If you need further assistance or have other questions, feel free to ask. Happy to help!
