v6.2 is not torch.jit.trace-able #9341
👋 Hello @paaksing, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you. If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available. For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.
This indirectly affects compilation of artifacts to the AWS Neuron runtime.
@paaksing 👋 Hello! Thanks for asking about Export Formats. YOLOv5 🚀 offers export to most popular formats used today. See our TFLite, ONNX, CoreML, TensorRT Export Tutorial for details.

Formats

YOLOv5 inference is officially supported in 11 formats.

💡 ProTip: TensorRT may be up to 2-5X faster than PyTorch on GPU benchmarks.

CPU Benchmarks on Colab Pro+ CPU instance: Full CPU benchmarks
GPU Benchmarks on Colab Pro+ V100 instance: Full GPU benchmarks

Good luck 🍀 and let us know if you have any other questions!
@paaksing can you provide a reproducible successful command with v6.1 please? This should help us to understand the issue, as the above are not part of our CI or standard use cases.
!pip install torch==1.8.1 torchvision==0.9.1

import torch
import requests

with open("yolov5s61.pt", "wb+") as f:
    f.write(requests.get("https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5s.pt").content)

model = torch.hub.load('ultralytics/yolov5:v6.1', 'custom', path="yolov5s61.pt", force_reload=True)

try:
    torch.jit.trace(model, [torch.zeros([1, 3, 640, 640])])
except Exception:
    torch.jit.trace(model, [torch.zeros([1, 3, 640, 640])])

This works, and is very similar with
@paaksing yes thanks! The try/except appears redundant; perhaps it exists to catch model download issues. In any case the v6.1 and v6.2 detection models are exactly identical files: same hash, same weights. They have not been retrained; only classification models have been added. When we export TorchScript models in export.py we also run torch.jit.trace, and this export works correctly (tested every 24 hours in CI tests): Lines 112 to 126 in 4e8504a
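For readers unfamiliar with tracing, the pattern export.py relies on can be sketched with a toy module. This is a minimal, self-contained sketch: TinyNet is a made-up stand-in, not the YOLOv5 model, and export.py additionally passes strict=False and saves the traced module with metadata.

```python
import torch
import torch.nn as nn

# Toy stand-in for a detection model; export.py traces the full YOLOv5
# model the same way with torch.jit.trace.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)

    def forward(self, x):
        return self.conv(x).relu()

model = TinyNet().eval()
im = torch.zeros(1, 3, 64, 64)

with torch.no_grad():
    ts = torch.jit.trace(model, im, strict=False)  # record ops for this input shape
    eager_out = model(im)
    traced_out = ts(im)

# The traced graph shares the same parameters and should reproduce
# the eager module's output for same-shaped inputs.
print(torch.allclose(eager_out, traced_out))
```

Tracing records the operations executed for the example input, which is why data-dependent control flow (such as a one-time initialization branch) can produce a graph that only matches one of the two code paths.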
@glenn-jocher I tried to get as close as possible to that script, but the same error keeps coming; maybe it's the way the model is loaded using torch.hub.load.
@glenn-jocher Also, it appears that the first trace call will always raise an exception, while the second one succeeds.
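This first-call-fails / second-call-succeeds pattern is characteristic of lazy initialization: YOLOv5's Detect head builds its anchor grids during the first forward pass, so the first call exercises a different code path than every later one. A torch-free sketch of the idea (LazyModel is a hypothetical stand-in, not YOLOv5 code):

```python
class LazyModel:
    """Builds internal state on first use, like Detect's anchor grids."""

    def __init__(self):
        self.grid = None  # not built until the first forward call

    def forward(self, x):
        if self.grid is None:
            self.grid = [i * 2 for i in range(len(x))]  # one-time setup branch
        return [a + b for a, b in zip(x, self.grid)]

model = LazyModel()
model.forward([0, 0, 0])        # warmup: triggers the one-time setup branch
out = model.forward([1, 1, 1])  # later calls follow the steady-state path
print(out)  # -> [1, 3, 5]
```

A tracer that records only the first call captures the setup branch; warming the model up first (as the later script in this thread does) ensures the trace sees the steady-state path.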
@paaksing ok I figured this out. v6.2 passes the test, but master does not, so a change between v6.2 and now has caused this. You can use git bisect to track this down by passing in the exact commit hash, i.e. this commit here from July 30th passes. Can you help test commits until you find the first that fails? All commits at https://github.com/ultralytics/yolov5/commits/master

!pip install torch==1.8.1 torchvision==0.9.1

import torch
import requests

model = torch.hub.load('ultralytics/yolov5:1e89807d9a208727e3f0e9bf26a1e286d0ce416b', 'custom', path="https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5s.pt", force_reload=True, autoshape=False)
model.cpu()

try:
    torch.jit.trace(model, [torch.zeros([1, 3, 640, 640])])
except Exception:
    torch.jit.trace(model, [torch.zeros([1, 3, 640, 640])])
@glenn-jocher sure
@glenn-jocher found it, 7aa263c
@paaksing oh that was fast. Yes, that makes sense: that commit changed DetectMultiBackend behavior. OK, I'll try to find a fix, and I'll also try to add this workflow to the CI to safeguard it in the future.
@glenn-jocher Testing one by one is slow, so I went with the binary search method. Thanks, I'll wait for news.
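The bisect strategy described here amounts to a binary search over an ordered commit list with a pass/fail predicate. A sketch of the logic (the commit names and predicate below are hypothetical; in practice the predicate would run the trace test against each commit, e.g. via torch.hub.load with that commit hash):

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad(commit) is True.

    Assumes commits are ordered oldest -> newest and that once a commit
    is bad, all later commits are bad too (monotonic breakage, the same
    assumption git bisect makes).
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # failure already present at mid; look earlier
        else:
            lo = mid + 1    # still passing; first bad commit is later
    return commits[lo]

commits = ["c1", "c2", "c3", "c4", "c5", "c6"]  # oldest -> newest
culprit = first_bad(commits, lambda c: c >= "c4")
print(culprit)  # -> c4
```

With n commits this needs only about log2(n) test runs instead of n, which is why it beats one-by-one testing here.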
@paaksing ok #9363 appears to be working. As long as you leave I'm going to add some CI and then merge.
@paaksing here's the test script that traces v6.2 models with latest torch and the PR (cleaned up, using a warmup call rather than try/except):

import torch
import requests

model = torch.hub.load('ultralytics/yolov5:update/torch', 'yolov5s', force_reload=True, skip_validation=True)
model.cpu()
im = torch.zeros([1, 3, 640, 640])
model(im)  # warmup, build grids
torch.jit.trace(model, [im])

EDIT: note
@glenn-jocher Thanks a lot, it's working, and the script looks cleaner now as well. Closing this.
@paaksing good news 😃! Your original issue may now be fixed ✅ in PR #9363. This PR also adds torch.jit.trace() CI to protect from tracing issues arising in the future. To receive this update:
Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!
Search before asking
YOLOv5 Component
Export
Bug
Environment
YOLOv5 🚀 2022-9-9 Python-3.7.13 torch-1.12.1+cu113 CUDA:0 (Tesla P100-PCIE-16GB, 16281MiB)
Minimal Reproducible Example
Additional
v6.1 works fine, but breaks after updating to v6.2
Are you willing to submit a PR?