
Cannot load model with torch.hub #11769

Closed
1 of 2 tasks
NatanelBirarov opened this issue Jun 25, 2023 · 6 comments
Labels
bug Something isn't working Stale

Comments

@NatanelBirarov

Search before asking

  • I have searched the YOLOv5 issues and found no similar bug report.

YOLOv5 Component

PyTorch Hub

Bug

I am loading a custom YOLOv5 model with torch.hub, using code I wrote a long time ago. I now need to run it from inside a Docker container on a Jetson Xavier NX. I took an NVIDIA PyTorch container that was already working for me on a Jetson Nano, but it threw the following error:

  File "/root/.cache/torch/hub/ultralytics_yolov5_master/utils/general.py", line 248, in check_requirements
    pkg.require(r)
  File "/usr/local/lib/python3.8/dist-packages/pkg_resources/__init__.py", line 968, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/local/lib/python3.8/dist-packages/pkg_resources/__init__.py", line 829, in resolve
    dist = self._resolve_dist(
  File "/usr/local/lib/python3.8/dist-packages/pkg_resources/__init__.py", line 854, in _resolve_dist
    if dist is None or (dist not in req and replace_conflicting):
  File "/usr/local/lib/python3.8/dist-packages/pkg_resources/__init__.py", line 3205, in __contains__
    return self.specifier.contains(item, prereleases=True)
  File "/usr/local/lib/python3.8/dist-packages/pkg_resources/_vendor/packaging/specifiers.py", line 905, in contains
    item = Version(item)
  File "/usr/local/lib/python3.8/dist-packages/pkg_resources/_vendor/packaging/version.py", line 198, in __init__
    raise InvalidVersion(f"Invalid version: '{version}'")
pkg_resources.extern.packaging.version.InvalidVersion: Invalid version: '2.0.0.nv23.05'
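The traceback shows that `pkg_resources` hands the installed torch version string to `packaging.version.Version`, which only accepts PEP 440-style versions; NVIDIA's `2.0.0.nv23.05` is not one. A minimal stdlib sketch of a comparison that tolerates such vendor suffixes (the helper name `numeric_release` is hypothetical, not part of YOLOv5):

```python
import re

def numeric_release(version: str) -> tuple:
    """Extract the leading numeric release segments of a version string,
    dropping vendor tails like '.nv23.05' that strict PEP 440 parsing rejects."""
    m = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(p) for p in m.group(0).split(".")) if m else ()

print(numeric_release("2.0.0.nv23.05"))             # (2, 0, 0)
print(numeric_release("2.0.0.nv23.05") >= (1, 7, 0))  # True: release part satisfies torch>=1.7.0
```

Tuples of integers compare element-wise, so this gives a sane minimum-version check even when the full string would raise `InvalidVersion`.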

I first tried previous versions of ultralytics, but got the error no module named 'ultralytics.yolo'.
Then I remembered that I had previously used a version of the repository that didn't have that module yet, so I checked out earlier versions (<=v7.0), but I still got the same error as above, InvalidVersion: '2.0.0.nv23.05'.
I don't know what else to try. Can someone please help me figure this out?

Thank you in advance.

Environment

YOLOv5 🚀 v6.0-0-g956be8e torch 2.0.0.nv23.05 CUDA:0 (Xavier, 6848.73046875MB)
Ubuntu 20.04, Docker container https://hub.docker.com/layers/dustynv/ros/humble-pytorch-l4t-r35.3.1/images/sha256-b1ec6f8b67f25bcf0ebb124555c058fe97c2cc03aba2fc299c2ce3296f876a55?context=explore
Python 3.8.10

Minimal Reproducible Example

import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5n')  # I'm loading a custom model in my code, but the error is the same

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
@NatanelBirarov NatanelBirarov added the bug Something isn't working label Jun 25, 2023
@glenn-jocher
Member

@NatanelBirarov hi there,

It seems like you are encountering an error when trying to load a custom YOLOv5 model using torch.hub. The error message indicates an "InvalidVersion" and mentions "2.0.0.nv23.05".

From the information you provided, it seems that you are using YOLOv5 v6.0-0-g956be8e and torch 2.0.0.nv23.05 in a Docker container running on a Jetson Xavier NX. It looks like you have already tried using different versions of the "ultralytics" package and checked out earlier versions of the repository, but the error persists.

To better assist you with this issue, could you provide some additional information? It would be helpful to know the exact steps you followed and any modifications you made to the code or environment. Additionally, please confirm if you are facing this issue only in the Docker container on the Jetson Xavier Nx or if it occurs in other environments as well.

Meanwhile, I recommend checking the YOLOv5 issues to see if anyone else has reported a similar bug. It's possible that the community or the Ultralytics team has encountered this before and might have provided a solution.

Thank you for bringing this to our attention, and we appreciate your willingness to submit a PR. Your contribution would be valuable in improving YOLOv5 for the entire community.

Best regards,
Glenn

@NatanelBirarov
Author

The exact steps I took are as follows:

  1. (About a year ago) Trained a custom model with YOLOv5 and wrote a program to run inference on a Jetson Xavier NX; the model is loaded with torch.hub.load. Everything worked fine.
  2. (A few days ago) The NX stopped working, so I reflashed it and decided to reinstall everything inside a Docker container. I took this container https://hub.docker.com/layers/dustynv/ros/humble-pytorch-l4t-r35.3.1/images/sha256-b1ec6f8b67f25bcf0ebb124555c058fe97c2cc03aba2fc299c2ce3296f876a55?context=explore and ran the exact same code on it, and it throws the first error from above: pkg_resources.extern.packaging.version.InvalidVersion: Invalid version: '2.0.0.nv23.05'
  3. Tried to downgrade the ultralytics version, but got the error no module named 'ultralytics.yolo'
  4. Tried to clone a previous version of the YOLOv5 repository into the PyTorch Hub cache; it still gave the same error as before, InvalidVersion.
  5. Tried all the above steps outside the container, with the same result.
  6. Tried on a regular PC with PyTorch 2.0.1, and everything worked, which leads me to believe it's an issue with the Jetson build of PyTorch.
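Step 6 is consistent with the parsing failure: the PC build's version string is PEP 440-valid, while the Jetson build's is not. A simplified regex check (a teaching sketch, deliberately not the full PEP 440 grammar) illustrates the difference, including that a `+nv23.05` local-version suffix would have parsed fine:

```python
import re

# Simplified PEP 440 shape: release segments, optional pre/post/dev parts,
# and an optional "+local" suffix. A sketch only, not the full grammar.
PEP440ISH = re.compile(
    r"^\d+(\.\d+)*"                   # release segments: 2.0.0
    r"((a|b|rc)\d+)?"                 # pre-release: rc1
    r"(\.post\d+)?"                   # post-release: .post1
    r"(\.dev\d+)?"                    # dev release: .dev1
    r"(\+[a-z0-9]+(\.[a-z0-9]+)*)?$"  # local version: +nv23.05
)

for v in ["2.0.1", "2.0.0+nv23.05", "2.0.0.nv23.05"]:
    print(v, "valid" if PEP440ISH.match(v) else "invalid")
```

`2.0.1` and `2.0.0+nv23.05` match, but `2.0.0.nv23.05` does not: after the numeric release segments, `.nv23` fits none of the allowed suffix forms, which is exactly what `Version(item)` rejects in the traceback above.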

@glenn-jocher
Member

@NatanelBirarov hello,

Thank you for providing detailed steps. I apologize for the inconvenience you are facing while loading the custom YOLOv5 model using torch.hub in the Docker container on the Jetson Xavier NX.

Based on your description, it seems that the issue might be related to the version of PyTorch installed on the Jetson Xavier NX. When you ran the same code on a regular PC with PyTorch 2.0.1, it worked without any problems. This suggests that the issue might be specific to the Jetson version of PyTorch.

To further investigate and resolve this, I would recommend reaching out to the PyTorch or Jetson communities for assistance. They may have more insights or suggestions on how to resolve the issue with the version incompatibility. Additionally, you can check their documentation or support channels to see if there are any known compatibility issues between PyTorch and the Jetson Xavier NX.

I hope this helps, and I appreciate your patience and understanding. We are here to support you, and if you have any further questions or need more assistance, please feel free to ask.

Best regards,
Glenn

@goshlanguage

I encountered this same issue on an NVIDIA Jetson Xavier NX (albeit while trying out fastchat).

I was able to resolve it with the following steps:

  • pip3 install --upgrade requests packaging click torch
  • retry on errors
  • profit

Are you able to resolve the matter this way as well?

Here are some excerpts from my terminal showing this issue:

# Original Issue
ryan@nvidia-xnx-00:~$ python3 -m fastchat.serve.cli --model-path lmsys/fastchat-t5-3b-v1.0
<truncated>
  File "/home/ryan/.local/lib/python3.8/site-packages/transformers/utils/generic.py", line 29, in <module>
    from .import_utils import is_flax_available, is_tf_available, is_torch_available, is_torch_fx_proxy
  File "/home/ryan/.local/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 400, in <module>
    _torch_fx_available = (torch_version.major, torch_version.minor) >= (
AttributeError: 'LegacyVersion' object has no attribute 'major'

# Searched error online and found a suggestion to upgrade requests:
pip3 install --upgrade requests
Collecting requests
<truncated>
ERROR: wandb 0.15.5 has requirement Click!=8.0.0,>=7.1, but you'll have click 7.0 which is incompatible.
ERROR: peft 0.3.0 has requirement torch>=1.13.0, but you'll have torch 2.0.0.nv23.05 which is incompatible.
ERROR: huggingface-hub 0.16.4 has requirement packaging>=20.9, but you'll have packaging 20.3 which is incompatible.
Installing collected packages: requests
Successfully installed requests-2.31.0

# Noticed errors upgrading requests
pip3 install --upgrade click torch packaging
Collecting click
  Using cached click-8.1.5-py3-none-any.whl (98 kB)
Collecting torch
  Downloading torch-2.0.1-cp38-cp38-manylinux2014_aarch64.whl (74.0 MB)
     |████████████████████████████████| 74.0 MB 5.5 kB/s
Collecting packaging
  Downloading packaging-23.1-py3-none-any.whl (48 kB)
     |████████████████████████████████| 48 kB 239 kB/s
ERROR: peft 0.3.0 has requirement torch>=1.13.0, but you'll have torch 2.0.0.nv23.05 which is incompatible.
ERROR: accelerate 0.21.0 has requirement torch>=1.10.0, but you'll have torch 2.0.0.nv23.05 which is incompatible.

# This last error resulted in a still broken invocation (I'm using a tutorial using fastchat but the python errors seem to be the same)
python3 -m fastchat.serve.cli --model-path lmsys/fastchat-t5-3b-v1.0
    raise InvalidVersion(f"Invalid version: '{version}'")
packaging.version.InvalidVersion: Invalid version: '2.0.0.nv23.05'

# try one more time to upgrade packages without errors
pip3 install --upgrade click torch
Collecting click
  Using cached click-8.1.5-py3-none-any.whl (98 kB)
Collecting torch
  Using cached torch-2.0.1-cp38-cp38-manylinux2014_aarch64.whl (74.0 MB)
Requirement already satisfied, skipping upgrade: networkx in /usr/local/lib/python3.8/dist-packages (from torch) (3.1)
Requirement already satisfied, skipping upgrade: typing-extensions in ./.local/lib/python3.8/site-packages (from torch) (4.7.1)
Requirement already satisfied, skipping upgrade: jinja2 in /usr/local/lib/python3.8/dist-packages (from torch) (3.1.2)
Requirement already satisfied, skipping upgrade: sympy in /usr/local/lib/python3.8/dist-packages (from torch) (1.12)
Requirement already satisfied, skipping upgrade: filelock in /usr/local/lib/python3.8/dist-packages (from torch) (3.12.2)
Requirement already satisfied, skipping upgrade: MarkupSafe>=2.0 in ./.local/lib/python3.8/site-packages (from jinja2->torch) (2.1.3)
Requirement already satisfied, skipping upgrade: mpmath>=0.19 in /usr/local/lib/python3.8/dist-packages (from sympy->torch) (1.3.0)
Installing collected packages: click, torch
Successfully installed click-8.1.5 torch-2.0.1

# Looks like it's resolved
python3 -m fastchat.serve.cli --model-path lmsys/fastchat-t5-3b-v1.0
Downloading spiece.model: 100%|████████| 792k/792k [00:00<00:00, 2.64MB/s]
Downloading (…)in/added_tokens.json: 100%|████████| 150/150 [00:00<00:00, 19.7kB/s]
Downloading (…)cial_tokens_map.json: 100%|████████| 2.20k/2.20k [00:00<00:00, 1.41MB/s]

@glenn-jocher
Member

@goshlanguage I encountered a similar issue on an NVIDIA Jetson Xavier NX when using fastchat. I was able to resolve the issue with the following steps:

  1. Upgrade the requests package using pip3 install --upgrade requests.
  2. Upgrade the click, torch, and packaging packages using pip3 install --upgrade click torch packaging.

After following these steps, the issue was resolved, and I was able to run the code without any errors.

You can try these steps and see if they resolve the issue for you as well. Let me know if you need any further assistance.

@github-actions
Contributor

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

@github-actions github-actions bot added the Stale label Aug 15, 2023
@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Aug 25, 2023