Terminating after "Plotting labels..." when training #5395
👋 Hello @KristofferK, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution. If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you. If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available. For business inquiries or professional support requests please visit https://ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.
Requirements
Python>=3.6.0 with all requirements.txt installed including PyTorch>=1.7. To get started:
$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt
Environments
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
Status
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit. |
Public W&B Logging link: https://wandb.ai/kknuds19/train/runs/o18wqty1/overview?workspace=user-kknuds19 |
@KristofferK it appears you may have environment problems. Please ensure you meet all dependency requirements if you are attempting to run YOLOv5 locally. If in doubt, create a new virtual Python 3.8 environment, clone the latest repo (code changes daily), and
Requirements
Python>=3.6.0 with all requirements.txt installed including PyTorch>=1.7. To get started:
$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt
Environments
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
Status
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit. |
@glenn-jocher It is a freshly set up Anaconda environment, with the latest repo and requirements.txt. PyTorch is 1.8.2 (LTS). |
@KristofferK unfortunately we don't have resources to help debug individual environments. If I were you I would create a venv and pip install everything, we don't use conda in our verified environments. |
@KristofferK also for us to begin investigating an issue we need a minimum reproducible example. If we can't reproduce your issue there's no action for us to take. We've created a few short guidelines below to help users provide what we need in order to get started investigating a possible problem.
How to create a Minimal, Reproducible Example
When asking a question, people will be better able to provide help if you provide code that they can easily understand and use to reproduce the problem. This is referred to by community members as creating a minimum reproducible example. Your code that reproduces the problem should be:
In addition to the above requirements, for Ultralytics to provide assistance your code should be:
If you believe your problem meets all of the above criteria, please close this issue and raise a new one using the 🐛 Bug Report template and providing a minimum reproducible example to help us better understand and diagnose your problem. Thank you! 😃 |
@glenn-jocher Facing the same issue, when running on a Windows machine with a newly setup environment with all the dependencies installed correctly. It seems like Line 235 in 5d4258f
To confirm, I also executed EDIT: It seems like there was a bug recently introduced in a package called
It's only affecting windows machines. @KristofferK Downgrading |
@MrinalJain17 Thank you so much. That did indeed fix the issue. I hope yolov5 will either wrap the plot_labels in a Try/Except or force the version of the freetype package. I downgraded from 2.11.0 to 2.10.4, and it works again. |
@MrinalJain17 thanks for looking into this! It seems like there is no action for us to take then based upon your conclusions? We can try: except label plotting also, but I'm not sure it's best practices for downstream matplotlib users to all adjust their code for error handling here. |
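For readers following along, here is a minimal sketch of what "wrapping the plot call in try/except" could look like. The decorator name `try_except` and the stub `plot_labels_stub` are hypothetical illustrations, not the code in the YOLOv5 repository.

```python
import functools


def try_except(func):
    """Run func; on any Python exception, print a warning and return None."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as e:
            print(f"WARNING: {func.__name__} failed and was skipped: {e}")
            return None

    return wrapper


@try_except
def plot_labels_stub(labels):
    # Hypothetical stand-in for a plotting routine whose backend raises
    raise RuntimeError("simulated plotting backend failure")


# Training code could carry on after this call instead of aborting
result = plot_labels_stub([0, 1, 2])
```

Note that this only helps when the failure surfaces as a Python exception. If a native library takes down the interpreter itself, there is nothing for `except` to catch.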
On MacOS I don't see any freetype package here either. This is what my environment looks like based upon pip list:
(venv) (base) glennjocher@Glenns-iMac yolov5 % pip list
Package Version
----------------------- ---------------------
absl-py 0.15.0
appnope 0.1.2
backcall 0.2.0
cachetools 4.2.4
certifi 2021.10.8
charset-normalizer 2.0.7
cycler 0.10.0
decorator 5.1.0
google-auth 2.3.0
google-auth-oauthlib 0.4.6
grpcio 1.41.0
idna 3.3
ipython 7.28.0
jedi 0.18.0
kiwisolver 1.3.2
Markdown 3.3.4
matplotlib 3.4.3
matplotlib-inline 0.1.3
numpy 1.21.3
oauthlib 3.1.1
opencv-python 4.5.4.58
pandas 1.3.4
parso 0.8.2
pexpect 4.8.0
pickleshare 0.7.5
Pillow 8.4.0
pip 21.3.1
prompt-toolkit 3.0.21
protobuf 3.19.0
ptyprocess 0.7.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
Pygments 2.10.0
pyparsing 2.4.7
python-dateutil 2.8.2
pytz 2021.3
PyYAML 6.0
requests 2.26.0
requests-oauthlib 1.3.0
rsa 4.7.2
scipy 1.7.1
seaborn 0.11.2
setuptools 57.0.0
six 1.16.0
tensorboard 2.7.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.0
thop 0.0.31.post2005241907
torch 1.10.0
torchvision 0.11.1
tqdm 4.62.3
traitlets 5.1.0
typing-extensions 3.10.0.2
urllib3 1.26.7
wcwidth 0.2.5
Werkzeug 2.0.2
wheel 0.36.2 |
@glenn-jocher That makes sense. It's windows-specific, and hopefully a temporary issue. However, I believe it would be helpful to have some sort of a "known issues" tracker for the YOLOv5 repository, which would describe any such errors along with some troubleshooting options. Even in the future, if some other third-party library breaks any part of the code, users can find that info (and relevant solutions) in the said tracker. |
@MrinalJain17 yes a known issue tracker is certainly a good idea. We have a TODO list with about 20 items which somewhat handles this currently. We track these through issue tags: |
@MrinalJain17 seems like another Windows user had the same problem in #5611. I just realized another option besides try except is to use or utils.general.timeout. Maybe something like this:
@Timeout(30)
def plot_labels(labels, names=(), save_dir=Path('')):
    # plot dataset labels
    ... |
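As a rough illustration of how a timeout decorator of this kind can be built, here is a sketch based on `signal.SIGALRM`. The class name `SketchTimeout` and the stub function are assumptions made for this example; it is not claimed to match `utils.general`, and it only works on platforms that provide `SIGALRM` (not Windows) and only when called from the main thread.

```python
import contextlib
import signal
import time


class SketchTimeout(contextlib.ContextDecorator):
    """Raise TimeoutError if the wrapped block runs longer than `seconds`."""

    def __init__(self, seconds, message="operation timed out"):
        self.seconds = int(seconds)
        self.message = message

    def _on_alarm(self, signum, frame):
        raise TimeoutError(self.message)

    def __enter__(self):
        # Install our handler and schedule the alarm
        self._previous = signal.signal(signal.SIGALRM, self._on_alarm)
        signal.alarm(self.seconds)
        return self

    def __exit__(self, exc_type, exc, tb):
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, self._previous)  # restore old handler
        return False  # do not swallow exceptions


@SketchTimeout(1)
def slow_plot_stub():
    # Hypothetical stand-in for a plotting call that hangs
    time.sleep(5)


timed_out = False
try:
    slow_plot_stub()
except TimeoutError:
    timed_out = True
```

A timeout like this addresses a hang, as reported in #5611. It does not help when the process is killed outright.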
@MrinalJain17 wait I just noticed a difference. In #5611 the process just hangs at plot_labels(), but you said in your case the process actually terminated by itself? |
* Improve plots.py robustness Addresses issues #5374, #5395, #5611 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@MrinalJain17 @KristofferK good news 😃! Your original issue may now be fixed ✅ in PR #5616. This PR does not fix any underlying issues with matplotlib/freetype, but it does enclose plot_labels() in Lines 327 to 331 in def7a0f
To receive this update:
Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀! |
@glenn-jocher So, if you notice this section below of the output from #5611 , this is actually what the issue was. Basically, anything remotely close to a matplotlib command ended up killing the entire process. The Moreover, the issue was super-specific: It was for windows-machines using anaconda with the default channel. The good news is that they've yanked |
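When a native library kills the whole interpreter, neither try/except nor a timeout inside the same process can report it. One way to diagnose this, sketched below under that assumption, is to run the suspect code in a child interpreter and inspect its exit code. The helper name `run_isolated` is hypothetical; `-X faulthandler` is a standard CPython option that dumps a traceback on fatal signals.

```python
import subprocess
import sys


def run_isolated(snippet, timeout_s=60):
    """Run a Python snippet in a child interpreter; return (returncode, stderr)."""
    completed = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return completed.returncode, completed.stderr


# A healthy snippet exits with code 0
ok_code, _ = run_isolated("print('fine')")

# Simulated hard exit: the child dies without raising a Python exception
bad_code, _ = run_isolated("import os; os._exit(3)")
```

To probe an environment like the one described here, the snippet could be something like `"import matplotlib.pyplot as plt; plt.figure()"`. A non-zero return code with little or no traceback in stderr would point to a crash below the Python level.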
Hi @KristofferK ! |
Hello WJos. The malaria dataset is not actually what I am working on, rather it was to test out yolov5 before using it on my own dataset of drosophila. For malaria I used https://www.kaggle.com/kmader/malaria-bounding-boxes/ and converted it to yolov5 format. I might still have the code for the converter if you're interested. |
It doesn't work well on Windows, because there is a 'signal.SIGALRM' in class 'Timeout'. It would throw an error like "module 'signal' has no attribute 'SIGALRM'". But it works well on Linux. How about removing Timeout(30) but still keeping try_except? |
@yeshanliu it appears you may have environment problems. The above code does work well on Windows; Windows is part of our daily CI testing: https://github.com/ultralytics/yolov5/runs/5562838761?check_suite_focus=true Please ensure you meet all dependency requirements if you are attempting to run YOLOv5 locally. If in doubt, create a new virtual Python 3.9 environment, clone the latest repo (code changes daily), and
💡 ProTip! Try one of our verified environments below if you are having trouble with your local environment.
Requirements
Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # install
Models and datasets download automatically from the latest YOLOv5 release when first requested.
Environments
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
Status
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit. |
@yeshanliu I investigated some more, it looks like the Windows CI tests are passing because the Try Except decorator is outside the Timeout decorator and is catching the SIGALRM error. So the good news is it works on Windows if you are using current code, the bad news is it works by skipping plotting labels. I think the solution is to put if else statements into Timeout and just put a note that it doesn't work on Windows. I'll create a PR. |
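The decorator-ordering effect described above can be reproduced on any platform with a small simulation. Everything here is hypothetical: `fake_signal` stands in for a `signal` module that lacks `SIGALRM`, and `try_except` and `timeout` are simplified stand-ins rather than the repository's decorators.

```python
import functools
import types

# Hypothetical stand-in for the signal module on Windows: no SIGALRM attribute
fake_signal = types.SimpleNamespace()

calls = []  # records whether the plotting body ever ran


def try_except(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as e:
            print(f"WARNING: skipped {func.__name__}: {e}")
            return None

    return wrapper


def timeout(seconds):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            fake_signal.SIGALRM  # raises AttributeError before func is called
            return func(*args, **kwargs)

        return wrapper

    return decorator


@try_except      # outer: catches the AttributeError raised by the inner wrapper
@timeout(30)     # inner: fails before the body is reached
def plot_labels_stub():
    calls.append("plotted")
    return "plotted"


result = plot_labels_stub()
```

The call returns without an error, but `calls` stays empty: the outer decorator hides the failure, so the plot is silently skipped. That matches the behavior described for the Windows CI runs.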
That would be great! And thanks for replying.
|
@KristofferK @MrinalJain17 @WJos @yeshanliu good news 😃! Your original issue may now be fixed ✅ in PR #7013. This PR disables Timeout using SIGALRM on Windows. To receive this update:
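One possible shape for such a platform guard is sketched below. This is an assumption about the general approach, not the content of PR #7013: the decorator name `timeout_if_supported` is hypothetical, and it simply becomes a no-op wherever `signal.SIGALRM` is unavailable.

```python
import functools
import signal


def timeout_if_supported(seconds):
    """Apply a SIGALRM timeout where available; otherwise leave func untouched."""
    supported = hasattr(signal, "SIGALRM")  # False on Windows

    def decorator(func):
        if not supported:
            return func  # no timeout, but the function still runs

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            def on_alarm(signum, frame):
                raise TimeoutError(f"{func.__name__} exceeded {seconds}s")

            previous = signal.signal(signal.SIGALRM, on_alarm)
            signal.alarm(int(seconds))
            try:
                return func(*args, **kwargs)
            finally:
                signal.alarm(0)  # cancel the pending alarm
                signal.signal(signal.SIGALRM, previous)

        return wrapper

    return decorator


@timeout_if_supported(30)
def plot_labels_stub():
    # Hypothetical stand-in for the real plotting function
    return "plotted"


outcome = plot_labels_stub()
```

With a guard like this, the plotting body runs on every platform, and only the timeout protection is lost on Windows.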
Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀! |
I removed --cache and the problem was solved. |
I ran into the same problem; it seems that TryExcept does not work. I am sure that all third-party packages are installed correctly, but training still terminates after "Plotting labels". My OS is Ubuntu 18.04. |
This issue is about "Plotting labels" terminating on the Windows platform. Releases 7.0 and 6.2 work well on my Ubuntu platform, so I suggest you check your release version and environment. |
hi @yeshanliu I'd recommend checking if your release versions are updated and if your environment meets all the necessary requirements. Make sure to use the latest release of the YOLOv5 repository and a complete installation of all required packages. You can refer to the installation instructions in the Ultralytics YOLOv5 documentation for a complete guide on setting up YOLOv5 on your Ubuntu 18.04 system. If the issue still persists, feel free to provide more details about your setup, and we can further investigate the problem. |
While I am able to use YOLOv5 for inference, the train.py does not seem to work for me anymore. It did work previously however.
I have tried to clone the latest repo as well. I have set up a fresh Conda environment with Python 3.8. Again, inference works, but not training my custom data.
It creates the "exp" directory (exp24 in this case), which contains an empty "weights" directory, hyp.yaml, opt.yaml, and events.out.fs.events..0. No .pt, no images, no results.csv.
I have tried both the training set that I previously was able to train with and a new one I just created.
I run it using
python train.py --img 640 --batch 4 --epochs 200 --data C:/Users/kristofferk/Documents/GitHub/p9-api/experiment/kristoffer/step06-data.yaml --weights yolov5s.pt
But when it comes to "Plotting labels..." it will be stuck there for about 20 seconds and then terminate without any further warnings or errors.
The output of running train.py is:
Any suggestions on how to proceed from here? Either to fix it or at least get a more detailed error message.
Thanks in advance.