AttributeError: 'NoneType' object has no attribute 'python_exit_status' #5913

Closed
2 tasks done
awsaf49 opened this issue Dec 7, 2021 · 26 comments · Fixed by #6041
Labels: bug, Stale

Comments

@awsaf49 (Contributor) commented Dec 7, 2021

Search before asking

  • I have searched the YOLOv5 issues and found no similar bug report.

YOLOv5 Component

Training

Bug

After training completes, I'm getting this error:

wandb: 
Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f8609b77710>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1328, in __del__
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1262, in _shutdown_workers
AttributeError: 'NoneType' object has no attribute 'python_exit_status'
Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f8609b77710>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1328, in __del__
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1262, in _shutdown_workers
AttributeError: 'NoneType' object has no attribute 'python_exit_status'
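
For context, a minimal sketch (an assumption about the trigger, on an affected torch build in the 1.7–1.12 range) of the pattern behind this message: a multi-worker DataLoader iterator that is still alive when the interpreter shuts down, so its __del__ runs during module teardown.

import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset standing in for the YOLOv5 training set
dataset = TensorDataset(torch.randn(32, 3), torch.randn(32))
loader = DataLoader(dataset, batch_size=8, num_workers=2)

it = iter(loader)   # spawns worker processes
next(it)            # fetch one batch, then leave the iterator alive
# No explicit cleanup: on affected versions the iterator's __del__ may run while
# the interpreter is exiting and print "Exception ignored in ... __del__" with
# AttributeError: 'NoneType' object has no attribute 'python_exit_status'.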

Environment

Kaggle

Minimal Reproducible Example

Notebook Link here

Additional

I think I saw a similar post, but it was in Japanese, which I couldn't understand, so I'm posting it here in English.

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
awsaf49 added the bug label on Dec 7, 2021
@glenn-jocher (Member)

@AyushExel seems like a wandb issue here.

@awsaf49 can you provide example code that reproduces the same error message for us please?

@awsaf49 (Contributor, Author) commented Dec 9, 2021

@glenn-jocher Here's the notebook on Kaggle to reproduce. The notebook is public, so you'll be able to simply fork and run it to reproduce the issue.

@T1M-CHEN commented Dec 9, 2021

@awsaf49 I also ran into this problem while training on Kaggle. As a temporary workaround you can pass --workers 0 to avoid it; maybe that helps.
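
As a rough illustration (a standalone sketch, not YOLOv5's actual dataloader code), --workers 0 corresponds to a DataLoader with num_workers=0: batches are loaded in the main process, so there are no worker processes left to shut down at interpreter exit, at the cost of slower loading.

import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset standing in for the YOLOv5 training set
dataset = TensorDataset(torch.randn(32, 3), torch.randn(32))

# num_workers=0: single-process loading, avoids the multiprocessing shutdown path entirely
loader = DataLoader(dataset, batch_size=8, num_workers=0)
for batch in loader:
    pass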

@awsaf49 (Contributor, Author) commented Dec 9, 2021

@T1M-CHEN you're right, --workers 0 does work, but won't it hurt speed since the full resources aren't being used?

@AyushExel (Contributor)

@glenn-jocher this doesn't seem like it's related to wandb.

@T1M-CHEN commented Dec 9, 2021

@awsaf49
I think there is some incompatibility between the v6.0 code, Kaggle, and the multiprocessing module. --workers 0 may increase your training time, but I'm not sure whether this parameter affects accuracy.

If you don't need the advanced features in v6.0, you can use the v5.0 code on Kaggle with --workers 2 as a temporary way to use the full resources.

@glenn-jocher (Member)

@awsaf49 @T1M-CHEN I just updated the Kaggle notebook to the latest, so it's now aligned with the Colab notebook. I see 4 CPUs on Kaggle, so you should be able to use up to --workers 4; regardless, YOLOv5 will limit itself to 4 workers rather than the default 8 if the environment only supports 4 workers.

The error may simply be due to resource saturation, so yes, perhaps reducing --workers to 3 or 2 would help.
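
A rough sketch of that kind of cap (an illustration only, not the exact YOLOv5 implementation): the effective worker count is the smaller of what was requested and what the machine provides.

import os

def effective_workers(requested: int) -> int:
    # Never ask for more DataLoader workers than there are CPUs available
    return min(requested, os.cpu_count() or 1)

print(effective_workers(8))  # 4 on a 4-CPU Kaggle instance, 2 on a 2-CPU GPU instance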

@glenn-jocher (Member)

@awsaf49 @T1M-CHEN strangely the Kaggle notebook is not displaying any LOGGER outputs from YOLOv5, only print() statement outputs. I'm not sure what the problem is, as LOGGER statements appear in all other environments we use (PyCharm, Docker, Colab, GCP, AWS).

@bhachauk commented Dec 9, 2021

@T1M-CHEN
Tried with version v5.0:

Traceback (most recent call last):
  File "train.py", line 543, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 87, in train
    ckpt = torch.load(weights, map_location=device)  # load checkpoint
  File "/opt/conda/lib/python3.7/site-packages/torch/serialization.py", line 607, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/opt/conda/lib/python3.7/site-packages/torch/serialization.py", line 882, in _load
    result = unpickler.load()
  File "/opt/conda/lib/python3.7/site-packages/torch/serialization.py", line 875, in find_class
    return super().find_class(mod_name, name)
AttributeError: Can't get attribute 'SPPF' on <module 'models.common' from '/kaggle/working/yolov5/models/common.py'>

And with v6.0 I still can't resolve the same issue, even after changing the --workers argument as @glenn-jocher mentioned:
AttributeError: 'NoneType' object has no attribute 'python_exit_status'

@T1M-CHEN

@glenn-jocher
Yes, I also see this problem: the v6.0 code cloned from git can't display print() statement outputs properly, but the v5.0 code on Kaggle prints normally.

@T1M-CHEN

@Bhanuchander210
From that traceback it looks like you are not using the correct weights; you should check that the weights and code versions match. You can try searching the author's name on Kaggle to find the latest code; maybe that helps.
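
A hedged illustration of why the SPPF error above appears: YOLOv5 checkpoints are pickled, so torch.load resolves layer classes by name from the local models.common module, and the SPPF class was only added in v6.0. A hypothetical pre-flight check, run from inside the cloned yolov5/ directory:

import importlib

common = importlib.import_module("models.common")
# False on v5.0 code: loading a v6.0 checkpoint will then fail with
# "Can't get attribute 'SPPF' on <module 'models.common' ...>"
print(hasattr(common, "SPPF"))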

@awsaf49 (Contributor, Author) commented Dec 10, 2021

> @awsaf49 @T1M-CHEN I just updated the Kaggle notebook to the latest, so it's now aligned with the Colab notebook. I see 4 CPUs on Kaggle, so you should be able to use up to --workers 4; regardless, YOLOv5 will limit itself to 4 workers rather than the default 8 if the environment only supports 4 workers.
>
> The error may simply be due to resource saturation, so yes, perhaps reducing --workers to 3 or 2 would help.

@glenn-jocher on Kaggle the GPU instance has 2 CPUs, so I tried --workers 1 but still got the same error.

@Tears1997 commented Dec 15, 2021

I also met this problem while training on my lab's server (Ubuntu 18.04) with the initial weight file yolov5x.pt. I found that the error does not occur if the --workers parameter is set to 0, but it does occur for any other value. The code version is v6.0, the GPUs are 4x RTX 6000, and the CPU environment is shown below.
[screenshot: CPU environment]
The error information is as follows:
[screenshots: error output]
It seems this error does not affect normal training, but it is printed every time after training finishes.

@glenn-jocher

@glenn-jocher (Member) commented Dec 15, 2021

@Tears1997 👋 hi, thanks for letting us know about this possible problem with YOLOv5 🚀. I am not able to reproduce your bug. When I run the default training in our Colab notebook everything works correctly:

[screenshot: Colab notebook training output]

We've created a few short guidelines below to help users provide what we need in order to get started investigating a possible problem.

How to create a Minimal, Reproducible Example

When asking a question, people will be better able to provide help if you provide code that they can easily understand and use to reproduce the problem. This is referred to by community members as creating a minimum reproducible example. Your code that reproduces the problem should be:

  • Minimal – Use as little code as possible to produce the problem
  • Complete – Provide all parts someone else needs to reproduce the problem
  • Reproducible – Test the code you're about to provide to make sure it reproduces the problem

For Ultralytics to provide assistance your code should also be:

  • Current – Verify that your code is up-to-date with GitHub master, and if necessary git pull or git clone a new copy to ensure your problem has not already been solved in master.
  • Unmodified – Your problem must be reproducible using official YOLOv5 code without changes. Ultralytics does not provide support for custom code ⚠️.

If you believe your problem meets all the above criteria, please close this issue and raise a new one using the 🐛 Bug Report template with a minimum reproducible example to help us better understand and diagnose your problem.

Thank you! 😃

@LightDani

I faced the same problem on Kaggle, but as @glenn-jocher said, on Colab it completely works.

glenn-jocher linked a pull request on Dec 20, 2021 that will close this issue
@glenn-jocher (Member)

@LightDani @awsaf49 @T1M-CHEN good news 😃! Your original issue may now be fixed ✅ in PR #6041. This PR resets all logging handlers before running any commands, which fixes the Kaggle missing-output bug. Note that it does not resolve the original AttributeError reported in this issue.
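
A rough sketch of that kind of handler reset (an assumption about the approach, not the exact PR #6041 code): notebook hosts such as Kaggle pre-install their own root logging handlers, which can swallow LOGGER output, so they are removed before logging is configured.

import logging

# Remove any handlers the host environment already attached to the root logger
for handler in logging.root.handlers[:]:
    logging.root.removeHandler(handler)

# Reinstall a clean stream handler so LOGGER messages reach the notebook output
logging.basicConfig(level=logging.INFO)
logging.getLogger(__name__).info("LOGGER output is visible again")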


To receive this update:

  • Git – git pull from within your yolov5/ directory or git clone https://github.com/ultralytics/yolov5 again
  • PyTorch Hub – force-reload with model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
  • Notebooks – view the updated Colab and Kaggle notebooks
  • Docker – sudo docker pull ultralytics/yolov5:latest to update your image

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

@awsaf49 (Contributor, Author) commented Dec 20, 2021

@glenn-jocher Yes, you're right, the logger output is visible now :D

@github-actions bot (Contributor) commented Jan 31, 2022

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

@MheadHero

Hi, I faced the same problem again in 2022 after training finished. What should I do? I'm a newbie.

@glenn-jocher (Member) commented Mar 25, 2022

@MheadHero 👋 hi, thanks for letting us know about this possible problem with YOLOv5 🚀. We've created a few short guidelines below to help users provide what we need in order to start investigating a possible problem.

How to create a Minimal, Reproducible Example

When asking a question, people will be better able to provide help if you provide code that they can easily understand and use to reproduce the problem. This is referred to by community members as creating a minimum reproducible example. Your code that reproduces the problem should be:

  • Minimal – Use as little code as possible to produce the problem
  • Complete – Provide all parts someone else needs to reproduce the problem
  • Reproducible – Test the code you're about to provide to make sure it reproduces the problem

For Ultralytics to provide assistance your code should also be:

  • Current – Verify that your code is up-to-date with GitHub master, and if necessary git pull or git clone a new copy to ensure your problem has not already been solved in master.
  • Unmodified – Your problem must be reproducible using official YOLOv5 code without changes. Ultralytics does not provide support for custom code ⚠️.

If you believe your problem meets all the above criteria, please close this issue and raise a new one using the 🐛 Bug Report template with a minimum reproducible example to help us better understand and diagnose your problem.

Thank you! 😃

@Suozz commented Apr 7, 2022

I just ran train.py --epochs 10 --data ./data/test.yaml --cfg models/yolov5s.yaml --weights '' --batch-size 128 --workers 1 --batch-size 10 and hit the same problem.
Note: I used the master branch code and did not change any code.

@glenn-jocher (Member) commented Apr 7, 2022

@Suozz you've passed --batch-size twice in your command. In any case your example is not a reproducible example, as no errors occur when I run it in Colab with COCO128.

We've created a few short guidelines below to help users provide what we need in order to start investigating a possible problem.

How to create a Minimal, Reproducible Example

When asking a question, people will be better able to provide help if you provide code that they can easily understand and use to reproduce the problem. This is referred to by community members as creating a minimum reproducible example. Your code that reproduces the problem should be:

  • Minimal – Use as little code as possible to produce the problem
  • Complete – Provide all parts someone else needs to reproduce the problem
  • Reproducible – Test the code you're about to provide to make sure it reproduces the problem

For Ultralytics to provide assistance your code should also be:

  • Current – Verify that your code is up-to-date with GitHub master, and if necessary git pull or git clone a new copy to ensure your problem has not already been solved in master.
  • Unmodified – Your problem must be reproducible using official YOLOv5 code without changes. Ultralytics does not provide support for custom code ⚠️.

If you believe your problem meets all the above criteria, please close this issue and raise a new one using the 🐛 Bug Report template with a minimum reproducible example to help us better understand and diagnose your problem.

Thank you! 😃

@haoshifu commented Sep 1, 2022

Hello, has the problem been solved?

@glenn-jocher (Member)

@haoshifu update your torch to the latest version.

@shinianzhihou

ERROR: (... __del__ ...)
AttributeError: 'NoneType' object has no attribute 'python_exit_status'

Actually, this is a bug in PyTorch. I have checked the source code of torch.utils.data.dataloader._shutdown_workers and found that the difference between torch 1.7–1.12 and torch 1.13 lies in:

# torch 1.13, nice baby
if _utils is None or _utils.python_exit_status is True or _utils.python_exit_status is None:
    return

# torch 1.7-1.12, bad guy
python_exit_status = _utils.python_exit_status
if python_exit_status is True or python_exit_status is None:
    return

So the simple solution is to modify the source code from the "bad guy" version to the "nice baby" version (or simply upgrade torch).
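
A small, hedged way to check which variant your installed torch carries (the exact guard text can differ between releases, so treat the string match as a heuristic rather than a definitive test):

import inspect

import torch
from torch.utils.data import dataloader

print(torch.__version__)
src = inspect.getsource(dataloader._MultiProcessingDataLoaderIter._shutdown_workers)
# Fixed versions guard against the _utils module already being torn down at interpreter exit
print("_utils is None" in src)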

@glenn-jocher (Member)

@shinianzhihou Thank you for sharing your findings! It seems like you have identified a potential solution to the issue based on the differences you observed in the PyTorch source code.

You're welcome to create a pull request with your proposed modification to the YOLOv5 repository. Your contribution would be greatly appreciated by the community. This will allow the Ultralytics team to review your changes and consider incorporating them into the YOLOv5 codebase.

Thank you for taking the initiative to investigate this issue and suggesting a potential solution! If you have any further questions or need assistance with the pull request process, feel free to ask.
