torch.nn.modules.module.ModuleAttributeError: 'BatchNorm2d' object has no attribute '_non_persistent_buffers_set' #58
Hello @e-shawakri, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook. If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue; otherwise we cannot help you. If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

For more information please visit https://www.ultralytics.com.
The same issue. I built PyTorch from source (master branch, version 1.6.0) because I have a CUDA 10.0 environment.

python train.py --img 640 --batch 16 --epochs 10 --data ./data/coco128.yaml --cfg ./models/yolov5s.yaml --weights 'weights/yolov5s.pt'

I got the error as follows:

/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/serialization.py:646: SourceChangeWarning: source code of class 'torch.nn.modules.container.Sequential' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/serialization.py:646: SourceChangeWarning: source code of class 'torch.nn.modules.conv.Conv2d' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/serialization.py:646: SourceChangeWarning: source code of class 'torch.nn.modules.batchnorm.BatchNorm2d' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/serialization.py:646: SourceChangeWarning: source code of class 'torch.nn.modules.activation.LeakyReLU' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/serialization.py:646: SourceChangeWarning: source code of class 'torch.nn.modules.container.ModuleList' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/serialization.py:646: SourceChangeWarning: source code of class 'torch.nn.modules.pooling.MaxPool2d' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/serialization.py:646: SourceChangeWarning: source code of class 'torch.nn.modules.upsampling.Upsample' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
Traceback (most recent call last):
File "train.py", line 398, in <module>
train(hyp)
File "train.py", line 117, in train
{k: v for k, v in ckpt['model'].state_dict().items() if model.state_dict()[k].numel() == v.numel()}
File "/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 783, in state_dict
module.state_dict(destination, prefix + name + '.', keep_vars=keep_vars)
File "/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 783, in state_dict
module.state_dict(destination, prefix + name + '.', keep_vars=keep_vars)
File "/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 783, in state_dict
module.state_dict(destination, prefix + name + '.', keep_vars=keep_vars)
[Previous line repeated 1 more time]
File "/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 780, in state_dict
self._save_to_state_dict(destination, prefix, keep_vars)
File "/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 743, in _save_to_state_dict
if buf is not None and name not in self._non_persistent_buffers_set:
File "/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 655, in __getattr__
type(self).__name__, name))
torch.nn.modules.module.ModuleAttributeError: 'BatchNorm2d' object has no attribute '_non_persistent_buffers_set'

Environment: CentOS 7
I don't think this is specific to this repo; I would raise this over on the apex repo. You can also try one of our working environments.

Reproduce Our Environment

To access an up-to-date working environment (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled), consider a:
@glenn-jocher I don't think this has anything to do with NVIDIA APEX, since I haven't installed it and I'm facing the same error. It just happens when I use the
I was using torch 1.6.x (built from the git source); deleting it and installing torch 1.5.1 solved it.
This is a pytorch 1.6 problem. I'm seeing it also when using the official 1.6 today.
Same issue on torch 1.6.0.
After I reinstalled torch==1.5.0, the issue was gone.
I have the same problem with torch==1.6.0:

File "detect.py", line 23, in detect

I do not want to reinstall torch==1.5, and I do not find
This repo is fully pytorch 1.6 compatible now. If you are having problems, update your code and your models, as both have been updated for 1.6 compatibility. To update your models, simply delete any official pretrained weights you have and download the latest. If you have 1.5-trained custom models, they will not work with 1.6.
The issue is raised in pytorch/pytorch#42242, but it may be prevented by saving the model state_dict only. Is there any reason to save the entire model as the checkpoint, since those pytorch modules are subject to change? BTW, is it possible to distribute the checkpoints with versioning using the github releases?
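The state_dict-only approach suggested here sidesteps the version coupling: pickling the whole module serializes module objects whose internals (such as `_non_persistent_buffers_set`) differ across torch versions, while a state_dict is just named tensors. A minimal sketch, using a stand-in model rather than the YOLOv5 architecture:

```python
import io

import torch
import torch.nn as nn

# Stand-in model; the real checkpoint would hold the YOLOv5 modules.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))

# Portable: serialize only parameters and buffers, not the module objects.
buf = io.BytesIO()
torch.save(model.state_dict(), buf)

# To restore, rebuild the architecture in the current environment first,
# then load the tensors by name.
fresh = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
buf.seek(0)
fresh.load_state_dict(torch.load(buf))
```

Since only tensors are stored, the load succeeds even if `nn.Module` internals changed between the saving and loading torch versions.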
@farleylai it's possible to do anything under the sun, but with only 24 hours in a day I have to prioritize my time, so no, unfortunately there won't be retroactive support for legacy versions, at least not from our end. If you'd like to take some action and submit a PR, feel free to. If anyone is having any problems with pytorch 1.6 or v2.0 of this repo, simply start from a clean slate: re-clone the repo, auto-download any models you require, and everything will work correctly.
@glenn-jocher |
@farleylai ah, interesting. I was under the impression github would not host large files for free, and naturally we don't want to include weights in the main repo, as then every git clone would be far too large. I have not investigated this in a while though; perhaps things have changed? Right now we host weights in Gdrive for most worldwide users, with a backup GCP bucket for China mainland users. attempt_download() will try one if the other fails, which also adds redundancy for everyone outside China on occasional single-source failures. If you'd like to take the lead on this, that would be awesome! At the moment I'm 110% in over my head simply doing research on model updates and maintaining origin/master in working order.

With respect to loading from state_dict, we used this approach in https://github.com/ultralytics/yolov3, but abandoned it. We required users to submit a pair of arguments for loading: a model cfg and a model *.pt file with state_dict weights. Too often users would mismatch the two, receive error messages, and then raise bug reports, wasting everyone's time. Our new strategy is to emphasize design which makes things harder for users to break, because if there's a way, they will find it. This means fewer settings, fewer knobs to turn, and fewer arguments to submit when running things.
@farleylai about the v1.0 weights themselves, I have copies of these I can send you. If you can replace attempt_download() functionality with the builtin torch functionality that would be great. The main requirements are:
I think that's it. Versioning control as you mention would be nice too, of course. We don't currently include any version information in the weights themselves, to keep things simple, but this does create confusion as you can see.
The weight downloads should be kept separate from the git clone for this to make sense: it would be great if each release of the checkpoints could be associated with some git tag. Regarding the model/state_dict matching, perhaps the version or some git hash (including the config) could be inserted into the checkpoint, or even the state_dict, as definitive proof? Otherwise, external breakage from PyTorch may still happen in the future. As for the PyTorch hub download APIs based on

Other usability that could be enhanced is the package organization and distribution, which you may consider. Torch Hub APIs support exposing other entry points, which would be useful for calling training/detection APIs programmatically rather than just loading the model; exposing compiled cuda modules, if any, is unlikely, since it merely downloads and extracts the sources in the repo. Then a recipe to build a versioned conda distribution, as PyTorch does, is likely necessary to help manage the dependencies and command-line usage for training/detection/testing/etc. Perhaps the top-level package should be reorganized to something like

BTW, since
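The idea of embedding a version marker in the checkpoint could look roughly like this. The keys ("state_dict", "torch_version", "repo_tag") are hypothetical, not the repo's actual checkpoint format, and the model is a stand-in:

```python
import io

import torch
import torch.nn as nn

# Hypothetical versioned checkpoint: bundle markers next to the weights so a
# model/checkpoint mismatch fails early with a readable message, instead of
# deep inside state_dict() loading.
model = nn.Linear(4, 2)
ckpt = {
    "state_dict": model.state_dict(),
    "torch_version": str(torch.__version__),
    "repo_tag": "v3.0",  # hypothetical: could also be a git commit hash
}
buf = io.BytesIO()
torch.save(ckpt, buf)

# At load time, check the marker before touching the weights.
buf.seek(0)
loaded = torch.load(buf)
if loaded.get("repo_tag") != "v3.0":
    raise RuntimeError(f"checkpoint tag {loaded.get('repo_tag')!r} does not match code v3.0")
```

A git tag or commit hash in place of the literal "v3.0" would tie each checkpoint to the exact code that produced it.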
@farleylai thanks buddy. Ok I think I understand better, we are really talking about two different things:
We are already doing the latter, but you are recommending we migrate to a different method. We have the weights already hosted at static URIs in GCP buckets, so a transition to S3 would not gain us anything. The main problem I see from your explanation is the costs. Our current solution does not incur storage or egress charges for most users, which is a must, as our download volumes are in excess of what we want to (are able to) support out of our own pockets in the long term.

For the former, you're saying we should start doing this. Can you add files to releases retroactively, or is this something you're saying we should try to incorporate into v3.0?

Yes, AMP is great, it works very well. We have a PR open for this; I'm simply waiting on Google Colab to update their environment, as the change breaks 1.5.1 compatibility (the default colab pytorch currently). With the AMP release we'll update the requirements.txt to torch>=1.6.
@glenn-jocher Perhaps I did not make it clear. S3 is just an example: we version both the official checkpoints and our fine-tuned ones, because so far the code simply downloads the latest one through the same link, without an option to specify a tag or something. I am just proposing to add this option to the download API for ease of switching between different checkpoint versions. In that sense, the API could be enhanced to accept full download specs for supported cloud storage as extensions, not limited to GCP, S3 and so on, but something the user can manage or host reliably themselves.
I think I've added the v1.0 and v2.0 models successfully now to the release files :) |
I just updated pytorch from 1.5.1 to 1.6.0 and this error appeared.
@glenn-jocher When I trained my model with yolov5 v3.0, I got the following error:

My environment is as follows:
@shliang0603 it appears you may have environment problems. Please ensure you meet all dependency requirements if you are attempting to run YOLOv5 locally. If in doubt, create a new virtual Python 3.8 environment, clone the latest repo (code changes daily), and

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu.
@glenn-jocher @sailfish009 Can you tell me your cuda version? Mine is cuda 10.2 with yolov5 v3.0; when I install torch 1.5.1 I get the error:
@shliang0603 the _non_persistent_buffers_set error is well documented in the issues here. This error occurs when trying to load a pytorch 1.5.1-trained model with pytorch 1.6. 1.5.1 is no longer supported. The cuda version varies by hardware; we use google colab and google cloud deep learning vms with 11.0 now. No issues on either.
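One way to surface this mismatch early is a defensive check right after loading. This is a sketch, not repo code: `check_checkpoint_compat` is a hypothetical helper, and only the `_non_persistent_buffers_set` attribute name is a real torch internal.

```python
import torch
import torch.nn as nn

def check_checkpoint_compat(model: nn.Module) -> None:
    """Fail fast if a module unpickled from a pre-1.6 checkpoint is missing
    internals that torch >= 1.6 expects when building a state_dict."""
    for name, m in model.named_modules():
        if not hasattr(m, "_non_persistent_buffers_set"):
            raise RuntimeError(
                f"module {name or '<root>'} ({type(m).__name__}) lacks "
                "_non_persistent_buffers_set; the checkpoint was likely "
                "saved with torch < 1.6, so re-download or re-train it."
            )

# A freshly built model under the current torch passes the check.
check_checkpoint_compat(nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8)))
```

Run against a model unpickled from an old checkpoint, the check turns the deep ModuleAttributeError inside state_dict() into one readable message naming the offending module.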
1. First of all, let me explain my environment:

2. Training command:

3. Problem analysis

I have solved the problem. I don't think this is a bug in pytorch 1.6.0. I ran into this error because the pretrained model yolov5l.pt I used with yolov5 v3.0 had been downloaded with yolov5 v1.0: the yolov5l.pt download was slow and I was lazy, so I copied the yolov5l.pt downloaded with yolov5 v1.0 into yolov5 v3.0, which produced the error.

4. Solution

Simply re-download the yolov5l.pt pretrained model in yolov5 v3.0.
@glenn-jocher Thanks for your reply, I have solved the problem. |
@shliang0603 in principle you can simply add a loop like the following after loading the model:

for k, m in model.named_modules():
    m._non_persistent_buffers_set = set()  # pytorch 1.6.0 compatibility
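The two-line patch above can be exercised end-to-end. This sketch uses a stand-in model (not YOLOv5) and simulates a pre-1.6 pickle by deleting the attribute, then applies the patch so state_dict() succeeds again:

```python
import torch
import torch.nn as nn

# Stand-in model containing a BatchNorm2d, the module named in the error.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))

# Simulate a module unpickled from torch < 1.6: such modules never had
# _non_persistent_buffers_set, so state_dict() would raise on them.
for m in model.modules():
    if "_non_persistent_buffers_set" in m.__dict__:
        del m._non_persistent_buffers_set

# The patch: give every module the attribute torch >= 1.6 expects.
for k, m in model.named_modules():
    m._non_persistent_buffers_set = set()  # pytorch 1.6.0 compatibility

sd = model.state_dict()  # succeeds again after the patch
```

Setting the attribute to an empty set means all buffers are treated as persistent, which matches the pre-1.6 behavior those checkpoints were saved under.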
@glenn-jocher OK, thanks.
Hi,
OK, got it: if you do not want to use weights you have to pass --weights '', otherwise it will load yolov5s.pt. HTH
I have this issue when using pytorch 1.7 built from source. Any ideas on how to work around it if I need it to work with pytorch 1.7? Using python 3.7, 32-bit ARM.
@bhaktatejas922 this will occur when using older models, i.e. those trained with the v2.0 YOLOv5 release or torch<1.6. Train and export with the latest master:
I have the same issue. I'm using the recommended environment and an updated version, but I still have the same issue.

I think it's not a PyTorch bug, but the pre-trained weights being incompatible with the training model. Anyway, problem sorted, I can run the training. Thanks.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
pip install torch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 -i https://pypi.tuna.tsinghua.edu.cn/simple --default-timeout=1000 |
Bug After Installing NVIDIA APEX
🐛 Bug
After I installed NVIDIA APEX I got this error:
Traceback (most recent call last):
  File "train.py", line 397, in <module>
    train(hyp)
  File "train.py", line 116, in train
    {k: v for k, v in ckpt['model'].state_dict().items() if model.state_dict()[k].numel() == v.numel()}
  File "/home/hitham/anaconda3/envs/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 735, in state_dict
    module.state_dict(destination, prefix + name + '.', keep_vars=keep_vars)
  File "/home/hitham/anaconda3/envs/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 735, in state_dict
    module.state_dict(destination, prefix + name + '.', keep_vars=keep_vars)
  File "/home/hitham/anaconda3/envs/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 735, in state_dict
    module.state_dict(destination, prefix + name + '.', keep_vars=keep_vars)
  [Previous line repeated 1 more time]
  File "/home/hitham/anaconda3/envs/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 732, in state_dict
    self._save_to_state_dict(destination, prefix, keep_vars)
  File "/home/hitham/anaconda3/envs/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 709, in _save_to_state_dict
    if buf is not None and name not in self._non_persistent_buffers_set:
  File "/home/hitham/anaconda3/envs/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 621, in __getattr__
    type(self).__name__, name))
torch.nn.modules.module.ModuleAttributeError: 'BatchNorm2d' object has no attribute '_non_persistent_buffers_set'
But before APEX the code was running smoothly. I'm using pytorch 1.6.0.dev20200611.
I've tried 1.4 and 1.5, but they did not work at all.
Environment