
torch.nn.modules.module.ModuleAttributeError: 'BatchNorm2d' object has no attribute '_non_persistent_buffers_set' #58

Closed
e-shawakri opened this issue Jun 14, 2020 · 35 comments
Labels
bug Something isn't working Stale

Comments

@e-shawakri

Bug after installing NVIDIA Apex

🐛 Bug

After I installed NVIDIA Apex, I got this error:

Traceback (most recent call last):
  File "train.py", line 397, in <module>
    train(hyp)
  File "train.py", line 116, in train
    {k: v for k, v in ckpt['model'].state_dict().items() if model.state_dict()[k].numel() == v.numel()}
  File "/home/hitham/anaconda3/envs/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 735, in state_dict
    module.state_dict(destination, prefix + name + '.', keep_vars=keep_vars)
  File "/home/hitham/anaconda3/envs/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 735, in state_dict
    module.state_dict(destination, prefix + name + '.', keep_vars=keep_vars)
  File "/home/hitham/anaconda3/envs/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 735, in state_dict
    module.state_dict(destination, prefix + name + '.', keep_vars=keep_vars)
  [Previous line repeated 1 more time]
  File "/home/hitham/anaconda3/envs/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 732, in state_dict
    self._save_to_state_dict(destination, prefix, keep_vars)
  File "/home/hitham/anaconda3/envs/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 709, in _save_to_state_dict
    if buf is not None and name not in self._non_persistent_buffers_set:
  File "/home/hitham/anaconda3/envs/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 621, in __getattr__
    type(self).__name__, name))
torch.nn.modules.module.ModuleAttributeError: 'BatchNorm2d' object has no attribute '_non_persistent_buffers_set'

Before installing Apex, the code ran smoothly. I'm using PyTorch 1.6.0.dev20200611.
I've also tried 1.4 and 1.5, but neither works.

Environment

  • OS: Ubuntu
  • GPU 1650 Mobile
  • CUDA: 10.1
@e-shawakri e-shawakri added the bug Something isn't working label Jun 14, 2020
@github-actions
Contributor

github-actions bot commented Jun 14, 2020

Hello @e-shawakri, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue; otherwise we cannot help you.

If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

  • Cloud-based AI systems operating on hundreds of HD video streams in realtime.
  • Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
  • Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

@zhangpiu

I have the same issue. I built PyTorch from source (master branch, version 1.6.0) because I have a CUDA 10.0 environment.
When I try to train from a pretrained checkpoint with

python train.py --img 640 --batch 16 --epochs 10 --data ./data/coco128.yaml --cfg ./models/yolov5s.yaml --weights 'weights/yolov5s.pt'

I get the following error:

/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/serialization.py:646: SourceChangeWarning: source code of class 'torch.nn.modules.container.Sequential' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/serialization.py:646: SourceChangeWarning: source code of class 'torch.nn.modules.conv.Conv2d' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/serialization.py:646: SourceChangeWarning: source code of class 'torch.nn.modules.batchnorm.BatchNorm2d' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/serialization.py:646: SourceChangeWarning: source code of class 'torch.nn.modules.activation.LeakyReLU' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/serialization.py:646: SourceChangeWarning: source code of class 'torch.nn.modules.container.ModuleList' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/serialization.py:646: SourceChangeWarning: source code of class 'torch.nn.modules.pooling.MaxPool2d' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/serialization.py:646: SourceChangeWarning: source code of class 'torch.nn.modules.upsampling.Upsample' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
Traceback (most recent call last):
  File "train.py", line 398, in <module>
    train(hyp)
  File "train.py", line 117, in train
    {k: v for k, v in ckpt['model'].state_dict().items() if model.state_dict()[k].numel() == v.numel()}
  File "/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 783, in state_dict
    module.state_dict(destination, prefix + name + '.', keep_vars=keep_vars)
  File "/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 783, in state_dict
    module.state_dict(destination, prefix + name + '.', keep_vars=keep_vars)
  File "/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 783, in state_dict
    module.state_dict(destination, prefix + name + '.', keep_vars=keep_vars)
  [Previous line repeated 1 more time]
  File "/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 780, in state_dict
    self._save_to_state_dict(destination, prefix, keep_vars)
  File "/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 743, in _save_to_state_dict
    if buf is not None and name not in self._non_persistent_buffers_set:
  File "/home/xxx/anaconda3/envs/pytorch1.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 655, in __getattr__
    type(self).__name__, name))
torch.nn.modules.module.ModuleAttributeError: 'BatchNorm2d' object has no attribute '_non_persistent_buffers_set'

Environment

CentOS-7
CUDA-10.0
GPU-T4

@glenn-jocher
Member

glenn-jocher commented Jun 14, 2020

I don't think this is specific to this repo; I would raise it on the Apex repo. You can also try one of our working environments:

Reproduce Our Environment

To access an up-to-date working environment (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled), consider one of our verified environments.

@edurenye
Contributor

edurenye commented Jul 1, 2020

@glenn-jocher I don't think this has anything to do with NVIDIA Apex, since I haven't installed it and I'm facing the same error.
I'm using NVIDIA driver 450 and CUDA 10.2.

It only happens when I use the --weights parameter.

@sailfish009

I was using torch 1.6.x (built from git source); uninstalling it and installing torch 1.5.1 solved the problem.

@glenn-jocher
Member

This is a PyTorch 1.6 problem. I'm also seeing it with the official 1.6 release today.

@Frank1126lin

Same issue with torch 1.6.0 on Ubuntu 20.04:

 File "/media/frank/LinDB/PyProgram/ai/yolov5/yolov5-master0708/predict.py", line 39, in predict
    model = attempt_load(weights, map_location=device)  # load FP32 model
  File "/media/frank/LinDB/PyProgram/ai/yolov5/yolov5-master0708/models/experimental.py", line 130, in attempt_load
    model.append(torch.load(w, map_location=map_location)['model'].float().fuse().eval())  # load FP32 model
  File "/media/frank/LinDB/PyProgram/ai/yolov5/yolov5-master0708/models/yolo.py", line 148, in fuse
    m.conv = torch_utils.fuse_conv_and_bn(m.conv, m.bn)  # update conv
  File "/home/frank/miniconda3/envs/ai/lib/python3.7/site-packages/torch/nn/modules/module.py", line 802, in __setattr__
    remove_from(self.__dict__, self._parameters, self._buffers, self._non_persistent_buffers_set)
  File "/home/frank/miniconda3/envs/ai/lib/python3.7/site-packages/torch/nn/modules/module.py", line 772, in __getattr__
    type(self).__name__, name))
torch.nn.modules.module.ModuleAttributeError: 'Conv' object has no attribute '_non_persistent_buffers_set'

@Frank1126lin

After I reinstalled torch==1.5.0, the issue was gone. Now I only get the following warning, but the script produces results:

SourceChangeWarning: source code of class 'torch.nn.modules.conv.Conv2d' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)

@gyhd

gyhd commented Jul 29, 2020

I have the same problem with torch==1.6.0:

  File "detect.py", line 23, in detect
    model = attempt_load(weights, map_location=device)  # load FP32 model
  File "/home/gyhd/Desktop/yolov5/models/experimental.py", line 133, in attempt_load
    model.append(torch.load(w, map_location=map_location)['model'].float().fuse().eval())  # load FP32 model
  File "/home/gyhd/Desktop/yolov5/models/yolo.py", line 151, in fuse
    m.conv = torch_utils.fuse_conv_and_bn(m.conv, m.bn)  # update conv
  File "/home/gyhd/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 802, in __setattr__
    remove_from(self.__dict__, self._parameters, self._buffers, self._non_persistent_buffers_set)
  File "/home/gyhd/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 772, in __getattr__
    type(self).__name__, name))
torch.nn.modules.module.ModuleAttributeError: 'Conv' object has no attribute '_non_persistent_buffers_set'

I don't want to reinstall torch==1.5, and I can't find torch.nn.Module.dump_patches = True in torch/nn/modules/module.py.
Could you tell me how to retrieve the original source code so that I can solve the problem properly? Thanks very much.

@glenn-jocher
Member

glenn-jocher commented Jul 29, 2020

This repo is now fully PyTorch 1.6 compatible. If you are having problems, update both your code and your models, as both have been updated for 1.6 compatibility.

To update your models, simply delete any official pretrained weights you have and download the latest ones. Custom models trained with 1.5 will not work with 1.6.

@farleylai
Contributor

farleylai commented Jul 30, 2020

The issue is raised in pytorch/pytorch#42242, but it could be avoided by saving only the model state dict. Is there a reason to save the entire model in the checkpoint, given that PyTorch modules are subject to change?

BTW, is it possible to distribute versioned checkpoints using GitHub releases?
It also seems that PyTorch 1.6 changed the checkpoint format to a zip archive, as raised in pytorch/pytorch#42239.
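A quick way to tell which container a checkpoint file uses is to test whether it is a zip archive: from PyTorch 1.6, torch.save writes a zip-based file by default, and files in that format may not load in older releases (passing _use_new_zipfile_serialization=False to torch.save writes the legacy format). The standard-library sketch below is hypothetical: the helper name checkpoint_format is illustrative, and the check is only a heuristic about the file container. It says nothing about whether the pickled module classes inside are compatible with the running PyTorch.

```python
import zipfile


def checkpoint_format(path: str) -> str:
    """Classify a checkpoint file by its container (heuristic sketch)."""
    if zipfile.is_zipfile(path):
        return "zip"  # default torch.save container since PyTorch 1.6
    return "legacy"  # anything else, e.g. the older raw pickle stream
```

Used as a pre-flight check, a "legacy" result for an official weight file is a hint that it predates the 1.6-compatible release and should be re-downloaded.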

@glenn-jocher
Member

@farleylai it's possible to do anything under the sun, but with only 24 hours in a day I have to prioritize my time, so unfortunately there won't be retroactive support for legacy versions, at least not from our end. If you'd like to take this on, feel free to submit a PR.

If anyone is having problems with PyTorch 1.6 or v2.0 of this repo, simply start from a clean slate: reclone the repo, autodownload any models you require, and everything will work correctly.

@farleylai
Contributor

@glenn-jocher
Only you, as the maintainer, can create GitHub releases, and they natively support uploading versioned binary artifacts of up to 2GB each. Downloads would also be more straightforward than from GDrive, since built-in APIs such as torch.hub.download_url_to_file() and torch.hub.load_state_dict_from_url() suffice. If you want APIs like attempt_download() to work by version/tag, and checkpoints to load/save the state_dict instead of the entire model, I can submit a PR; I believe it would pay off in the long term.

@glenn-jocher
Member

glenn-jocher commented Jul 30, 2020

@farleylai ah interesting. I was under the impression GitHub would not host large files for free, and naturally we don't want to include weights in the main repo, as then every git clone would be far too large. I haven't investigated this in a while though; perhaps things have changed?

Right now we host weights in Gdrive for most worldwide users, with a backup GCP bucket for China mainland users. attempt_download() will try one if the other fails, and also adds redundancy for everyone else outside China on occasional single-source failure.

If you'd like to take the lead on this though that would be awesome! At the moment I'm 110% in over my head simply doing research on model updates and maintaining origin/master in working order.

With respect to loading from a state_dict, we used this approach in https://github.com/ultralytics/yolov3 but abandoned it. It required users to supply a pair of arguments for loading: a model cfg and a *.pt file with state_dict weights. Too often users would mismatch the two, receive error messages and then raise bug reports, wasting everyone's time. Our new strategy emphasizes designs that are harder for users to break, because if there's a way, they will find it. This means fewer settings, fewer knobs to turn, fewer arguments to pass when running things, etc.

@glenn-jocher
Member

@farleylai about the v1.0 weights themselves, I have copies of these I can send you. If you can replace attempt_download() functionality with the builtin torch functionality that would be great. The main requirements are:

  • Free model hosting (S3 and GCP charges can add up rapidly for a large user base).
  • Download redundancy (i.e. the way gsutil has built-in backoff and retries in case of broken connections). We accomplish this by first attempting a gdrive download and then falling back to the GCP-hosted weights if gdrive fails. We use curl for both; gsutil would be better for the second but would require additional dependencies. If PyTorch has some redundancy measures built in, then perfect.
  • Accessibility in China. Gdrive weights are not accessible in China, but GCP is. This is a must as our Chinese user base is large.
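The redundancy requirement above (try gdrive first, fall back to the GCP bucket) can be sketched as a loop over candidate sources. This is a minimal illustration, not the repo's attempt_download(): the function name download_with_fallback and the injectable fetch callable are hypothetical, and fetch stands in for whatever actually performs the transfer (curl, urllib, and so on).

```python
from typing import Callable, List, Sequence


def download_with_fallback(urls: Sequence[str], dest: str,
                           fetch: Callable[[str, str], None]) -> str:
    """Try each source URL in order and return the first one that succeeds.

    `fetch(url, dest)` is any callable that downloads `url` to `dest` and
    raises on failure (hypothetical, e.g. a wrapper around curl or urllib).
    """
    errors: List[str] = []
    for url in urls:
        try:
            fetch(url, dest)
            return url  # first working source wins; later mirrors are never contacted
        except Exception as exc:  # any failure means "try the next mirror"
            errors.append(f"{url}: {exc}")
    # every source failed (or none was given): report all reasons at once
    raise RuntimeError("all download sources failed:\n" + "\n".join(errors))
```

In the setup described above, urls would list the gdrive link first and the GCP bucket second, so users who cannot reach gdrive still get the weights.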

I think that's it. Versioning control as you mention would be nice too of course. We don't currently include any version information in the weights themselves to keep things simple, but this does create confusion as you can see.

@farleylai
Contributor

farleylai commented Jul 30, 2020

Release binaries are kept separate from the git repo, so they are not part of a git clone:
https://docs.github.com/en/github/managing-large-files/distributing-large-binaries.

It would be great if each set of released checkpoints could be associated with a git tag.
So far, we have made copies on S3 and retrieve them by version/tag to make comparing differences easier.
If this could be done at the level of your repo's releases/APIs, life would be easier.

Regarding the model/state_dict matching, perhaps the version or a git hash (including the config) could be inserted into the checkpoint, or even into the state_dict, as definitive proof of compatibility? Otherwise, breaking changes in PyTorch may still cause failures in the future.
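One way to realize the suggestion of embedding a version or git hash in the checkpoint is to store a small metadata dict next to the weights and check it at load time. The sketch below is hypothetical: the keys ("meta", "repo_version", "git_hash", "torch_version") and the helper names are illustrative and not part of YOLOv5, and plain dicts stand in for what torch.save and torch.load would write and read.

```python
from typing import Any, Dict


def make_checkpoint(state_dict: Dict[str, Any], repo_version: str,
                    git_hash: str, torch_version: str) -> Dict[str, Any]:
    """Bundle weights with provenance metadata (hypothetical layout)."""
    return {
        "model": state_dict,  # e.g. model.state_dict() in real code
        "meta": {
            "repo_version": repo_version,  # e.g. a release tag such as "v2.0"
            "git_hash": git_hash,
            "torch_version": torch_version,
        },
    }


def check_checkpoint(ckpt: Dict[str, Any], expected_repo_version: str) -> None:
    """Fail early with a clear message instead of an obscure AttributeError later."""
    meta = ckpt.get("meta")
    if meta is None:
        raise ValueError("checkpoint has no version metadata; it predates "
                         "versioning and may need to be re-downloaded")
    found = meta.get("repo_version")
    if found != expected_repo_version:
        raise ValueError(f"checkpoint was saved by repo version {found}, "
                         f"but this code is {expected_repo_version}")
```

The design choice here is to fail at load time with a message naming both versions, which is the situation behind the _non_persistent_buffers_set errors in this thread.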

As for the PyTorch Hub download APIs, which are based on urllib, additional redundancy and retry on failure would be considered the user's responsibility. Nonetheless, adding a simple exponential backoff with a maximum number of retries should be possible. I think you could make a first release to test the download.
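The simple exponential backoff with a retry cap suggested above might look like the following standard-library sketch. The name retry_with_backoff and its defaults are illustrative assumptions. sleep is injectable so the behavior can be tested without waiting, and no random jitter is added, which a production version would probably want.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(fn: Callable[[], T], max_retries: int = 4,
                       base_delay: float = 1.0, max_delay: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); on failure wait base_delay * 2**attempt (capped) and retry.

    Makes at most max_retries + 1 calls, then re-raises the last exception.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error unchanged
            sleep(min(max_delay, base_delay * 2 ** attempt))  # 1s, 2s, 4s, ...
    raise ValueError("max_retries must be >= 0")  # only reached if the loop never ran
```

For a download, the callable could wrap urllib.request.urlretrieve, e.g. retry_with_backoff(lambda: urllib.request.urlretrieve(url, dest)), where url and dest are placeholders.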

Other usability improvements you might consider are package organization and distribution. Torch Hub supports exposing entry points beyond model loading, which would be useful for calling the training/detection APIs programmatically. Exposing compiled CUDA modules, if any, is unlikely to work, though, since Torch Hub merely downloads and extracts the sources in the repo. A recipe to build a versioned conda distribution, as PyTorch has, is then likely necessary to help manage the dependencies and command-line usage for training/detection/testing/etc. Perhaps the top-level package should also be reorganized under a name like yolov5 when exposed on the Python path.

BTW, since PyTorch 1.6 has Apex's mixed-precision (AMP) functionality integrated natively, could the Apex dependency be removed soon?

@glenn-jocher
Member

@farleylai thanks buddy. Ok I think I understand better, we are really talking about two different things:

  • packaging weights with releases
  • autodownload functionality

We are already doing the latter, but you are recommending we migrate to a different method. We already host the weights at static URIs in GCP buckets, so a transition to S3 would not gain us anything. The main problem I see from your explanation is cost. Our current solution does not incur storage or egress charges for most users, which is a must, as our download volumes exceed what we want to (are able to) support out of our own pockets in the long term. For the former, you're saying we should start doing this. Can you add files to releases retroactively, or is this something you're saying we should try to incorporate into v3.0?

Yes, AMP is great, it works very well. We have a PR open for this, I'm simply waiting on Google Colab to update their environment, as the change breaks 1.5.1 compatibility (the default colab pytorch currently). With the AMP release we'll update the requirements.txt to torch>=1.6.

@farleylai
Contributor

farleylai commented Jul 31, 2020

@glenn-jocher
Just played with the GitHub Release. It definitely supports editing past releases by adding/removing files to distribute as assets. Those release artifacts are NOT counted towards the repo storage usage nor cloned as the repo sources.

Perhaps I did not make it clear. S3 is just an example: we version the official checkpoints and our fine-tuned ones there because, so far, the code simply downloads the latest checkpoint through the same link, with no option to specify a tag. I am proposing to add such an option to the download API to make switching between checkpoint versions easier. In that sense, the API could be extended to accept full download specs for supported cloud storage, not limited to GCP, S3 and so on, but anything the user can manage or host reliably.

@glenn-jocher
Member

I think I've added the v1.0 and v2.0 models successfully now to the release files :)
See https://github.com/ultralytics/yolov5/releases/tag/v1.0
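With weights attached to GitHub releases, selecting a checkpoint version reduces to building a release-asset URL for a given tag. GitHub serves release assets at https://github.com/OWNER/REPO/releases/download/TAG/ASSET. The helper below is a hypothetical sketch, and the asset filename in the example is an assumption, so check the actual asset names on the release page.

```python
from urllib.parse import quote


def release_asset_url(tag: str, asset: str,
                      owner: str = "ultralytics", repo: str = "yolov5") -> str:
    """Return the download URL of a file attached to a GitHub release (hypothetical helper)."""
    # quote(..., safe="") percent-encodes anything unsafe in a single path segment
    return (f"https://github.com/{owner}/{repo}/releases/download/"
            f"{quote(tag, safe='')}/{quote(asset, safe='')}")
```

For example, release_asset_url("v1.0", "yolov5s.pt") yields https://github.com/ultralytics/yolov5/releases/download/v1.0/yolov5s.pt. That URL could then be passed to a downloader such as torch.hub.download_url_to_file(url, dst).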

@mad-fogs

mad-fogs commented Aug 4, 2020

I just updated PyTorch from 1.5.1 to 1.6.0 and this error appeared.

@shliang0603

shliang0603 commented Aug 21, 2020


@glenn-jocher when I train my model with YOLOv5 v3.0, I get the following error:
(screenshot of the error, not preserved)

My environment:

  • torch==1.5.1
  • torchvision==0.6.1
  • CUDA 10.2

@glenn-jocher
Member

glenn-jocher commented Aug 21, 2020

@shliang0603 it appears you may have environment problems. Please ensure you meet all dependency requirements if you are attempting to run YOLOv5 locally. If in doubt, create a new virtual Python 3.8 environment, clone the latest repo (code changes daily), and pip install -r requirements.txt again. We also highly recommend using one of our verified environments below.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.6. To install run:

$ pip install -r requirements.txt
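Because the requirement is torch>=1.6, a script could check the installed version up front and stop with a clear message when it is too old, rather than failing later with an obscure error. This is a minimal standard-library sketch with hypothetical helper names. It compares only the leading release numbers, so pre-release builds such as 1.6.0.dev20200611 count as 1.6; packaging.version.Version would be the more complete choice. The installed version string is available as torch.__version__.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric release, e.g. '1.5.1+cu101' -> (1, 5, 1)."""
    m = re.match(r"\d+(?:\.\d+)*", version)
    if m is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in m.group(0).split("."))


def meets_minimum(version: str, minimum: Tuple[int, ...] = (1, 6)) -> bool:
    """True if `version` is at least `minimum`, comparing release numbers only."""
    v = parse_version(version)
    v += (0,) * (len(minimum) - len(v))  # pad so "1" compares like (1, 0)
    return v >= minimum  # tuple comparison is numeric: 1.10 > 1.6
```

Comparing integer tuples rather than raw strings avoids the classic pitfall where "1.10" sorts before "1.6".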

Environments

YOLOv5 may be run in any of our up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled).

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu.

@shliang0603


@glenn-jocher @sailfish009 Can you tell me your CUDA version? My CUDA is 10.2 and I'm using YOLOv5 v3.0. When I install torch 1.5.1, I get the error AttributeError: module 'torch.nn' has no attribute 'Hardswish'! When I install torch 1.6.0, I get torch.nn.modules.module.ModuleAttributeError: 'BatchNorm2d' object has no attribute '_non_persistent_buffers_set' again! It's an endless cycle! I'm going to cry!

@glenn-jocher
Member

glenn-jocher commented Aug 25, 2020

@shliang0603 the _non_persistent_buffers_set error is well documented in the issues here. It occurs when trying to load a model trained with PyTorch 1.5.1 using PyTorch 1.6.

1.5.1 is no longer supported. CUDA version varies by hardware; we now use Google Colab and Google Cloud Deep Learning VMs with CUDA 11.0, with no issues on either.

@shliang0603

shliang0603 commented Aug 25, 2020

The error has been resolved: torch.nn.modules.module.ModuleAttributeError: 'BatchNorm2d' object has no attribute '_non_persistent_buffers_set'

[English Note]

1. My environment

  • Ubuntu 18.04
  • CUDA 10.2
  • Python 3.8
  • PyTorch 1.6.0
  • torchvision 0.7.0

2. Training command

python train.py --img 640 --batch 16 --epochs 300 --data ./data/my_data.yaml --cfg ./models/yolov5l.yaml --weights ./weights/yolov5l.pt --device 1

3. Problem analysis

I have solved the problem, and I don't think it is a bug in PyTorch 1.6.0. I hit this error because the yolov5l.pt pretrained model I used for training in YOLOv5 v3.0 had been downloaded with YOLOv5 v1.0. The yolov5l.pt download was slow and I was lazy, so I copied the yolov5l.pt downloaded in YOLOv5 v1.0 into YOLOv5 v3.0, which caused the error: torch.nn.modules.module.ModuleAttributeError: 'BatchNorm2d' object has no attribute '_non_persistent_buffers_set'

4. Solution

Re-download the yolov5l.pt pretrained model directly within YOLOv5 v3.0.



@shliang0603


@glenn-jocher Thanks for your reply, I have solved the problem.

@glenn-jocher
Member

@shliang0603 in principle you can fix this problem by adding an empty _non_persistent_buffers_set to every YOLOv5 module, but I would recommend simply using the latest models instead.

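    # model: a YOLOv5 model loaded from a checkpoint saved with PyTorch 1.5.x
    for m in model.modules():
        if not hasattr(m, '_non_persistent_buffers_set'):  # attribute added in PyTorch 1.6
            m._non_persistent_buffers_set = set()  # PyTorch 1.6.0 compatibility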

@shliang0603

@glenn-jocher OK, thanks.

@scamianbas

Hi,
I just did a git pull and I have the same issue: "torch.nn.modules.module.ModuleAttributeError: 'BatchNorm2d' object has no attribute '_non_persistent_buffers_set'", without using pretrained weights (Ubuntu Bionic, torchvision 0.7.0 and torch 1.6.0), when running the command line below.
The cfg file is a copy of the one in the models folder with only the number of classes adjusted.
Everything had worked well for weeks until today.

python3.6 train.py --epochs 1024 --batch-size 4 --data coco128.yaml --cfg yolov5s.yaml
Using CUDA device0 _CudaDeviceProperties(name='GeForce GTX 1650', total_memory=3910MB)

Namespace(adam=False, batch_size=4, bucket='', cache_images=False, cfg='yolov5s.yaml', data='coco128.yaml', device='', epochs=1024, evolve=False, global_rank=-1, hyp='data/hyp.scratch.yaml', image_weights=False, img_size=[640, 640], local_rank=-1, logdir='runs/', multi_scale=False, name='', noautoanchor=False, nosave=False, notest=False, rect=False, resume=False, single_cls=False, sync_bn=False, total_batch_size=4, weights='yolov5s.pt', workers=8, world_size=1)
Start Tensorboard with "tensorboard --logdir runs/", view at http://localhost:6006/
Hyperparameters {'lr0': 0.01, 'lrf': 0.2, 'momentum': 0.937, 'weight_decay': 0.0005, 'giou': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.1, 'scale': 0.5, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.5, 'mixup': 0.0}
/home/mirko/.local/lib/python3.6/site-packages/torch/serialization.py:649: SourceChangeWarning: source code of class 'models.yolo.Model' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/mirko/.local/lib/python3.6/site-packages/torch/serialization.py:649: SourceChangeWarning: source code of class 'torch.nn.modules.container.Sequential' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/mirko/.local/lib/python3.6/site-packages/torch/serialization.py:649: SourceChangeWarning: source code of class 'models.common.Focus' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/mirko/.local/lib/python3.6/site-packages/torch/serialization.py:649: SourceChangeWarning: source code of class 'models.common.Conv' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/mirko/.local/lib/python3.6/site-packages/torch/serialization.py:649: SourceChangeWarning: source code of class 'torch.nn.modules.conv.Conv2d' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/mirko/.local/lib/python3.6/site-packages/torch/serialization.py:649: SourceChangeWarning: source code of class 'torch.nn.modules.batchnorm.BatchNorm2d' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/mirko/.local/lib/python3.6/site-packages/torch/serialization.py:649: SourceChangeWarning: source code of class 'torch.nn.modules.activation.LeakyReLU' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/mirko/.local/lib/python3.6/site-packages/torch/serialization.py:649: SourceChangeWarning: source code of class 'models.common.BottleneckCSP' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/mirko/.local/lib/python3.6/site-packages/torch/serialization.py:649: SourceChangeWarning: source code of class 'torch.nn.modules.container.ModuleList' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/mirko/.local/lib/python3.6/site-packages/torch/serialization.py:649: SourceChangeWarning: source code of class 'torch.nn.modules.pooling.MaxPool2d' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/mirko/.local/lib/python3.6/site-packages/torch/serialization.py:649: SourceChangeWarning: source code of class 'torch.nn.modules.upsampling.Upsample' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/mirko/.local/lib/python3.6/site-packages/torch/serialization.py:649: SourceChangeWarning: source code of class 'models.yolo.Detect' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]                    
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     19904  models.common.BottleneckCSP             [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  1    161152  models.common.BottleneckCSP             [128, 128, 3]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  1    641792  models.common.BottleneckCSP             [256, 256, 3]                 
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  8                -1  1    656896  models.common.SPP                       [512, 512, [5, 9, 13]]        
  9                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1    378624  models.common.BottleneckCSP             [512, 256, 1, False]          
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1     95104  models.common.BottleneckCSP             [256, 128, 1, False]          
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1    313088  models.common.BottleneckCSP             [256, 256, 1, False]          
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
 24      [17, 20, 23]  1     18879  models.yolo.Detect                      [2, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 191 layers, 7.25779e+06 parameters, 7.25779e+06 gradients

Traceback (most recent call last):
  File "train.py", line 456, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 75, in train
    state_dict = ckpt['model'].float().state_dict()  # to FP32
  File "/home/mirko/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 900, in state_dict
    module.state_dict(destination, prefix + name + '.', keep_vars=keep_vars)
  File "/home/mirko/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 900, in state_dict
    module.state_dict(destination, prefix + name + '.', keep_vars=keep_vars)
  File "/home/mirko/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 900, in state_dict
    module.state_dict(destination, prefix + name + '.', keep_vars=keep_vars)
  [Previous line repeated 1 more time]
  File "/home/mirko/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 897, in state_dict
    self._save_to_state_dict(destination, prefix, keep_vars)
  File "/home/mirko/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 860, in _save_to_state_dict
    if buf is not None and name not in self._non_persistent_buffers_set:
  File "/home/mirko/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 772, in __getattr__
    type(self).__name__, name))
torch.nn.modules.module.ModuleAttributeError: 'BatchNorm2d' object has no attribute '_non_persistent_buffers_set'

@scamianbas

OK, got it: if you do not want to use pretrained weights, you have to pass --weights ''; otherwise it will load yolov5s.pt.

HTH
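Here is a minimal sketch of why the empty string works. It assumes that train.py declares --weights as a plain string argument whose default is yolov5s.pt and treats an empty value as "no checkpoint". `parse_weights` is a hypothetical helper for illustration, not code from the repository:

```python
import argparse

def parse_weights(argv):
    """Return (weights, pretrained) for a hypothetical train.py-style CLI."""
    parser = argparse.ArgumentParser()
    # Assumed default, mirroring the comment above: omit --weights and yolov5s.pt is loaded.
    parser.add_argument("--weights", type=str, default="yolov5s.pt",
                        help="initial weights path; pass '' to train from scratch")
    opt, _ = parser.parse_known_args(argv)  # ignore unrelated flags such as --data
    pretrained = bool(opt.weights)  # '' is falsy, so no checkpoint would be loaded
    return opt.weights, pretrained
```

In a shell, the empty value has to be quoted (--weights '') so that it reaches the script as an empty argument instead of disappearing.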

@bhaktatejas922

bhaktatejas922 commented Oct 4, 2020

I have this issue when using PyTorch 1.7 built from source. Any ideas on how to work around it if I need it to work with PyTorch 1.7?

Using Python 3.7 on 32-bit ARM.

@glenn-jocher
Member

glenn-jocher commented Oct 4, 2020

@bhaktatejas922 this will occur when using older models, i.e. trained with the v2.0 YOLOv5 release or torch<1.6. Train and export with the latest master:
git clone https://github.com/ultralytics/yolov5
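If retraining is not an option, one stopgap is to add the missing attribute to every submodule of a model that was unpickled from an old checkpoint, before calling state_dict(). This is an assumption-based workaround, not an official YOLOv5 or PyTorch API. `patch_legacy_modules` is a hypothetical helper that relies only on `nn.Module.modules()`:

```python
def patch_legacy_modules(model):
    """Add an attribute that modules pickled under older PyTorch versions may lack.

    Hypothetical helper: assumes `model` is a torch.nn.Module (or any object
    exposing .modules()) loaded from an old checkpoint.
    Returns the number of submodules that were patched.
    """
    patched = 0
    for m in model.modules():  # yields the model itself plus all nested submodules
        # Inspect the instance __dict__ directly rather than going through
        # nn.Module.__getattr__, which raises for the missing name.
        if "_non_persistent_buffers_set" not in vars(m):
            m._non_persistent_buffers_set = set()  # treat every buffer as persistent
            patched += 1
    return patched
```

You would call `patch_legacy_modules(ckpt['model'])` right after `torch.load(...)` and before `ckpt['model'].float().state_dict()`, which is the call that fails at line 75 of train.py in the traceback above. This only supplies the missing attribute. It does not reconcile other differences between the old and current model code, which the SourceChangeWarning lines in the log hint at, so retraining with the latest master remains the cleaner fix.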

@jasnei

jasnei commented Oct 12, 2020

I have the same issue. I'm using the recommended environment and the updated version, but I still get the same error.

  1. Training with the local pretrained weights below gives the same error:
    python train.py --data data/smoke.yaml --cfg models/yolov5s.yaml --weights weights/yolov5s.pt --batch-size 16 --epochs 100

  2. Running without --weights downloads the pretrained weights automatically, and training works now:
    python train.py --data data/smoke.yaml --cfg models/yolov5s.yaml --batch-size 16 --epochs 100


I don't think it's a PyTorch bug. Rather, the pretrained weights are not compatible with the model being trained.

Anyway, problem sorted; I can run the training now.

Thanks

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@Nabyssache

pip install torch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 -i https://pypi.tuna.tsinghua.edu.cn/simple --default-timeout=1000
Problem gone!
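The fixes reported in this thread depend on which PyTorch build is actually installed, so it can help to confirm the version before loading a checkpoint. The sketch below assumes version strings of the form 1.7.1 or 1.7.1+cu101. `version_at_least` is a hypothetical helper, and the 1.7.1 threshold in the usage line only mirrors the install command above. It is not a documented cutoff.

```python
import re

def version_at_least(version: str, minimum: tuple) -> bool:
    """Compare a PyTorch-style version string against a minimum (major, minor, patch).

    Local build suffixes such as '+cu101' are dropped, and pre-release tags
    such as 'a0' are ignored: only the leading digits of each component count.
    """
    numbers = []
    for part in version.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", part)
        numbers.append(int(match.group()) if match else 0)
    numbers += [0] * (3 - len(numbers))  # pad short versions, e.g. '2.0' -> (2, 0, 0)
    return tuple(numbers) >= tuple(minimum)
```

For example, `version_at_least(torch.__version__, (1, 7, 1))` would return False for a source build reporting 1.7.0a0, as in the 32-bit ARM report above.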
