RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation when running on Docker #1552

NanoCode012 · 2020-11-29T16:02:00Z

🐛 Bug

I got the error below when I tried to test the latest commit, cff9263, on a new Docker image. I hadn't pulled recently, so I'm not sure which commit introduced this error.

RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.

To Reproduce (REQUIRED)

Pull the Docker image and run it
Run python train.py --img 640 --batch 16 --epochs 3 --data coco128.yaml --weights yolov5s.pt --nosave --cache

Output:

 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
 24      [17, 20, 23]  1    229245  models.yolo.Detect                      [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Traceback (most recent call last):
  File "train.py", line 492, in <module>
    train(hyp, opt, device, tb_writer, wandb)
  File "train.py", line 83, in train
    model = Model(opt.cfg or ckpt['model'].yaml, ch=3, nc=nc).to(device)  # create
  File "/usr/src/app/models/yolo.py", line 95, in __init__
    self._initialize_biases()  # only run once
  File "/usr/src/app/models/yolo.py", line 150, in _initialize_biases
    b[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.

Expected behavior

Run normally

Environment

Docker + JupyterLab (from my repo)
CPU, 1 GPU, Multi-GPU

Additional context

It seems to run fine when I'm running from an old conda py37 environment with torch 1.6.
I cannot reproduce this error on Google Colab.
Could there be something wrong with Docker dependencies?

glenn-jocher · 2020-11-29T16:15:07Z

@NanoCode012 thanks for the bug report. I'll try to reproduce with yolov5:latest on a GCP instance.

I've seen this error in the past when running in-place ops like L150 in your error message with autograd on, but that line has not changed in a long time. PyTorch versions are changing though, so perhaps this is handled differently now.
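The failure can be reproduced outside YOLOv5 in a few lines. This is a minimal sketch, not the repo's code: the tensor `bias` and the helper name `inplace_on_leaf_view_raises` are made up for illustration. Indexing a leaf tensor that requires grad returns a view, and `+=` on that view is the in-place operation that autograd rejects.

```python
import torch


def inplace_on_leaf_view_raises() -> bool:
    """Return True if an in-place op on a view of a grad-requiring leaf raises."""
    # Hypothetical stand-in for a layer bias: a leaf tensor that requires grad.
    bias = torch.zeros(3, 6, requires_grad=True)
    try:
        # bias[:, 4] is a view of the leaf; += modifies that view in place.
        bias[:, 4] += 1.0
    except RuntimeError:
        return True
    return False


if __name__ == "__main__":
    print("raises:", inplace_on_leaf_view_raises())
```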

glenn-jocher · 2020-11-29T16:23:59Z

Yeah, I get the same result. I think the issue is that NVIDIA seems to prefer PyTorch nightly for their FROM images rather than the last stable release, so I can't tell whether this is nightly instability or a 1.8 change that will cause this error in future releases.

If I pull latest and then run this line, everything trains fine.

pip install torch==1.7.0+cu110 torchvision==0.8.1+cu110 torchaudio===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

I guess for now I'll simply reset the image FROM tag to 20.10, which I think was working well.
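One quick way to tell which kind of build an environment is running is to look at the version string. The sketch below is plain Python and assumes the common convention that nightly builds carry a `.dev<date>` segment (for example `1.8.0.dev20201117`) while stable wheels may carry a local suffix such as `+cu110`; `is_nightly` is a hypothetical helper, not part of PyTorch.

```python
import re

# Matches a development segment such as ".dev20201117".
_DEV_SEGMENT = re.compile(r"\.dev\d+")


def is_nightly(version: str) -> bool:
    """Return True if the version string has a .dev<digits> segment."""
    public = version.split("+", 1)[0]  # drop a local suffix like +cu110
    return _DEV_SEGMENT.search(public) is not None


if __name__ == "__main__":
    for v in ("1.7.0+cu110", "1.8.0.dev20201117", "1.8.0+cu101"):
        print(v, "->", "nightly" if is_nightly(v) else "release")
```

In a real environment the string would come from `torch.__version__`.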

glenn-jocher · 2020-11-29T16:27:48Z

Or wait, I just had a great idea! I think if I start from a different base image, such as pytorch/pytorch:latest, then this seems to point to the last stable release, and perhaps eliminates maintenance also as the tag never changes. I will try an experiment and see if it works.

FROM nvcr.io/nvidia/pytorch:20.11-py3
FROM pytorch/pytorch:latest

glenn-jocher · 2020-11-29T16:43:35Z

I tried to build a pytorch:latest image with the Dockerfile below, but the image lacks some dependencies such as cv2, which causes problems during pip install, so I gave up on it. The Dockerfile is included in case anyone can debug this. In the meantime, I think a rollback to 20.10 will fix this; I'll get that done.

docker pull ultralytics/yolov5:pytorch_latest

FROM pytorch/pytorch:latest

# Install dependencies
RUN pip install --upgrade pip
# COPY requirements.txt .
# RUN pip install -r requirements.txt
RUN pip install gsutil

# Create working directory
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app

# Copy contents
COPY . /usr/src/app

glenn-jocher · 2020-11-29T16:51:13Z

Verified that the new image works; the problem should now be resolved by PR #1553.

NanoCode012 · 2020-11-29T16:53:09Z

Thanks glenn! I will wait for the image to build on Docker Hub and then test it!

Regarding pytorch:latest, I think it could be dangerous to use it in the Dockerfile, because if there is a breaking change you may not know until someone reports it.

Edit: This would mean that this repo will not be able to use later versions of NVIDIA's package until this bug is fixed somehow.

glenn-jocher · 2020-11-29T16:57:13Z

@NanoCode012 yes, that's true. The Docker images don't actually have any CI tests; they just build on every commit under the assumption that the GitHub CI tests would mostly apply to Docker as well, but it is true that they may often use different PyTorch versions. GitHub also updates its dependencies on its own schedule, so when 1.6 came out, for example, we had the daily CI test failing the next day.

cesarandreslopez · 2020-11-30T04:13:40Z

@glenn-jocher correct me if I am wrong, but both nvcr.io/nvidia/pytorch:20.11-py3 and nvcr.io/nvidia/pytorch:20.10-py3 seem to use Python 3.6.

This project requires 3.8 or above.

Will this be a problem?

glenn-jocher · 2020-11-30T11:37:00Z

@cesarandreslopez yes, I noticed that as well. I'm not sure if 3.6.0 is compatible with this repo; I think the last one I checked was using 3.6.9. I'm doing all development in 3.8.0, but in general backwards compatibility is something I don't have much time to maintain and verify, which is why I've simply put 3.8 down as the requirement.

But as you're seeing, 3.7 appears compatible, as possibly is much of 3.6.
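A script that wants to make the stated requirement explicit could check the interpreter version at startup. This is a sketch only: `meets_requirement` is a hypothetical helper, and the (3, 8) minimum simply mirrors the requirement mentioned above, not a tested compatibility boundary.

```python
import sys
from typing import Tuple


def meets_requirement(version: Tuple[int, ...], minimum: Tuple[int, int] = (3, 8)) -> bool:
    """Compare the (major, minor) part of a version tuple against the minimum."""
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    ok = meets_requirement(sys.version_info)
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}:",
          "meets" if ok else "is below", "the stated 3.8 requirement")
```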

MingcongCao · 2020-12-02T10:13:00Z

Hi, guys. @NanoCode012 @glenn-jocher The following code works for me:
with torch.no_grad():
    b[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
    b[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum())  # cls
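A self-contained sketch of this workaround, assuming a toy `torch.nn.Conv2d` in place of the Detect layer's convolution. How `b` is obtained is an assumption here (a view of the conv bias reshaped to one row per anchor); `na`, `no`, the stride value and the helper `init_obj_bias` are hypothetical names and values chosen for illustration. Inside `torch.no_grad()` the in-place update is not tracked by autograd, so the view of the bias can be modified.

```python
import math

import torch


def init_obj_bias(conv: torch.nn.Conv2d, na: int, stride: float) -> None:
    """Add the objectness prior to column 4 of the bias, viewed as (na, -1)."""
    with torch.no_grad():  # autograd does not track operations in this block
        b = conv.bias.view(na, -1)  # view sharing storage with conv.bias
        b[:, 4] += math.log(8 / (640 / stride) ** 2)


if __name__ == "__main__":
    na, no = 3, 85  # assumed: 3 anchors, 80 classes + 5 box/objectness outputs
    conv = torch.nn.Conv2d(16, na * no, kernel_size=1)
    init_obj_bias(conv, na, stride=8.0)
    print(conv.bias.view(na, -1)[:, 4])
```

Because the view shares storage with the parameter, `conv.bias` itself is updated and still requires grad afterwards.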

glenn-jocher · 2020-12-02T10:23:04Z

@MingcongCao ah, I've resolved the original issue by resetting the base image to Nvidia 20.10, so all docker operations should be operating correctly now.

hcodee · 2020-12-07T14:50:17Z

I have hit this issue with an RTX 3090 and CUDA 11.1.0. Is there any solution for this configuration?

python train.py --batch-size 64 --data ./data/coco128.yaml --cfg ./models/yolov5s.yaml --weights ''

Using torch 1.8.0.dev20201117 CUDA:0 (GeForce RTX 3090, 24265MB)

Traceback (most recent call last):
  File "train.py", line 492, in <module>
    train(hyp, opt, device, tb_writer, wandb)
  File "train.py", line 91, in train
    model = Model(opt.cfg, ch=3, nc=nc).to(device)  # create
  File "/home/yons/work/yolov5/models/yolo.py", line 95, in __init__
    self._initialize_biases()  # only run once
  File "/home/yons/work/yolov5/models/yolo.py", line 150, in _initialize_biases
    b[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.

NanoCode012 · 2020-12-07T14:58:19Z

@hcodee, could you try stable torch 1.7?

hcodee · 2020-12-07T15:03:06Z

Torch 1.7 does not work with the RTX 3090. It took me a long time to figure out how to run on the Torch 1.8 nightly build.

batrlatom · 2020-12-07T15:31:07Z

You need to compile PyTorch yourself with CUDA 11.1 installed. It is doable; I did it from master without any hassle (surprisingly). Unfortunately, I need to do it again for 1.7.

hcodee · 2020-12-07T15:56:19Z

@batrlatom Cool, thanks for the reminder. I will try it out.

dnth · 2020-12-17T11:01:33Z

Same problem with the nightly PyTorch version here. Any luck with the self-compiled PyTorch 1.8?

DoctorKey · 2020-12-18T02:44:49Z

I met the same issue with pytorch 1.8, and the following code works for me:

b.data[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
b.data[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum())  # cls
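To see what these two lines add to the biases, here is a small worked computation in plain Python. It assumes strides of 8, 16 and 32 for the three detection layers (not stated in this thread) and takes nc = 80 from the model summary in the original report; `obj_bias` and `cls_bias` are hypothetical helper names, and only the `cf is None` branch is covered.

```python
import math


def obj_bias(stride: float, img_size: int = 640, objects_per_image: int = 8) -> float:
    """Objectness offset: log(objects per image / grid cells per image)."""
    return math.log(objects_per_image / (img_size / stride) ** 2)


def cls_bias(nc: int) -> float:
    """Class offset used when no class frequencies (cf) are given."""
    return math.log(0.6 / (nc - 0.99))


if __name__ == "__main__":
    for s in (8, 16, 32):  # assumed strides
        print(f"stride {s:2d}: obj offset = {obj_bias(s):.4f}")
    print(f"nc = 80: cls offset = {cls_bias(80):.4f}")
```

Coarser strides have fewer grid cells per image, so their objectness offset is less negative.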

glenn-jocher · 2020-12-23T01:17:28Z

I just ran into this issue myself, so it's time for a fix :) Will add a TODO and prioritize this for a fix ASAP.

glenn-jocher · 2020-12-23T01:20:48Z

@DoctorKey can confirm your solution works correctly. I will submit a PR for this to master.

glenn-jocher · 2020-12-23T01:29:34Z

@NanoCode012 @DoctorKey @batrlatom @hcodee this problem should now be resolved by @DoctorKey's fix, implemented in PR #1759. The Docker image for ultralytics/yolov5:latest should be updated in a few minutes with this fix.

Let me know if any other issues pop up, and thank you for your contributions!

Nytsirch · 2021-03-15T07:54:24Z

Hi, I am new to this and I just encountered a runtime problem:
Traceback (most recent call last):
  File "train.py", line 492, in <module>
    train(hyp, opt, device, tb_writer, wandb)
  File "train.py", line 91, in train
    model = Model(opt.cfg, ch=3, nc=nc).to(device)  # create
  File "/content/yolov5/models/yolo.py", line 95, in __init__
    self._initialize_biases()  # only run once
  File "/content/yolov5/models/yolo.py", line 150, in _initialize_biases
    b[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.

I am using torch 1.8.0+cu101.

I really don't know what to do.
Any help, please?

glenn-jocher · 2021-03-15T16:49:43Z

@Nytsirch this error is likely generated by an unsupported 3rd party notebook. Please see the official YOLOv5 Colab Notebook below, and visit the Train Custom Data Tutorial to get started with YOLOv5.
https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb

Tutorials

Train Custom Data 🚀 RECOMMENDED
Weights & Biases Logging 🌟 NEW
Multi-GPU Training
PyTorch Hub ⭐ NEW
ONNX and TorchScript Export
Test-Time Augmentation (TTA)
Model Ensembling
Model Pruning/Sparsity
Hyperparameter Evolution
Transfer Learning with Frozen Layers ⭐ NEW
TensorRT Deployment

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Google Colab and Kaggle notebooks with free GPU
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu.

NingAnMe · 2021-04-12T07:58:07Z

Change the two old lines in 'yolo.py':

b[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
b[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum())  # cls

to the new lines:

b.data[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
b.data[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum())  # cls

NanoCode012 added the bug ("Something isn't working") label Nov 29, 2020

glenn-jocher mentioned this issue Nov 29, 2020: FROM nvcr.io/nvidia/pytorch:20.10-py3 #1553 (Merged)

glenn-jocher linked a pull request Nov 29, 2020 that will close this issue: FROM nvcr.io/nvidia/pytorch:20.10-py3 #1553 (Merged)

glenn-jocher closed this as completed in #1553 Nov 29, 2020

glenn-jocher added the TODO label Dec 23, 2020

This was referenced Dec 23, 2020: leaf Variable inplace bug fix #1759 (Merged); leaf Variable inplace bug fix ultralytics/yolov3#1619 (Merged)

glenn-jocher removed the TODO label Dec 23, 2020

DeanZag mentioned this issue Jun 18, 2021: Multi-GPU, subprocess.CalledProcessError #3663 (Closed)

tbeatty pushed a commit to tbeatty/ScaledYOLOv4 that referenced this issue Jul 6, 2021: Adjust bias initialization. Fixes WongKinYiu#196. See also ultralytic… …s/yolov5#1552. (4188c9c)

tbeatty added a commit to tbeatty/ScaledYOLOv4 that referenced this issue Jul 6, 2021: Adjust bias initialization. Fixes WongKinYiu#196. See also ultralytic… …s/yolov5#1552. (3d37efb)

This was referenced Sep 3, 2021: Leaf Variable inplace bug fix naruarjun/YOLOP#1 (Merged); Leaf Variable inplace bug fix hustvl/YOLOP#17 (Merged)

dt1729 mentioned this issue Mar 29, 2023: Training breaks with Pytorch version 1.13.0 and torchvision version 0.8.0 PingoLH/CenterNet-HarDNet#15 (Open)
