
RuntimeError: The following operation failed in the TorchScript interpreter. #5070

Closed
andreiionutdamian opened this issue Oct 6, 2021 · 10 comments
Labels
bug Something isn't working

Comments

@andreiionutdamian
Contributor

andreiionutdamian commented Oct 6, 2021

Hi, I am using the current repo to run a simple test: export to TorchScript, then load the TorchScript file in a separate Python script.
I am using:

  Python 3.8.8
  Torch 1.8.0

The command I run is:
python export.py --weights yolov5l.pt --include torchscript --device 0

Then, in the test script, all I do is:

import cv2
import numpy as np
import torch as th

# build a 1-image test batch
imgs = []
img = cv2.imread('Images/test.png')            # BGR, HWC
img_resized = cv2.resize(img, (640, 640))
imgs.append(img_resized)
np_imgs = np.ascontiguousarray(np.array(imgs)[:, :, :, ::-1])  # BGR -> RGB

# load the previously generated file
model = th.jit.load('yolov5l.torchscript.pt')
model_dev = next(model.parameters()).device

np_imgs = np_imgs.transpose((0, 3, 1, 2))      # NHWC -> NCHW
th_imgs = th.tensor(np_imgs).to(model_dev)
th_x = (th_imgs / 255.).float()                # normalize to [0, 1]

# here I inspect and the data is nice and ready on the GPU, so ...
with th.no_grad():
    # everything is good until this point
    th_yh = model(th_x)
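As an aside on the preprocessing above: the `np.ascontiguousarray` call matters because reversing the channel axis with `[::-1]` produces a negative-stride view that `torch.from_numpy` rejects, so a contiguous copy is taken first. A minimal NumPy-only sketch (toy shapes, not the real 640x640 batch):

```python
import numpy as np

# A BGR image batch as produced by cv2.imread: (N, H, W, 3), uint8.
bgr = np.arange(2 * 4 * 4 * 3, dtype=np.uint8).reshape(2, 4, 4, 3)

# Reversing the channel axis yields a *view* with a negative stride.
rgb_view = bgr[:, :, :, ::-1]
print(rgb_view.flags['C_CONTIGUOUS'])   # False

# ascontiguousarray makes the contiguous copy PyTorch interop expects.
rgb = np.ascontiguousarray(rgb_view)
print(rgb.flags['C_CONTIGUOUS'])        # True

# NHWC -> NCHW for PyTorch; transpose() is again only a view.
chw = np.ascontiguousarray(rgb.transpose(0, 3, 1, 2))
print(chw.shape)                        # (2, 3, 4, 4)
```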

At this point I get the below error:

    result = self.forward(*input, **kwargs)

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/models/yolo.py", line 33, in forward
    _22 = getattr(self.model, "2")
    _23 = getattr(self.model, "1")
    _24 = (getattr(self.model, "0")).forward(x, )
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _25 = (_22).forward((_23).forward(_24, ), )
    _26 = (_20).forward((_21).forward(_25, ), )
  File "code/__torch__/models/common.py", line 19, in forward
    _8 = torch.slice(_7, 3, 1, 9223372036854775807, 2)
    input = torch.cat([_2, _4, _6, _8], 1)
    return (_0).forward(input, )
            ~~~~~~~~~~~ <--- HERE
class Conv(Module):
  __parameters__ = []
  File "code/__torch__/models/common.py", line 29, in forward
  def forward(self: __torch__.models.common.Conv,
    input: Tensor) -> Tensor:
    _9 = (self.act).forward((self.conv).forward(input, ), )
                             ~~~~~~~~~~~~~~~~~~ <--- HERE
    return _9
class C3(Module):
  File "code/__torch__/torch/nn/modules/conv.py", line 11, in forward
    input: Tensor) -> Tensor:
    _0 = self.bias
    x = torch._convolution(input, self.weight, _0, [1, 1], [1, 1], [1, 1], False, [0, 0], 1, False, False, True, True)
        ~~~~~~~~~~~~~~~~~~ <--- HERE
    return x

Traceback of TorchScript, original code (most recent call last):
c:\anaconda3\envs\th\lib\site-packages\torch\nn\modules\conv.py(395): _conv_forward
c:\anaconda3\envs\th\lib\site-packages\torch\nn\modules\conv.py(399): forward
c:\anaconda3\envs\th\lib\site-packages\torch\nn\modules\module.py(860): _slow_forward
c:\anaconda3\envs\th\lib\site-packages\torch\nn\modules\module.py(887): _call_impl
C:\WORK\05_SolisBox\_cache\_models\__u_y5\models\common.py(48): forward_fuse
c:\anaconda3\envs\th\lib\site-packages\torch\nn\modules\module.py(860): _slow_forward
c:\anaconda3\envs\th\lib\site-packages\torch\nn\modules\module.py(887): _call_impl
C:\WORK\05_SolisBox\_cache\_models\__u_y5\models\common.py(206): forward
c:\anaconda3\envs\th\lib\site-packages\torch\nn\modules\module.py(860): _slow_forward
c:\anaconda3\envs\th\lib\site-packages\torch\nn\modules\module.py(887): _call_impl
C:\WORK\05_SolisBox\_cache\_models\__u_y5\models\yolo.py(146): _forward_once
C:\WORK\05_SolisBox\_cache\_models\__u_y5\models\yolo.py(124): forward
c:\anaconda3\envs\th\lib\site-packages\torch\nn\modules\module.py(860): _slow_forward
c:\anaconda3\envs\th\lib\site-packages\torch\nn\modules\module.py(887): _call_impl
c:\anaconda3\envs\th\lib\site-packages\torch\jit\_trace.py(934): trace_module
c:\anaconda3\envs\th\lib\site-packages\torch\jit\_trace.py(733): trace
C:\WORK\05_SolisBox\_cache\_models\__u_y5\export.py(62): export_torchscript
C:\WORK\05_SolisBox\_cache\_models\__u_y5\export.py(310): run
c:\anaconda3\envs\th\lib\site-packages\torch\autograd\grad_mode.py(27): decorate_context
C:\WORK\05_SolisBox\_cache\_models\__u_y5\export.py(365): main
C:\WORK\05_SolisBox\_cache\_models\__u_y5\export.py(370): <module>
c:\anaconda3\envs\th\lib\site-packages\spyder_kernels\customize\spydercustomize.py(453): exec_code
c:\anaconda3\envs\th\lib\site-packages\spyder_kernels\customize\spydercustomize.py(565): runfile
<ipython-input-1-0225bb70c045>(1): <module>
c:\anaconda3\envs\th\lib\site-packages\IPython\core\interactiveshell.py(3437): run_code
c:\anaconda3\envs\th\lib\site-packages\IPython\core\interactiveshell.py(3357): run_ast_nodes
c:\anaconda3\envs\th\lib\site-packages\IPython\core\interactiveshell.py(3165): run_cell_async
c:\anaconda3\envs\th\lib\site-packages\IPython\core\async_helpers.py(68): _pseudo_sync_runner
c:\anaconda3\envs\th\lib\site-packages\IPython\core\interactiveshell.py(2940): _run_cell
c:\anaconda3\envs\th\lib\site-packages\IPython\core\interactiveshell.py(2894): run_cell
c:\anaconda3\envs\th\lib\site-packages\ipykernel\zmqshell.py(536): run_cell
c:\anaconda3\envs\th\lib\site-packages\ipykernel\ipkernel.py(306): do_execute
c:\anaconda3\envs\th\lib\site-packages\tornado\gen.py(234): wrapper
c:\anaconda3\envs\th\lib\site-packages\ipykernel\kernelbase.py(543): execute_request
c:\anaconda3\envs\th\lib\site-packages\tornado\gen.py(234): wrapper
c:\anaconda3\envs\th\lib\site-packages\ipykernel\kernelbase.py(268): dispatch_shell
c:\anaconda3\envs\th\lib\site-packages\tornado\gen.py(234): wrapper
c:\anaconda3\envs\th\lib\site-packages\ipykernel\kernelbase.py(365): process_one
c:\anaconda3\envs\th\lib\site-packages\tornado\gen.py(775): run
c:\anaconda3\envs\th\lib\site-packages\tornado\gen.py(814): inner
c:\anaconda3\envs\th\lib\site-packages\tornado\ioloop.py(741): _run_callback
c:\anaconda3\envs\th\lib\site-packages\tornado\ioloop.py(688): <lambda>
c:\anaconda3\envs\th\lib\asyncio\events.py(81): _run
c:\anaconda3\envs\th\lib\asyncio\base_events.py(1859): _run_once
c:\anaconda3\envs\th\lib\asyncio\base_events.py(570): run_forever
c:\anaconda3\envs\th\lib\site-packages\tornado\platform\asyncio.py(199): start
c:\anaconda3\envs\th\lib\site-packages\ipykernel\kernelapp.py(612): start
c:\anaconda3\envs\th\lib\site-packages\spyder_kernels\console\start.py(296): main
c:\anaconda3\envs\th\lib\site-packages\spyder_kernels\console\__main__.py(23): <module>
c:\anaconda3\envs\th\lib\runpy.py(87): _run_code
c:\anaconda3\envs\th\lib\runpy.py(194): _run_module_as_main
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
@andreiionutdamian andreiionutdamian added the bug Something isn't working label Oct 6, 2021
@github-actions
Contributor

github-actions bot commented Oct 6, 2021

👋 Hello @andreiionutdamian, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python>=3.6.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher
Member

@andreiionutdamian I don't have torchscript inference experience, so I'm not sure, but CUDNN_STATUS_INTERNAL_ERROR may be due to a lack of resources, i.e. CUDA memory, RAM, vcpus.

@andreiionutdamian
Contributor Author

andreiionutdamian commented Oct 7, 2021

> @andreiionutdamian I don't have torchscript inference experience, so I'm not sure, but CUDNN_STATUS_INTERNAL_ERROR may be due to a lack of resources, i.e. CUDA memory, RAM, vcpus.
@glenn-jocher I understand your point of view; however, this is not the case. In "eager" inference the model does not allocate more than 3 GB of GPU RAM in the toughest of tests (batches of 8-16 1280x images, etc.), while the GPU is a GTX 3080 with 11 GB of RAM.
So I see no logical reason for the model to allocate more memory when loaded via jit/torchscript.
On the other hand, the environment itself is tested: I wrote a couple of models, trained them, torch-scripted them, then "served" them from other Python scripts with no issues.

@glenn-jocher
Member

@andreiionutdamian hmm. Is the error reproducible with a small inference load, i.e. --img 640 --batch 1?

I know this is a bit tangent to your issue, but could you submit a PR updating detect.py with torchscript inference (bug and all)? This might help a lot of people (like me) who've never used torchscript models understand the usage and might allow for more community debugging.

@andreiionutdamian
Contributor Author

> @andreiionutdamian hmm. Is the error reproducible with a small inference load, i.e. --img 640 --batch 1?
>
> I know this is a bit tangent to your issue, but could you submit a PR updating detect.py with torchscript inference (bug and all)? This might help a lot of people (like me) who've never used torchscript models understand the usage and might allow for more community debugging.

@glenn-jocher I certainly tested multiple scenarios, including the following:

  • generated with yolov5l at (1, 3, 640, 640) and tested (fail)
  • generated with yolov5l6 at (1, 3, 1280, 1280), then tested and failed again
  • generated with yolov5s at (1, 3, 640, 640), another fail
  • ...even tried generating non-square inputs with larger batch sizes (such as (2, 3, 896, 1280)): all failures

Again, I ran sanity checks on TorchScript itself (thought maybe it was broken) using my own (pre)trained models, and everything worked just fine.

@glenn-jocher
Member

@andreiionutdamian yeah I'm sure there's a bug somewhere.

If you submit a PR to update detect.py then it's much more likely future users will find and fix that bug.

Since currently there's no support for torchscript inference in the repo there's technically no action for us to take.

@andreiionutdamian
Contributor Author

@glenn-jocher I followed your suggestion and created PR #5109

@andreiionutdamian
Contributor Author

andreiionutdamian commented Oct 15, 2021

Well, it looks like all you need to do is:

  1. Be careful about the input shape: it has to be exactly the shape of the tensor that was used during tracing.
  2. Always warm up your model with a zero-filled tensor like below:

   _par0 = next(model.parameters())
   model_dtype = _par0.dtype
   model_device = _par0.device
   th_warm = th.zeros(
     MODEL_BATCH, 3, MODEL_H, MODEL_W,
     device=model_device,
     dtype=model_dtype,
   )
   _ = model(th_warm)

where MODEL_BATCH, MODEL_H and MODEL_W respect condition (1).

If you don't respect the first, the graph ops will fail (they are fixed to specific inputs). If you don't respect the second, some strange CUDA voodoo happens and you get RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
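[Editor's note] A self-contained sketch of the two rules above, using a tiny traced stand-in network saved to a temp file in place of the real yolov5l.torchscript.pt (the network, file path, and MODEL_* sizes are illustrative, not the YOLOv5 export):

```python
import os
import tempfile
import torch
import torch.nn as nn

# Stand-in for an exported TorchScript file; in the real case this would be
# the file produced by export.py.
MODEL_BATCH, MODEL_H, MODEL_W = 1, 64, 64
net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.SiLU()).eval()
traced = torch.jit.trace(net, torch.zeros(MODEL_BATCH, 3, MODEL_H, MODEL_W))

path = os.path.join(tempfile.mkdtemp(), 'model.torchscript.pt')
torch.jit.save(traced, path)

# Load, and read dtype/device off the first parameter.
model = torch.jit.load(path)
par0 = next(model.parameters())

# Rule 2: warm up with a zero tensor of exactly the tracing shape.
warm = torch.zeros(MODEL_BATCH, 3, MODEL_H, MODEL_W,
                   device=par0.device, dtype=par0.dtype)
with torch.no_grad():
    _ = model(warm)               # warm-up pass
    out = model(warm.clone())     # rule 1: real inference keeps the same shape
print(out.shape)                  # torch.Size([1, 8, 64, 64])
```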

@glenn-jocher
Member

@andreiionutdamian item 2 is normally handled by this line in detect.py, though in the case of pytorch models this is mainly to improve profiling results since the first GPU run is always slower than subsequent runs.

yolov5/detect.py

Lines 132 to 133 in fc36064

if pt and device.type != 'cpu':
model(torch.zeros(1, 3, *imgsz).to(device).type_as(next(model.parameters()))) # run once

Does this mean that torchscript inference should work in master as long as the input shape is correct?

@andreiionutdamian
Contributor Author

andreiionutdamian commented Oct 16, 2021

> @andreiionutdamian item 2 is normally handled by this line in detect.py, though in the case of pytorch models this is mainly to improve profiling results since the first GPU run is always slower than subsequent runs.
>
> yolov5/detect.py
>
> Lines 132 to 133 in fc36064
>
> if pt and device.type != 'cpu':
>     model(torch.zeros(1, 3, *imgsz).to(device).type_as(next(model.parameters())))  # run once
>
> Does this mean that torchscript inference should work in master as long as the input shape is correct?

@glenn-jocher Mostly yes; I created a PR with minor modifications, including to export.py. What is really CRAZY is that if you use a normal image tensor for warm-up (considering the first couple of inferences as warm-ups), then you'll get illegal CUDA ops and a crash.
