
RuntimeError: The following operation failed in the TorchScript interpreter. #5070

Closed
andreiionutdamian opened this issue Oct 6, 2021 · 10 comments
Labels
bug Something isn't working

Comments

@andreiionutdamian
Contributor

andreiionutdamian commented Oct 6, 2021

Hi, I am using the current repo to run a simple test: export to TorchScript, then load the TorchScript file in a separate Python script.
I am using:

  Python 3.8.8
  Torch 1.8.0

The command I run is:
python export.py --weights yolov5l.pt --include torchscript --device 0

Then, in the test script, all I do is:

import cv2
import numpy as np
import torch as th

# build a 1-image test batch
imgs = []
img = cv2.imread('Images/test.png')            # BGR, HWC
img_resized = cv2.resize(img, (640, 640))
imgs.append(img_resized)
np_imgs = np.ascontiguousarray(np.array(imgs)[:, :, :, ::-1])  # BGR -> RGB

# load the previously generated file
model = th.jit.load('yolov5l.torchscript.pt')
model_dev = next(model.parameters()).device

np_imgs = np_imgs.transpose((0, 3, 1, 2))      # NHWC -> NCHW
th_imgs = th.tensor(np_imgs).to(model_dev)
th_x = (th_imgs / 255.).float()                # normalize to [0, 1]

# here I inspect and the data is nice and ready on the GPU, so ...
with th.no_grad():
    # everything is good until this point
    th_yh = model(th_x)
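As an aside on the preprocessing above: the `np.ascontiguousarray` call matters because reversing the channel axis with `[::-1]` produces a negative-stride view that `torch.from_numpy` rejects, so a contiguous copy is taken first. A minimal NumPy-only sketch (toy shapes, not the real 640x640 batch):

```python
import numpy as np

# A BGR image batch as produced by cv2.imread: (N, H, W, 3), uint8.
bgr = np.arange(2 * 4 * 4 * 3, dtype=np.uint8).reshape(2, 4, 4, 3)

# Reversing the channel axis yields a *view* with a negative stride.
rgb_view = bgr[:, :, :, ::-1]
print(rgb_view.flags['C_CONTIGUOUS'])   # False

# ascontiguousarray makes the contiguous copy PyTorch interop expects.
rgb = np.ascontiguousarray(rgb_view)
print(rgb.flags['C_CONTIGUOUS'])        # True

# NHWC -> NCHW for PyTorch; transpose() is again only a view.
chw = np.ascontiguousarray(rgb.transpose(0, 3, 1, 2))
print(chw.shape)                        # (2, 3, 4, 4)
```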

At this point I get the below error:

    result = self.forward(*input, **kwargs)

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/models/yolo.py", line 33, in forward
    _22 = getattr(self.model, "2")
    _23 = getattr(self.model, "1")
    _24 = (getattr(self.model, "0")).forward(x, )
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _25 = (_22).forward((_23).forward(_24, ), )
    _26 = (_20).forward((_21).forward(_25, ), )
  File "code/__torch__/models/common.py", line 19, in forward
    _8 = torch.slice(_7, 3, 1, 9223372036854775807, 2)
    input = torch.cat([_2, _4, _6, _8], 1)
    return (_0).forward(input, )
            ~~~~~~~~~~~ <--- HERE
class Conv(Module):
  __parameters__ = []
  File "code/__torch__/models/common.py", line 29, in forward
  def forward(self: __torch__.models.common.Conv,
    input: Tensor) -> Tensor:
    _9 = (self.act).forward((self.conv).forward(input, ), )
                             ~~~~~~~~~~~~~~~~~~ <--- HERE
    return _9
class C3(Module):
  File "code/__torch__/torch/nn/modules/conv.py", line 11, in forward
    input: Tensor) -> Tensor:
    _0 = self.bias
    x = torch._convolution(input, self.weight, _0, [1, 1], [1, 1], [1, 1], False, [0, 0], 1, False, False, True, True)
        ~~~~~~~~~~~~~~~~~~ <--- HERE
    return x

Traceback of TorchScript, original code (most recent call last):
c:\anaconda3\envs\th\lib\site-packages\torch\nn\modules\conv.py(395): _conv_forward
c:\anaconda3\envs\th\lib\site-packages\torch\nn\modules\conv.py(399): forward
c:\anaconda3\envs\th\lib\site-packages\torch\nn\modules\module.py(860): _slow_forward
c:\anaconda3\envs\th\lib\site-packages\torch\nn\modules\module.py(887): _call_impl
C:\WORK\05_SolisBox\_cache\_models\__u_y5\models\common.py(48): forward_fuse
c:\anaconda3\envs\th\lib\site-packages\torch\nn\modules\module.py(860): _slow_forward
c:\anaconda3\envs\th\lib\site-packages\torch\nn\modules\module.py(887): _call_impl
C:\WORK\05_SolisBox\_cache\_models\__u_y5\models\common.py(206): forward
c:\anaconda3\envs\th\lib\site-packages\torch\nn\modules\module.py(860): _slow_forward
c:\anaconda3\envs\th\lib\site-packages\torch\nn\modules\module.py(887): _call_impl
C:\WORK\05_SolisBox\_cache\_models\__u_y5\models\yolo.py(146): _forward_once
C:\WORK\05_SolisBox\_cache\_models\__u_y5\models\yolo.py(124): forward
c:\anaconda3\envs\th\lib\site-packages\torch\nn\modules\module.py(860): _slow_forward
c:\anaconda3\envs\th\lib\site-packages\torch\nn\modules\module.py(887): _call_impl
c:\anaconda3\envs\th\lib\site-packages\torch\jit\_trace.py(934): trace_module
c:\anaconda3\envs\th\lib\site-packages\torch\jit\_trace.py(733): trace
C:\WORK\05_SolisBox\_cache\_models\__u_y5\export.py(62): export_torchscript
C:\WORK\05_SolisBox\_cache\_models\__u_y5\export.py(310): run
c:\anaconda3\envs\th\lib\site-packages\torch\autograd\grad_mode.py(27): decorate_context
C:\WORK\05_SolisBox\_cache\_models\__u_y5\export.py(365): main
C:\WORK\05_SolisBox\_cache\_models\__u_y5\export.py(370): <module>
c:\anaconda3\envs\th\lib\site-packages\spyder_kernels\customize\spydercustomize.py(453): exec_code
c:\anaconda3\envs\th\lib\site-packages\spyder_kernels\customize\spydercustomize.py(565): runfile
<ipython-input-1-0225bb70c045>(1): <module>
c:\anaconda3\envs\th\lib\site-packages\IPython\core\interactiveshell.py(3437): run_code
c:\anaconda3\envs\th\lib\site-packages\IPython\core\interactiveshell.py(3357): run_ast_nodes
c:\anaconda3\envs\th\lib\site-packages\IPython\core\interactiveshell.py(3165): run_cell_async
c:\anaconda3\envs\th\lib\site-packages\IPython\core\async_helpers.py(68): _pseudo_sync_runner
c:\anaconda3\envs\th\lib\site-packages\IPython\core\interactiveshell.py(2940): _run_cell
c:\anaconda3\envs\th\lib\site-packages\IPython\core\interactiveshell.py(2894): run_cell
c:\anaconda3\envs\th\lib\site-packages\ipykernel\zmqshell.py(536): run_cell
c:\anaconda3\envs\th\lib\site-packages\ipykernel\ipkernel.py(306): do_execute
c:\anaconda3\envs\th\lib\site-packages\tornado\gen.py(234): wrapper
c:\anaconda3\envs\th\lib\site-packages\ipykernel\kernelbase.py(543): execute_request
c:\anaconda3\envs\th\lib\site-packages\tornado\gen.py(234): wrapper
c:\anaconda3\envs\th\lib\site-packages\ipykernel\kernelbase.py(268): dispatch_shell
c:\anaconda3\envs\th\lib\site-packages\tornado\gen.py(234): wrapper
c:\anaconda3\envs\th\lib\site-packages\ipykernel\kernelbase.py(365): process_one
c:\anaconda3\envs\th\lib\site-packages\tornado\gen.py(775): run
c:\anaconda3\envs\th\lib\site-packages\tornado\gen.py(814): inner
c:\anaconda3\envs\th\lib\site-packages\tornado\ioloop.py(741): _run_callback
c:\anaconda3\envs\th\lib\site-packages\tornado\ioloop.py(688): <lambda>
c:\anaconda3\envs\th\lib\asyncio\events.py(81): _run
c:\anaconda3\envs\th\lib\asyncio\base_events.py(1859): _run_once
c:\anaconda3\envs\th\lib\asyncio\base_events.py(570): run_forever
c:\anaconda3\envs\th\lib\site-packages\tornado\platform\asyncio.py(199): start
c:\anaconda3\envs\th\lib\site-packages\ipykernel\kernelapp.py(612): start
c:\anaconda3\envs\th\lib\site-packages\spyder_kernels\console\start.py(296): main
c:\anaconda3\envs\th\lib\site-packages\spyder_kernels\console\__main__.py(23): <module>
c:\anaconda3\envs\th\lib\runpy.py(87): _run_code
c:\anaconda3\envs\th\lib\runpy.py(194): _run_module_as_main
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
@andreiionutdamian andreiionutdamian added the bug Something isn't working label Oct 6, 2021
@github-actions
Contributor

github-actions bot commented Oct 6, 2021

👋 Hello @andreiionutdamian, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python>=3.6.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher
Member

@andreiionutdamian I don't have torchscript inference experience, so I'm not sure, but CUDNN_STATUS_INTERNAL_ERROR may be due to a lack of resources, i.e. CUDA memory, RAM, vcpus.

@andreiionutdamian
Contributor Author

andreiionutdamian commented Oct 7, 2021

> @andreiionutdamian I don't have torchscript inference experience, so I'm not sure, but CUDNN_STATUS_INTERNAL_ERROR may be due to a lack of resources, i.e. CUDA memory, RAM, vcpus.
@glenn-jocher I understand your point of view; however, this is not the case. In "eager" inference the model does not allocate more than 3 GB of GPU RAM in the toughest of tests (batches of 8-16 1280x images, etc.), while the GPU is a GTX 3080 with 11 GB of RAM.
So I see no logical reason for the model to allocate more memory when loaded via jit/torchscript.
On the other hand, the environment itself is tested: I wrote a couple of models, trained them, torch-scripted them, then "served" them from other Python scripts with no issues.

@glenn-jocher
Member

@andreiionutdamian hmm. Is the error reproducible with a small inference load, i.e. --img 640 --batch 1?

I know this is a bit tangent to your issue, but could you submit a PR updating detect.py with torchscript inference (bug and all)? This might help a lot of people (like me) who've never used torchscript models understand the usage and might allow for more community debugging.

@andreiionutdamian
Contributor Author

> @andreiionutdamian hmm. Is the error reproducible with a small inference load, i.e. --img 640 --batch 1?
>
> I know this is a bit tangent to your issue, but could you submit a PR updating detect.py with torchscript inference (bug and all)? This might help a lot of people (like me) who've never used torchscript models understand the usage and might allow for more community debugging.

@glenn-jocher I certainly tested multiple scenarios, including the following:

  • generated with yolov5l at (1, 3, 640, 640) and tested (fail)
  • generated with yolov5l6 at (1, 3, 1280, 1280), then tested and failed again
  • generated with yolov5s at (1, 3, 640, 640), another fail
  • ...even tried generating non-square inputs with larger batch sizes (such as (2, 3, 896, 1280)): all failures

Again, I ran sanity checks on TorchScript itself (thought maybe it was broken) using my own (pre)trained models, and everything worked just fine.

@glenn-jocher
Member

@andreiionutdamian yeah I'm sure there's a bug somewhere.

If you submit a PR to update detect.py then it's much more likely future users will find and fix that bug.

Since currently there's no support for torchscript inference in the repo there's technically no action for us to take.

@andreiionutdamian
Contributor Author

@glenn-jocher I followed your suggestion and created PR #5109

@andreiionutdamian
Contributor Author

andreiionutdamian commented Oct 15, 2021

Well, it looks like all you need to do is:

  1. Be careful about the input shape: it has to be exactly the shape of the tensor that was used during tracing.
  2. Always warm up your model with a zero-filled tensor like below:

   _par0 = next(model.parameters())
   model_dtype = _par0.dtype
   model_device = _par0.device
   th_warm = th.zeros(
     MODEL_BATCH, 3, MODEL_H, MODEL_W,
     device=model_device,
     dtype=model_dtype,
   )
   _ = model(th_warm)

where MODEL_BATCH, MODEL_H and MODEL_W respect condition (1).

If you don't respect the first, the graph ops will fail (they are fixed to specific inputs). If you don't respect the second, some strange CUDA voodoo happens and you get RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
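[Editor's note] A self-contained sketch of the two rules above, using a tiny traced stand-in network saved to a temp file in place of the real yolov5l.torchscript.pt (the network, file path, and MODEL_* sizes are illustrative, not the YOLOv5 export):

```python
import os
import tempfile
import torch
import torch.nn as nn

# Stand-in for an exported TorchScript file; in the real case this would be
# the file produced by export.py.
MODEL_BATCH, MODEL_H, MODEL_W = 1, 64, 64
net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.SiLU()).eval()
traced = torch.jit.trace(net, torch.zeros(MODEL_BATCH, 3, MODEL_H, MODEL_W))

path = os.path.join(tempfile.mkdtemp(), 'model.torchscript.pt')
torch.jit.save(traced, path)

# Load, and read dtype/device off the first parameter.
model = torch.jit.load(path)
par0 = next(model.parameters())

# Rule 2: warm up with a zero tensor of exactly the tracing shape.
warm = torch.zeros(MODEL_BATCH, 3, MODEL_H, MODEL_W,
                   device=par0.device, dtype=par0.dtype)
with torch.no_grad():
    _ = model(warm)               # warm-up pass
    out = model(warm.clone())     # rule 1: real inference keeps the same shape
print(out.shape)                  # torch.Size([1, 8, 64, 64])
```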

@glenn-jocher
Member

@andreiionutdamian item 2 is normally handled by this line in detect.py, though in the case of pytorch models this is mainly to improve profiling results since the first GPU run is always slower than subsequent runs.

yolov5/detect.py

Lines 132 to 133 in fc36064

if pt and device.type != 'cpu':
model(torch.zeros(1, 3, *imgsz).to(device).type_as(next(model.parameters()))) # run once

Does this mean that torchscript inference should work in master as long as the input shape is correct?

@andreiionutdamian
Contributor Author

andreiionutdamian commented Oct 16, 2021

> @andreiionutdamian item 2 is normally handled by this line in detect.py, though in the case of pytorch models this is mainly to improve profiling results since the first GPU run is always slower than subsequent runs.
>
> yolov5/detect.py
>
> Lines 132 to 133 in fc36064
>
> if pt and device.type != 'cpu':
>     model(torch.zeros(1, 3, *imgsz).to(device).type_as(next(model.parameters())))  # run once
>
> Does this mean that torchscript inference should work in master as long as the input shape is correct?

@glenn-jocher Mostly yes; I created a PR with minor modifications, including to export.py. What is really CRAZY is that if you use a normal image tensor for warm-up (considering the first couple of inferences as warm-ups), then you'll get illegal CUDA ops and a crash.
