Unable to jit.script DETR res50 model: Dictionary inputs to traced functions must have consistent type. Found Tensor and List[Dict[str, Tensor]] #208
Hi, DETR supports scripting the model, not tracing it. Can you try `model = torch.jit.script(model)` instead?
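For context, scripting (unlike tracing) compiles the actual Python control flow instead of recording one example run, which is why it can handle data-dependent code paths. A minimal sketch with a toy module (not DETR itself) illustrating the difference:

```python
import torch
import torch.nn as nn

# Toy module (not DETR): its forward has a branch that depends on the
# input data. torch.jit.trace would bake in whichever branch the example
# input took; torch.jit.script compiles both branches.
class Toy(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.sum() > 0:  # data-dependent control flow
            return x * 2
        return x - 1

scripted = torch.jit.script(Toy())
print(scripted(torch.ones(2)).tolist())   # positive-sum branch
print(scripted(-torch.ones(2)).tolist())  # negative-sum branch
```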
Thanks very much @fmassa! Scripting runs, but calling the scripted model then fails:

```
~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
AttributeError: 'Tensor' object has no attribute 'tensors'
```

Are there any additional steps needed after scripting? I put it into eval mode (`smodel.eval()` - not sure that is even needed, as I don't believe JIT can update BN etc., but to be safe), pushed it to the GPU, and then tried the above.
I believe the torchscript model might only work if you pass a list of 3d tensors, and not a single 4d tensor (but I would need to double-check). Can you try that?
Hi @fmassa - I tested with a list of 3d tensors, a single 3d tensor, and a list of 4d tensors.
I believe it would still expect a NestedTensor as input.
Best of luck
Hi @alcinos, note that you listed `inputs = NestedTensor.from_tensor_list([img]).to(device)` .... to be safe:
1 - in util.misc I only see a `NestedTensor` class, and a function `nested_tensor_from_tensor_list`.
2 - am I correct that `img` is a tensorized/resized/normalized tensor with no added batch dimension (I just used a single image tensor in the list)?
Thanks!
Hi @lessw2020
1 - see Lines 69 to 76 in 5e66b4c
2 - The example above shows that `img` is a Tensor which has been resized / normalized, with no batch dimension (so a 3d tensor)
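As an illustration of that input format, here is a rough sketch of what a nested-tensor helper like DETR's `nested_tensor_from_tensor_list` does conceptually: pad a list of 3d `(C, H, W)` tensors to a common size and track the padded region with a boolean mask. This is a simplified stand-in for illustration, not the actual DETR code:

```python
import torch

# Simplified stand-in for DETR's util.misc.nested_tensor_from_tensor_list:
# take a list of 3d (C, H, W) image tensors of varying sizes, pad them to
# the max H and W in the batch, and record which pixels are padding.
def pad_tensor_list(tensors):
    c = tensors[0].shape[0]
    h = max(t.shape[1] for t in tensors)
    w = max(t.shape[2] for t in tensors)
    batch = torch.zeros(len(tensors), c, h, w)
    mask = torch.ones(len(tensors), h, w, dtype=torch.bool)  # True = padding
    for i, t in enumerate(tensors):
        batch[i, :, : t.shape[1], : t.shape[2]] = t
        mask[i, : t.shape[1], : t.shape[2]] = False
    return batch, mask

# two images of different sizes, each a 3d tensor with no batch dimension
imgs = [torch.rand(3, 480, 640), torch.rand(3, 512, 512)]
batch, mask = pad_tensor_list(imgs)
print(batch.shape, mask.shape)  # padded to (2, 3, 512, 640)
```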
I believe this should be enough to get your example working. As such, I'm closing this issue but let us know if the problem persists |
I just realized that I missed one of your questions, which was that the code seemed to deadlock. This might be due to the fact that torchscript is compiling the code at its first / second invocation, which might take a while. How long did you wait for it? |
Hi @fmassa - I waited about a minute each time. I didn't realize it was doing the compile, but I did become alarmed at the lack of responsiveness and assumed it was in an infinite loop. Let me re-run tomorrow, give it more time, and see if that was the root issue. Thanks for the script above, that's very helpful, as is the info about needing to wait. 1 - Does this also mean that for production we would want to push 1 or 2 images as 'warmup' to a given model before we set it to 'live' for incoming images, as users won't expect to wait a minute+ for a response? Thanks again and will update tomorrow!
@lessw2020 yes, for now torchscript uses a JIT (just-in-time) compiler, so we need to feed a few images beforehand so that the model can be compiled. Note that at some point in the future PyTorch might also support AOT (ahead of time) compilation, but it's not yet there. Let us know how much time it takes to compile the model in your setup (with / without CUDA, etc). There might be improvements to the compilation time that could be done (if the times are too high) |
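The warmup idea above can be sketched as follows, with a small stand-in model rather than DETR (the model and timings here are purely illustrative):

```python
import time

import torch
import torch.nn as nn

# Stand-in model (not DETR): the first call to a scripted model includes
# JIT compilation/optimization work, so production services typically run
# a warmup pass before accepting live traffic.
model = torch.jit.script(nn.Sequential(nn.Linear(16, 16), nn.ReLU()))
x = torch.rand(1, 16)

t0 = time.perf_counter()
model(x)  # warmup: triggers compilation
first = time.perf_counter() - t0

t0 = time.perf_counter()
model(x)  # steady-state call
steady = time.perf_counter() - t0

print(f"first call {first:.4f}s, later call {steady:.4f}s")
```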
Hi @fmassa - was finally able to test this. Could you kindly review and advise?

```
RuntimeError Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
File "/home/ubuntu/cdetr/models/backbone.py", line 101, in forward
```
Hi @fmassa - I pushed everything to the CPU and, after about an 8 minute wait, got an error - this looks more promising though, as it appears to have made it through the model?
Hey @lessw2020 The torchscript version only supports models without aux_loss. So what I would recommend you to do is to set aux_loss to False when building the model.
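To illustrate why the flag matters for scripting, here is a toy stand-in (not DETR) with an `aux_loss`-style switch; per the exchange above, the fix is to construct the model with aux_loss disabled before calling `torch.jit.script`:

```python
from typing import Dict

import torch
import torch.nn as nn

# Toy stand-in for a detector with an aux_loss flag (not the real DETR):
# when aux_loss is on, forward would emit extra per-decoder-layer outputs.
class ToyDetector(nn.Module):
    def __init__(self, aux_loss: bool = False):
        super().__init__()
        self.aux_loss = aux_loss
        self.head = nn.Linear(8, 4)

    def forward(self, x: torch.Tensor) -> Dict[str, torch.Tensor]:
        out = {"pred_logits": self.head(x)}
        if self.aux_loss:
            # extra auxiliary outputs would go here in the real model
            out["aux_logits"] = self.head(x)
        return out

# build with aux_loss disabled, then script
model = torch.jit.script(ToyDetector(aux_loss=False))
out = model(torch.rand(2, 8))
print(sorted(out.keys()))
```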
Hi @fmassa - thanks a bunch for all the help! The main issue was the aux_loss! One last question if I may: the arg naming seems inverted - imo, 'no_aux_loss' being True should mean there is no aux_loss, but True for this arg really controls whether the aux_loss is present or not. Regardless, I did want to say thanks a ton for all the help getting JIT mode working, as I will be using that for production! The initial compile time is still a big hit, but at least being aware of it means we can handle it.
Hey, sorry I missed this before
The original name of the arg was |
Instructions To Reproduce the 🐛 Bug:
what changes you made (`git diff`) or what code you wrote; what exact command you run:
prepared a single image per normal validation process as the 'sample' (resize, tensorize, normalize, unsqueeze to make batch 1, push to gpu).
then:
`traced = torch.jit.trace(model, single_batch_tensorimg)`
what you observed (including full logs):
please simplify the steps as much as possible so they do not require additional resources to run, such as a private dataset.
Expected behavior:
Able to trace and export the traced DETR model for production use :)
(I confirmed I have the latest DETR source that has the PR from June 4 with the fixes to script the resnet models).
If there are no obvious errors in "what you observed" provided above, please tell us the expected behavior.
Environment:
Provide your environment information using the following command: