
[Build] RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running MatMul node. Name:'/MatMul_7' Status Message: /onnxruntime_src/onnxruntime/core/framework/op_kernel.cc:83 virtual OrtValue* onnxruntime::OpKernelContext::OutputMLValue(int, const onnxruntime::TensorShape&) status.IsOK() was false. Shape mismatch attempting to re-use buffer. {1,1,512} != {1,32,512}. Validate usage of dim_value (values should be > 0) and dim_param (all values with the same string should equate to the same size) in shapes in the model. #21320

Closed
kabyanil opened this issue Jul 11, 2024 · 7 comments
Labels
build build issues; typically submitted using template

Comments

@kabyanil

kabyanil commented Jul 11, 2024

Describe the issue

I am trying to convert a PyTorch transformer model to ONNX. My model architecture consists of multiple nn.Modules, so I am converting each one to ONNX separately. I have to use a combination of torch.onnx.export() and torch.onnx.dynamo_export(), because some of the module conversions do not support dynamo_export yet.

I am able to convert all the modules to ONNX. However, when I run an inference session through the decoder module, I get the error mentioned in the title. For reference, here is my Decoder class:

class Decoder(nn.Module):

    def __init__(self, features: int, layers: nn.ModuleList) -> None:
        super().__init__()
        self.layers = layers
        self.norm = LayerNormalization(features)

    def forward(self, x, encoder_output, src_mask, tgt_mask):
        for layer in self.layers:
            x = layer(x, encoder_output, src_mask, tgt_mask)
        return self.norm(x)

Here is my code for the ONNX conversion of the module:

dummy_decoder_input = torch.randint(low=0, high=60, size=(1, 1, 512), dtype=torch.float)
dummy_encoder_output = torch.randint(low=0, high=60, size=(1, 32, 512), dtype=torch.float)
dummy_src_mask = torch.tensor([[[[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0]]]], dtype=torch.int)
dummy_tgt_mask = torch.tensor([[1]], dtype=torch.int)

args = (dummy_decoder_input, dummy_encoder_output, dummy_src_mask, dummy_tgt_mask)
# matches Decoder.forward(x, encoder_output, src_mask, tgt_mask)
dynamic_axes = {
    'decoder_input': {0: 'batch_size', 1: 'seq_len', 2: 'embed_dim'},
    'encoder_output': {0: 'batch_size', 1: 'seq_len', 2: 'embed_dim'},
    'src_mask': {3: 'seq_len'},
    'tgt_mask': {0: 'seq_len', 1: 'seq_len'},
    'output': {0: 'batch_size', 1: 'sequence_length'}
}

torch.onnx.export(test_scripted_decoder,
                  args=args,
                  f="./onnx/decoder.onnx",
                  input_names=['decoder_input', 'encoder_output', 'src_mask', 'tgt_mask'],
                  output_names=['output'],
                  dynamic_axes=dynamic_axes,
                  verbose=True
                )
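
To verify which symbolic dimension names (the dim_param values that the error message refers to) actually ended up in the exported graph, a quick inspection along these lines can help (a minimal sketch, assuming the onnx package is installed):

import onnx

model = onnx.load("./onnx/decoder.onnx")
for inp in model.graph.input:
    # dim_param holds the symbolic name (e.g. 'seq_len'); dim_value holds a fixed size
    dims = [d.dim_param if d.dim_param else d.dim_value
            for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)

Every dimension that shares the same dim_param string must resolve to the same size at run time.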

Here is my ONNX inference code:

def run_inference(input_string):
  input_tensor = encode_input(eng_tokenizer, input_string)
  encoder_input = prepare_encoder_input(eng_tokenizer, input_tensor)
  src_mask = prepare_encoder_mask(eng_tokenizer, encoder_input)

  # Run encoder
  src_embed_output = src_embed_layer.run(None, {'l_x_': encoder_input.numpy()})[0]

  src_pos_output = src_pos_layer.run(None, {'l_x_': src_embed_output})[0]

  encoder_output = src_encoder_layer.run(None, {'input_1': src_pos_output, 'input_2': src_mask.numpy()})[0]


  # Run decoder
  tgt_input = torch.tensor([[eng_tokenizer.encode('<')[0]]], dtype=torch.int32)

  while True:
      if tgt_input.size(1) == 32:
          break

      tgt_mask = causal_mask(tgt_input.size(1)).numpy().astype(np.int32)


      tgt_embed_output = tgt_embed_layer.run(None, {'l_x_': tgt_input.numpy().astype(np.int32)})[0]


      tgt_pos_output = tgt_pos_layer.run(None, {'input_1': tgt_embed_output})[0]

# ERROR OCCURS ON THE NEXT LINE
      decoder_output = tgt_decoder_layer.run(None, {'decoder_input': tgt_pos_output, 'encoder_output': encoder_output, 'src_mask': src_mask.numpy(), 'tgt_mask': tgt_mask})[0]

      last_dim = decoder_output[:, -1]

      prob = tgt_projection_layer.run(None, {'l_x_': last_dim})[0]
      next_word = np.argmax(prob, axis=1)[0]

      next_word_tensor = torch.tensor([[next_word]], dtype=torch.int64)
      tgt_input = torch.cat((tgt_input, next_word_tensor), dim=1)

      if next_word == eng_tokenizer.encode('>')[0]:
          break

  output_tokens = tgt_input.squeeze().tolist()
  output_string = eng_tokenizer.decode(output_tokens)
  return output_string

run_inference("hello")
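
The causal_mask helper is not shown above; a minimal sketch of what it is assumed to do (build a lower-triangular mask so position i can only attend to positions <= i):

def causal_mask(size: int) -> torch.Tensor:
    # 1 on and below the diagonal, 0 above it; for size=1 this matches
    # the (1, 1) shape of dummy_tgt_mask used at export time
    return torch.tril(torch.ones(size, size, dtype=torch.int32))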

I have marked the line that throws the error above. Here is the full traceback:

---------------------------------------------------------------------------
RuntimeException                          Traceback (most recent call last)
<ipython-input-41-ce3f1a0c8e40> in <cell line: 47>()
     45   return output_string
     46 
---> 47 run_inference("hello")

1 frames
/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in run(self, output_names, input_feed, run_options)
    218             output_names = [output.name for output in self._outputs_meta]
    219         try:
--> 220             return self._sess.run(output_names, input_feed, run_options)
    221         except C.EPFail as err:
    222             if self._enable_fallback:

RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running MatMul node. Name:'/MatMul_7' Status Message: /onnxruntime_src/onnxruntime/core/framework/op_kernel.cc:83 virtual OrtValue* onnxruntime::OpKernelContext::OutputMLValue(int, const onnxruntime::TensorShape&) status.IsOK() was false. Shape mismatch attempting to re-use buffer. {1,1,512} != {1,32,512}. Validate usage of dim_value (values should be > 0) and dim_param (all values with the same string should equate to the same size) in shapes in the model.

Any help in resolving this issue would be appreciated. Thanks.

Urgency

The issue is urgent for me. I am building a project in which I need to deploy the models on web, mobile and desktop, and I chose ONNX because it is under active development.

While I wait for a resolution, I am considering experimenting with ExecuTorch.

Target platform

Google Colab Ubuntu 22.04.3 LTS

Build script

See the conversion code in the description above.

Error / output


Same traceback as shown in the description above.

Visual Studio Version

No response

GCC / Compiler Version

11.4.0

@kabyanil kabyanil added the build build issues; typically submitted using template label Jul 11, 2024
@github-actions github-actions bot added the platform:mobile issues related to ONNX Runtime mobile; typically submitted using template label Jul 11, 2024
@yufenglee
Member

@kabyanil, it looks like a model issue based on the error. It is not possible to Reshape a tensor with shape {1,32,512} to {1,1,8,64}. The former has 32*512 = 16,384 elements while the target only has 8*64 = 512 elements. I guess it is intended to convert {1, 32, 512} to {1, 32, 8, 64}. Please check the model.

@kabyanil
Author

@kabyanil, it looks like a model issue based on the error. It is not possible to Reshape a tensor with shape {1,32,512} to {1,1,8,64}. The former has 32*512 = 16,384 elements while the target only has 8*64 = 512 elements. I guess it is intended to convert {1, 32, 512} to {1, 32, 8, 64}. Please check the model.

I am able to run inference in Python using the same code that I used for the ONNX conversion. What could be the issue then?

@kabyanil
Author

@kabyanil, it looks like a model issue based on the error. It is not possible to Reshape a tensor with shape {1,32,512} to {1,1,8,64}. The former has 32*512 = 16,384 elements while the target only has 8*64 = 512 elements. I guess it is intended to convert {1, 32, 512} to {1, 32, 8, 64}. Please check the model.

I have updated the error message. Can you please check now?

@kabyanil kabyanil changed the title RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Reshape node. Name:'/Reshape_5' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/tensor/reshape_helper.h:45 onnxruntime::ReshapeHelper::ReshapeHelper(const onnxruntime::TensorShape&, onnxruntime::TensorShapeVector&, bool) input_shape_size == size was false. The input tensor cannot be reshaped to the requested shape. Input shape:{1,32,512}, requested shape:{1,1,8,64} [Build] [Build] RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running MatMul node. Name:'/MatMul_7' Status Message: /onnxruntime_src/onnxruntime/core/framework/op_kernel.cc:83 virtual OrtValue* onnxruntime::OpKernelContext::OutputMLValue(int, const onnxruntime::TensorShape&) status.IsOK() was false. Shape mismatch attempting to re-use buffer. {1,1,512} != {1,32,512}. Validate usage of dim_value (values should be > 0) and dim_param (all values with the same string should equate to the same size) in shapes in the model. Jul 11, 2024
@tianleiwu
Contributor

@kabyanil,

I guess decoder_input and encoder_output should have the same shape (batch_size, seq_len, hidden_size). It seems that you use different shapes as dummy inputs in your code:

dummy_decoder_input = torch.randint(low=0, high=60, size=(1, 1, 512), dtype=torch.float)
dummy_encoder_output = torch.randint(low=0, high=60, size=(1, 32, 512), dtype=torch.float)

@sophies927 sophies927 removed the platform:mobile issues related to ONNX Runtime mobile; typically submitted using template label Jul 11, 2024
@kabyanil
Author

kabyanil commented Jul 12, 2024

@tianleiwu During inference, the encoder encodes the input to (batch_size, seq_len, embed_dim), where batch_size=1, seq_len=32 and embed_dim=512. Inputs shorter than seq_len are padded to length 32. In the decoder, on the other hand, the input starts at seq_len=1. At every decoder output, the next token is selected using torch.max() and appended to the decoder's earlier input, so the decoder's sequence grows from length 1 until the EOS token is hit.

To mimic this behaviour, I fixed dummy_encoder_output at (1, 32, 512) and gave dummy_decoder_input the initial shape (1, 1, 512).

Is this a mistake?

@tianleiwu
Contributor

tianleiwu commented Jul 12, 2024

@kabyanil,
If they are different, you should use different strings in the dynamic axes, like:

dynamic_axes = {
    'decoder_input': {0: 'batch_size', 1: 'decoder_seq_len', 2: 'embed_dim'},
    'encoder_output': {0: 'batch_size', 1: 'encoder_seq_len', 2: 'embed_dim'},
...
}

1.18.1 merges Shape nodes when it finds that their symbolic shapes are the same:
#19832

That change was reverted in the main branch, but it is still in the 1.18.* releases. You can try an older version like 1.17.* to work around it.
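
For reference, a complete dynamic_axes mapping consistent with the shapes in this thread might look like this (a sketch, assuming src_mask tracks the encoder sequence length and tgt_mask the decoder sequence length):

dynamic_axes = {
    'decoder_input':  {0: 'batch_size', 1: 'decoder_seq_len', 2: 'embed_dim'},
    'encoder_output': {0: 'batch_size', 1: 'encoder_seq_len', 2: 'embed_dim'},
    'src_mask':       {3: 'encoder_seq_len'},
    'tgt_mask':       {0: 'decoder_seq_len', 1: 'decoder_seq_len'},
    'output':         {0: 'batch_size', 1: 'decoder_seq_len'},
}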

@kabyanil
Author

kabyanil commented Jul 13, 2024

Thanks so much, using different names for the dynamic axes solved the issue.
