Hugging Face GPT2 models segfault on CPU #2822

Closed
cjvolzka opened this issue May 9, 2024 · 3 comments
cjvolzka commented May 9, 2024

Running the Hugging Face openai-community/gpt2 model (and the gpt2-large and gpt2-xl variants) compiles successfully, but the compiled model segfaults when run on CPU. The model runs fine when compiled for NNPA.

Reproduce

Export Model

I converted the models to ONNX using the Hugging Face optimum-cli. Because optimum-cli does not work on s390x, I converted each model on my Mac and then transferred the exported model to a Linux on Z host to compile it.

model_name=gpt2
opset=13
task=text-generation
optimum-cli export onnx --model ${model_name} --framework pt --atol 0.001 --task ${task} --opset ${opset} ${model_name}-${task}-${opset}

Change model_name to gpt2-large, etc. to export the other variants.
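The export step above can be scripted over all three variants. A minimal sketch (it only prints the commands so they can be reviewed, or piped to sh to actually run; assumes optimum-cli is installed, e.g. via pip install optimum[exporters]):

```shell
# Print the optimum-cli export command for each GPT-2 variant.
# Pipe the output to `sh` to actually run the exports.
opset=13
task=text-generation
for model_name in gpt2 gpt2-large gpt2-xl; do
  echo "optimum-cli export onnx --model ${model_name} --framework pt" \
       "--atol 0.001 --task ${task} --opset ${opset} ${model_name}-${task}-${opset}"
done
```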

Compile Model

Afterward, I compiled the exported model with:

  • CPU
    • --O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z14 model.onnx --onnx-op-stats TXT --profile-ir=Onnx
  • NNPA
    • --O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z16 --maccel=NNPA model.onnx --onnx-op-stats TXT --profile-ir=ZHigh
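For reference, the two option sets above correspond to onnx-mlir driver invocations along these lines (a sketch; model.onnx stands in for the exported model from the previous step, and the onnx-mlir binary is assumed to be on PATH):

```shell
MODEL=model.onnx  # exported model from the optimum-cli step

# CPU build: targets z14, no accelerator
cpu_cmd="onnx-mlir --O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z14 ${MODEL} --onnx-op-stats TXT --profile-ir=Onnx"

# NNPA build: targets z16 with the NNPA accelerator enabled
nnpa_cmd="onnx-mlir --O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z16 --maccel=NNPA ${MODEL} --onnx-op-stats TXT --profile-ir=ZHigh"

# Echoed rather than executed so the commands can be inspected first
echo "$cpu_cmd"
echo "$nnpa_cmd"
```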

Run model

I used a C++ client to run the model. For the inputs, I encoded the second paragraph of Les Misérables (see inputs.txt for the values used).

The models compiled for NNPA (except gpt2-xl at opset 13) run without issue. The CPU-compiled versions (and the NNPA-compiled gpt2-xl at opset 13) appear to fail at the same spot, based on the profile output:

...
==PERF-REPORT==, onnx.Squeeze, /transformer/h.0/attn/Squeeze_2, after, 0.000005, 0.271208
==PERF-REPORT==, onnx.Sub, /transformer/h.0/attn/Sub, before, 0.000004, 0.271212
==PERF-REPORT==, onnx.Sub, /transformer/h.0/attn/Sub, after, 0.000004, 0.271216
==PERF-REPORT==, onnx.Unsqueeze, /transformer/h.0/attn/Unsqueeze_6, before, 0.000004, 0.271220
==PERF-REPORT==, onnx.Unsqueeze, /transformer/h.0/attn/Unsqueeze_6, after, 0.000004, 0.271224
==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000004, 0.271228
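Since the crash happens mid-run, the faulting op is the one whose "before" record has no matching "after" record. Assuming the comma-separated field layout shown above holds, it can be pulled out of the profile log with a small awk sketch (shown here on the tail of the log):

```shell
# Find the op that started ("before") but never finished ("after"),
# i.e. the op that was executing when the process died.
# Fields: tag, op, node name, before/after marker, elapsed, cumulative.
awk -F', *' '
  $1 == "==PERF-REPORT==" {
    if ($4 == "before") seen[$3] = 1; else delete seen[$3]
  }
  END { for (n in seen) print n }
' <<'EOF'
==PERF-REPORT==, onnx.Unsqueeze, /transformer/h.0/attn/Unsqueeze_6, before, 0.000004, 0.271220
==PERF-REPORT==, onnx.Unsqueeze, /transformer/h.0/attn/Unsqueeze_6, after, 0.000004, 0.271224
==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000004, 0.271228
EOF
```

On the failing runs this consistently prints /transformer/h.0/attn/Slice_3.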

Variant results:

  • gpt2
    • Unspecified opset (13 default)
      • CPU - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000003, 0.029315
      • NNPA - Success
    • Opset 12
      • CPU - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000003, 0.028127
      • NNPA - Success
    • Opset 13
      • CPU - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000002, 0.027932
      • NNPA - Success
    • Opset 17
      • CPU - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000002, 0.027919
      • NNPA - Success
  • gpt2-large
    • Unspecified opset (13 default)
      • CPU - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000001, 0.229088
      • NNPA - Success
    • Opset 12
      • CPU - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000002, 0.195636
      • NNPA - Success
    • Opset 13
      • CPU - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000001, 0.196711
      • NNPA - Success
    • Opset 17
      • CPU - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000004, 0.071682
      • NNPA - Success
  • gpt2-xl
    • Unspecified opset (13 default)
      • CPU - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000004, 0.104978
      • NNPA - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000001, 0.011877
    • Opset 12
      • CPU - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000002, 0.178973
      • NNPA - Success
    • Opset 13
      • CPU - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000004, 0.271228
      • NNPA - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000006, 0.011747
    • Opset 17
      • CPU - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000001, 0.301007
      • NNPA - Success
imaihal commented Jul 5, 2024

@cjvolzka @mikeessen I confirmed that gpt2 with opset 13 runs without a segfault using PR #2865. Could you double-check it? I expect the other GPT-2 models to run as well.

imaihal commented Jul 9, 2024

I also confirmed that gpt2-xl with opset 17 runs correctly without a segfault.

cjvolzka commented Jul 9, 2024

I ran through all the variations and confirmed that I was able to successfully compile and run every model variant. Thanks for the fix @imaihal!

@cjvolzka cjvolzka closed this as completed Jul 9, 2024