Hugging Face GPT2 models segfault on CPU #2822

Closed
cjvolzka opened this issue May 9, 2024 · 3 comments
cjvolzka commented May 9, 2024

Running the Hugging Face openai-community/gpt2 model (and the gpt2-large and gpt2-xl variants) compiles successfully, but the compiled model segfaults when run on CPU. The model runs fine when compiled for NNPA.

Reproduce

Export Model

I converted the models to ONNX using the Hugging Face optimum-cli. Because optimum-cli does not work on s390x, I converted each model on my Mac and then transferred the exported model to a Linux on Z host to compile it.

model_name=gpt2
opset=13
task=text-generation
optimum-cli export onnx --model ${model_name} --framework pt --atol 0.001 --task ${task} --opset ${opset} ${model_name}-${task}-${opset}

Change model_name to gpt2-large, etc. to export the other variants.
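The export step above can be scripted over all three variants. A minimal sketch (it only prints the commands so they can be reviewed, or piped to sh to actually run; assumes optimum-cli is installed, e.g. via pip install optimum[exporters]):

```shell
# Print the optimum-cli export command for each GPT-2 variant.
# Pipe the output to `sh` to actually run the exports.
opset=13
task=text-generation
for model_name in gpt2 gpt2-large gpt2-xl; do
  echo "optimum-cli export onnx --model ${model_name} --framework pt" \
       "--atol 0.001 --task ${task} --opset ${opset} ${model_name}-${task}-${opset}"
done
```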

Compile Model

Afterward, I compiled the exported model with:

  • CPU
    • --O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z14 model.onnx --onnx-op-stats TXT --profile-ir=Onnx
  • NNPA
    • --O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z16 --maccel=NNPA model.onnx --onnx-op-stats TXT --profile-ir=ZHigh
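For reference, the two option sets above correspond to onnx-mlir driver invocations along these lines (a sketch; model.onnx stands in for the exported model from the previous step, and the onnx-mlir binary is assumed to be on PATH):

```shell
MODEL=model.onnx  # exported model from the optimum-cli step

# CPU build: targets z14, no accelerator
cpu_cmd="onnx-mlir --O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z14 ${MODEL} --onnx-op-stats TXT --profile-ir=Onnx"

# NNPA build: targets z16 with the NNPA accelerator enabled
nnpa_cmd="onnx-mlir --O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z16 --maccel=NNPA ${MODEL} --onnx-op-stats TXT --profile-ir=ZHigh"

# Echoed rather than executed so the commands can be inspected first
echo "$cpu_cmd"
echo "$nnpa_cmd"
```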

Run model

I used a C++ client to run the model. For the inputs, I encoded the second paragraph of Les Misérables (see inputs.txt for the values used).

The models compiled for NNPA (except gpt2-xl at opset 13) run without issue. The CPU-compiled versions (and the NNPA-compiled gpt2-xl at opset 13) appear to fail at the same spot, based on the profile output:

...
==PERF-REPORT==, onnx.Squeeze, /transformer/h.0/attn/Squeeze_2, after, 0.000005, 0.271208
==PERF-REPORT==, onnx.Sub, /transformer/h.0/attn/Sub, before, 0.000004, 0.271212
==PERF-REPORT==, onnx.Sub, /transformer/h.0/attn/Sub, after, 0.000004, 0.271216
==PERF-REPORT==, onnx.Unsqueeze, /transformer/h.0/attn/Unsqueeze_6, before, 0.000004, 0.271220
==PERF-REPORT==, onnx.Unsqueeze, /transformer/h.0/attn/Unsqueeze_6, after, 0.000004, 0.271224
==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000004, 0.271228
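Since the crash happens mid-run, the faulting op is the one whose "before" record has no matching "after" record. Assuming the comma-separated field layout shown above holds, it can be pulled out of the profile log with a small awk sketch (shown here on the tail of the log):

```shell
# Find the op that started ("before") but never finished ("after"),
# i.e. the op that was executing when the process died.
# Fields: tag, op, node name, before/after marker, elapsed, cumulative.
awk -F', *' '
  $1 == "==PERF-REPORT==" {
    if ($4 == "before") seen[$3] = 1; else delete seen[$3]
  }
  END { for (n in seen) print n }
' <<'EOF'
==PERF-REPORT==, onnx.Unsqueeze, /transformer/h.0/attn/Unsqueeze_6, before, 0.000004, 0.271220
==PERF-REPORT==, onnx.Unsqueeze, /transformer/h.0/attn/Unsqueeze_6, after, 0.000004, 0.271224
==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000004, 0.271228
EOF
```

On the failing runs this consistently prints /transformer/h.0/attn/Slice_3.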

Variant results:

  • gpt2
    • Unspecified opset (13 default)
      • CPU - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000003, 0.029315
      • NNPA - Success
    • Opset 12
      • CPU - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000003, 0.028127
      • NNPA - Success
    • Opset 13
      • CPU - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000002, 0.027932
      • NNPA - Success
    • Opset 17
      • CPU - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000002, 0.027919
      • NNPA - Success
  • gpt2-large
    • Unspecified opset (13 default)
      • CPU - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000001, 0.229088
      • NNPA - Success
    • Opset 12
      • CPU - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000002, 0.195636
      • NNPA - Success
    • Opset 13
      • CPU - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000001, 0.196711
      • NNPA - Success
    • Opset 17
      • CPU - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000004, 0.071682
      • NNPA - Success
  • gpt2-xl
    • Unspecified opset (13 default)
      • CPU - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000004, 0.104978
      • NNPA - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000001, 0.011877
    • Opset 12
      • CPU - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000002, 0.178973
      • NNPA - Success
    • Opset 13
      • CPU - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000004, 0.271228
      • NNPA - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000006, 0.011747
    • Opset 17
      • CPU - Runtime segfault: ==PERF-REPORT==, onnx.Slice, /transformer/h.0/attn/Slice_3, before, 0.000001, 0.301007
      • NNPA - Success
imaihal commented Jul 5, 2024

@cjvolzka @mikeessen I confirmed that gpt2 with opset 13 runs without a segfault using PR #2865. Could you double-check it? I expect the other GPT-2 models to run as well.

imaihal commented Jul 9, 2024

I also confirmed that gpt2-xl with opset 17 runs correctly without a segfault.

cjvolzka commented Jul 9, 2024

I ran through all the variations and confirmed that I was able to successfully compile and run every model variant. Thanks for the fix @imaihal!

@cjvolzka cjvolzka closed this as completed Jul 9, 2024