cjvolzka changed the title from "Models take significant amounts of memory to compile" to "xlm-roberta and Mistral-7B take significant amounts of memory during compilation" on May 9, 2024
@imaihal Sorry, I missed your question. Below is how I generated the Mistral onnx model.
Notes:
I exported the model using my Mac as the tools don't support s390x. Afterward, I transferred the folder it created (with the onnx file and constants) to the s390x host to compile the model.
The huggingface-cli command will ask a couple of questions:
While compiling models like HuggingFace protectai/xlm-roberta-base-language-detection-onnx or mistralai/Mistral-7B-v0.1, I notice that compilation takes significantly more memory than the entire model size.
For example, xlm-roberta-base-language-detection-onnx is about 1.11 GB, but during compilation I see peaks of up to 9 GB of memory used by onnx-mlir, opt, and llc when compiling with --O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z14 --onnx-op-stats TXT.
The Mistral-7B-v0.1 model is about 29 GB, but during compilation I see peaks of 70+ GB and a sustained 58 GB of memory when compiling with --O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z14 --store-constants-to-file --onnx-op-stats TXT.
Is there anything that can be done to reduce the compile-time memory required for these kinds of models?
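For anyone who wants to reproduce the peak-memory numbers, here is a minimal sketch of one way to measure them. It is not how the figures above were collected. It assumes a Unix host with onnx-mlir on the PATH; the helper names (build_command, peak_child_rss_bytes) and the model path passed on the command line are hypothetical. The flags are the ones quoted in this issue. Note that ru_maxrss reports the largest single child process, so it will not show the combined footprint of onnx-mlir, opt, and llc if they hold memory at the same time.

```python
import resource
import subprocess
import sys

# Flags copied from the issue text; nothing else is assumed about onnx-mlir.
ISSUE_FLAGS = [
    "--O3", "--EmitLib", "--mtriple=s390x-ibm-loz",
    "--mcpu=z14", "--onnx-op-stats", "TXT",
]


def build_command(model_path, store_constants=False):
    """Build the onnx-mlir command line (hypothetical helper)."""
    cmd = ["onnx-mlir", *ISSUE_FLAGS]
    if store_constants:
        # Used for the Mistral-7B compile in the issue.
        cmd.append("--store-constants-to-file")
    cmd.append(model_path)
    return cmd


def peak_child_rss_bytes(cmd):
    """Run cmd to completion and return (exit_code, peak_rss_bytes).

    peak_rss_bytes is the maximum resident set size of the largest child
    (including waited-for descendants) this Python process has run so far,
    so call it from a fresh interpreter for a clean reading.
    """
    proc = subprocess.run(cmd, check=False)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is in kilobytes on Linux and in bytes on macOS.
    scale = 1 if sys.platform == "darwin" else 1024
    return proc.returncode, usage.ru_maxrss * scale


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python measure.py path/to/model.onnx [--store-constants]
    command = build_command(sys.argv[1], "--store-constants" in sys.argv[2:])
    code, peak = peak_child_rss_bytes(command)
    print(f"exit code {code}, peak RSS of largest child: {peak / 1e9:.2f} GB")
```

On Linux, running the same compile under /usr/bin/time -v and reading "Maximum resident set size" should give a comparable per-process figure.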