Change Transformers export default sequence length to max_position_embeddings #1826
Usage
With this change, sparseml.transformers.export_onnx defaults --sequence_length to the model's max_position_embeddings when the flag is not passed.
Reason
Unfortunately, I think it is very important to export LLMs with the largest sequence length possible.
For instance, I was exporting Llama models with a sequence length of 512 for convenience:
sparseml.transformers.export_onnx --model_path ./llama2.c-stories15M --task text-generation --sequence_length 512
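The largest safe value is the model's own maximum context length, which is what this PR adopts as the default. As a minimal sketch (assuming a LLaMA-style config that exposes max_position_embeddings, and reusing the model path from the command above), that value can be read with the Transformers config API:

```python
from transformers import AutoConfig

# Assumed path, taken from the export command above.
config = AutoConfig.from_pretrained("./llama2.c-stories15M")

# LLaMA-style configs expose the maximum context length as
# `max_position_embeddings`; fall back to None if the attribute is absent.
max_len = getattr(config, "max_position_embeddings", None)
print(f"max_position_embeddings = {max_len}")
```

Passing that value as --sequence_length avoids baking a shorter limit into the export.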
The catch is that, due to the PyTorch export and constant folding, the export seems to produce a cached rotary embedding fixed to the sequence length used; it shows up as 512 in the screenshot. The resulting ONNX can still run and compile with a sequence length larger than 512, but the output is unstable and quickly starts to repeat.
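For intuition, here is a minimal sketch (an assumed simplification, not the actual Hugging Face rotary-embedding code) of how such a cache ends up frozen: the cos/sin tables are precomputed as buffers for a fixed number of positions, and ONNX constant folding turns those buffers into graph initializers of exactly that size.

```python
import torch

class RotaryCache(torch.nn.Module):
    """Simplified rotary-embedding cache, precomputed for max_seq_len positions."""

    def __init__(self, dim: int, max_seq_len: int, base: float = 10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        positions = torch.arange(max_seq_len).float()
        freqs = torch.outer(positions, inv_freq)  # (max_seq_len, dim/2)
        emb = torch.cat((freqs, freqs), dim=-1)   # (max_seq_len, dim)
        # Buffers look like constants to the tracer, so constant folding
        # bakes these tables into the exported ONNX at their current size.
        self.register_buffer("cos_cached", emb.cos())
        self.register_buffer("sin_cached", emb.sin())

    def forward(self, position_ids: torch.Tensor):
        # Positions >= max_seq_len index past the cached tables, so running
        # the exported model beyond the export-time length misbehaves.
        return self.cos_cached[position_ids], self.sin_cached[position_ids]
```

Once the tables are initializers, positions beyond the export-time sequence length have no valid rotary values, which would match the unstable, repeating output observed above.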