
review flash/sdpa arg #25

Merged
merged 9 commits into eole-nlp:main on Jun 13, 2024

Conversation

vince62s
Contributor

Replace the model setting self_attn_type with a RunningConfig setting self_attn_backend, taking the values "flash" or "pytorch".
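For illustration only, a minimal sketch of what such a running-config option could look like (hypothetical class and defaults, not eole's actual RunningConfig):

```python
# Hypothetical sketch of a running/inference config option selecting the
# attention backend, replacing the former model-level self_attn_type.
from dataclasses import dataclass


@dataclass
class RunningConfig:
    # "flash" -> flash-attn kernels, "pytorch" -> torch scaled_dot_product_attention
    self_attn_backend: str = "flash"

    def __post_init__(self):
        if self.self_attn_backend not in ("flash", "pytorch"):
            raise ValueError(
                f"self_attn_backend must be 'flash' or 'pytorch', "
                f"got {self.self_attn_backend!r}"
            )
```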

In fact:
At training, when using Rotary or Legacy position encoding, flash and PyTorch SDPA perform almost the same. With alibi or max_relative_positions, the "manual" matmul path is used anyway.
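A rough sketch of that dispatch (assumed tensor layout and function names, not eole's exact code):

```python
import torch
import torch.nn.functional as F


def attention(q, k, v, backend="pytorch", position_encoding="rotary"):
    # q, k, v: (batch, heads, seqlen, dim); fp16/bf16 on CUDA required for flash-attn
    if position_encoding in ("alibi", "relative"):
        # alibi / max_relative_positions need the explicit matmul path
        # (bias terms omitted for brevity)
        scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
        return torch.softmax(scores, dim=-1) @ v
    if backend == "flash":
        from flash_attn import flash_attn_func  # requires the flash-attn package

        # flash_attn_func expects (batch, seqlen, heads, dim), hence the transposes
        out = flash_attn_func(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), causal=True
        )
        return out.transpose(1, 2)
    # default: PyTorch fused SDPA
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)
```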

At inference: using "flash" instead of "pytorch" triggers the use of flash_attn_with_kvcache, which is much faster and has no equivalent in PyTorch 2.3 yet.
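For reference, a hedged sketch of a single decoding step with flash-attn's flash_attn_with_kvcache (shapes and names chosen for illustration):

```python
import torch
from flash_attn import flash_attn_with_kvcache

batch, heads, dim, max_len = 2, 8, 64, 512
# preallocated KV cache, layout (batch, max_seqlen, heads, dim)
k_cache = torch.zeros(batch, max_len, heads, dim, dtype=torch.float16, device="cuda")
v_cache = torch.zeros_like(k_cache)
# number of tokens already in the cache; the caller advances this after each step
cache_seqlens = torch.zeros(batch, dtype=torch.int32, device="cuda")

# one decoding step: q/k/v for the new token, shape (batch, 1, heads, dim)
q = torch.randn(batch, 1, heads, dim, dtype=torch.float16, device="cuda")
k_new = torch.randn_like(q)
v_new = torch.randn_like(q)

# appends k_new/v_new to the cache in place and attends in one fused kernel
out = flash_attn_with_kvcache(
    q, k_cache, v_cache, k=k_new, v=v_new, cache_seqlens=cache_seqlens, causal=True
)
```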

@vince62s merged commit 3a9b137 into eole-nlp:main on Jun 13, 2024
2 checks passed
@francoishernandez linked an issue on Jun 13, 2024 that may be closed by this pull request
Successfully merging this pull request may close these issues: change self_attn_type