
review flash/sdpa arg #25

Merged
merged 9 commits into eole-nlp:main on Jun 13, 2024

Conversation

vince62s
Contributor

Replace the model setting self_attn_type with a RunningConfig setting self_attn_backend, taking the values "flash" or "pytorch".
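For illustration only, a minimal sketch of what such a running-config option could look like (hypothetical class and defaults, not eole's actual RunningConfig):

```python
# Hypothetical sketch of a running/inference config option selecting the
# attention backend, replacing the former model-level self_attn_type.
from dataclasses import dataclass


@dataclass
class RunningConfig:
    # "flash" -> flash-attn kernels, "pytorch" -> torch scaled_dot_product_attention
    self_attn_backend: str = "flash"

    def __post_init__(self):
        if self.self_attn_backend not in ("flash", "pytorch"):
            raise ValueError(
                f"self_attn_backend must be 'flash' or 'pytorch', "
                f"got {self.self_attn_backend!r}"
            )
```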

In fact:
At training, when using Rotary or Legacy position encoding, flash and PyTorch SDPA perform almost the same. With alibi or max_relative_positions, the "manual" matmul path is used anyway.
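A rough sketch of that dispatch (assumed tensor layout and function names, not eole's exact code):

```python
import torch
import torch.nn.functional as F


def attention(q, k, v, backend="pytorch", position_encoding="rotary"):
    # q, k, v: (batch, heads, seqlen, dim); fp16/bf16 on CUDA required for flash-attn
    if position_encoding in ("alibi", "relative"):
        # alibi / max_relative_positions need the explicit matmul path
        # (bias terms omitted for brevity)
        scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
        return torch.softmax(scores, dim=-1) @ v
    if backend == "flash":
        from flash_attn import flash_attn_func  # requires the flash-attn package

        # flash_attn_func expects (batch, seqlen, heads, dim), hence the transposes
        out = flash_attn_func(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), causal=True
        )
        return out.transpose(1, 2)
    # default: PyTorch fused SDPA
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)
```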

At inference: using "flash" instead of "pytorch" triggers the use of flash_attn_with_kvcache, which is much faster and has no equivalent in PyTorch 2.3 yet.
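For reference, a hedged sketch of a single decoding step with flash-attn's flash_attn_with_kvcache (shapes and names chosen for illustration):

```python
import torch
from flash_attn import flash_attn_with_kvcache

batch, heads, dim, max_len = 2, 8, 64, 512
# preallocated KV cache, layout (batch, max_seqlen, heads, dim)
k_cache = torch.zeros(batch, max_len, heads, dim, dtype=torch.float16, device="cuda")
v_cache = torch.zeros_like(k_cache)
# number of tokens already in the cache; the caller advances this after each step
cache_seqlens = torch.zeros(batch, dtype=torch.int32, device="cuda")

# one decoding step: q/k/v for the new token, shape (batch, 1, heads, dim)
q = torch.randn(batch, 1, heads, dim, dtype=torch.float16, device="cuda")
k_new = torch.randn_like(q)
v_new = torch.randn_like(q)

# appends k_new/v_new to the cache in place and attends in one fused kernel
out = flash_attn_with_kvcache(
    q, k_cache, v_cache, k=k_new, v=v_new, cache_seqlens=cache_seqlens, causal=True
)
```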

@vince62s merged commit 3a9b137 into eole-nlp:main on Jun 13, 2024
2 checks passed
@francoishernandez linked an issue on Jun 13, 2024 that may be closed by this pull request
Successfully merging this pull request may close these issues: change self_attn_type