
Allows interweaving of arbitrary kinds of 'attention' layers, like sliding window, reuse prev layer kv cache etc. #1299

Merged

Conversation

@ShashankMosaicML (Contributor) commented on Jun 21, 2024

This PR allows overriding the default block config for selected layers. The block_overrides config contains two sub-configs: order and overrides. order specifies the sequence of the different kinds of layers (default refers to a layer that applies no overrides), and overrides specifies, for each named kind of layer, the config values to override. For example, to specify the model described at https://research.character.ai/optimizing-inference/, the following config is needed:

model:
    ...
    (usual model configs)
    ...
    block_overrides:
        order:
        - name: default
        - order:
          - name: sliding_window_layer
          - name: sliding_window_layer_reuse
          - name: sliding_window_layer
          - repeat: 2
            name: sliding_window_layer_reuse
          - name: reuse_kv_layer
          repeat: 2
        overrides:
            sliding_window_layer:
                attn_config:
                    sliding_window_size: 1024
            sliding_window_layer_reuse:
                attn_config:
                    sliding_window_size: 1024
                    reuse_kv_layer_idx: -1 # Relative index of the layer whose kv cache to reuse
            reuse_kv_layer:
                attn_config:
                    reuse_kv_layer_idx: -6 # Relative index of the layer whose kv cache to reuse
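To illustrate how the nested order spec expands, here is a minimal Python sketch. It is an illustration written for this summary, not the actual llm-foundry implementation, and expand_order is a hypothetical helper. Applied to the order above, it produces the 13-layer sequence shown in the log below (one default layer followed by two repetitions of the 6-layer pattern).

# Minimal sketch (hypothetical helper, not llm-foundry code) of how a nested
# order spec with repeat counts could expand into a flat list of per-layer
# block names. The spec mirrors the YAML above.

def expand_order(order, repeat=1):
    """Recursively expand an order spec into a flat list of block names."""
    names = []
    for block in order:
        if 'order' in block:
            # Nested group: expand its children, honoring the group's own repeat.
            names.extend(expand_order(block['order'], block.get('repeat', 1)))
        else:
            # Leaf entry: repeat the named block the requested number of times.
            names.extend([block['name']] * block.get('repeat', 1))
    return names * repeat

order_spec = [
    {'name': 'default'},
    {'order': [
        {'name': 'sliding_window_layer'},
        {'name': 'sliding_window_layer_reuse'},
        {'name': 'sliding_window_layer'},
        {'name': 'sliding_window_layer_reuse', 'repeat': 2},
        {'name': 'reuse_kv_layer'},
    ], 'repeat': 2},
]

layer_names = expand_order(order_spec)
print(len(layer_names))  # 13: 1 default layer + 2 x 6-layer pattern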

The change also prints the following log summarizing the network, with one row per layer:

INFO: llmfoundry.models.mpt.modeling_mpt: The following is a summary of overrides per layer.
  idx  name                        overrides
-----  --------------------------  ----------------------------------------------------------
    0  default                     []
    1  sliding_window_layer        [{'sliding_window_size': 1024}]
    2  sliding_window_layer_reuse  [{'sliding_window_size': 1024}, {'reuse_kv_layer_idx': 1}]
    3  sliding_window_layer        [{'sliding_window_size': 1024}]
    4  sliding_window_layer_reuse  [{'sliding_window_size': 1024}, {'reuse_kv_layer_idx': 3}]
    5  sliding_window_layer_reuse  [{'sliding_window_size': 1024}, {'reuse_kv_layer_idx': 3}]
    6  reuse_kv_layer              [{'reuse_kv_layer_idx': 0}]
    7  sliding_window_layer        [{'sliding_window_size': 1024}]
    8  sliding_window_layer_reuse  [{'sliding_window_size': 1024}, {'reuse_kv_layer_idx': 7}]
    9  sliding_window_layer        [{'sliding_window_size': 1024}]
   10  sliding_window_layer_reuse  [{'sliding_window_size': 1024}, {'reuse_kv_layer_idx': 9}]
   11  sliding_window_layer_reuse  [{'sliding_window_size': 1024}, {'reuse_kv_layer_idx': 9}]
   12  reuse_kv_layer              [{'reuse_kv_layer_idx': 0}]

Note that the table above prints the absolute layer index for reuse_kv_layer_idx, whereas the config specifies it as a relative index.
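As the table suggests, the relative index appears to resolve through chains of reuse: for example, layer 5 with reuse_kv_layer_idx: -1 points to layer 4, but since layer 4 itself reuses layer 3's KV cache, the printed absolute index is 3. Below is a minimal Python sketch of this resolution, inferred from the table rather than taken from modeling_mpt.py; resolve_reuse_idx is a hypothetical helper.

# Sketch (assumed behavior, not the actual modeling_mpt.py code) of converting
# a relative reuse_kv_layer_idx into the absolute index printed in the table.

def resolve_reuse_idx(layer_idx, relative_idx, resolved):
    """Resolve a negative relative index to the layer that owns the KV cache.

    resolved maps already-processed layer indices to the absolute index of
    the layer whose KV cache they reuse (absent if they own their own cache).
    """
    target = layer_idx + relative_idx  # e.g. layer 6 with -6 -> layer 0
    while target in resolved:          # follow the chain to the cache owner
        target = resolved[target]
    return target

# Relative reuse indices per layer, taken from the config above (None = owns its cache).
relative_spec = [None, None, -1, None, -1, -1, -6, None, -1, None, -1, -1, -6]

resolved = {}
for idx, rel in enumerate(relative_spec):
    if rel is not None:
        resolved[idx] = resolve_reuse_idx(idx, rel, resolved)

print(resolved)
# {2: 1, 4: 3, 5: 3, 6: 0, 8: 7, 10: 9, 11: 9, 12: 0} -- matches the table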

@ShashankMosaicML changed the title from "[WIP] Allows interweaving of arbitrary kinds of 'attention' layers, like RNN, sliding window etc." to "[WIP] Allows interweaving of arbitrary kinds of 'attention' layers, like sliding window, reuse prev layer kv cache etc." on Jun 22, 2024
@ShashankMosaicML changed the title from "[WIP] Allows interweaving of arbitrary kinds of 'attention' layers, like sliding window, reuse prev layer kv cache etc." to "Allows interweaving of arbitrary kinds of 'attention' layers, like sliding window, reuse prev layer kv cache etc." on Jun 22, 2024
@ShashankMosaicML marked this pull request as ready for review on Jun 22, 2024 19:41
@ShashankMosaicML requested a review from a team as a code owner on Jun 22, 2024 19:41
@dakinggg (Collaborator) left a comment:


mostly lgtm, are you done testing it?

5 review comments on llmfoundry/models/mpt/modeling_mpt.py (outdated, resolved)
@ShashankMosaicML (Contributor, Author) replied to "mostly lgtm, are you done testing it?":

Yes, we have finished testing this. Everything seems fine.

@ShashankMosaicML merged commit 8604bba into mosaicml:main on Jun 30, 2024
9 checks passed
@ShashankMosaicML deleted the mixed_attention_modules branch on Jun 30, 2024 22:24