
Support variable-length sequences for mamba block with position indices #434

Open
wants to merge 4 commits into
base: main

Conversation


@ptxu78 ptxu78 commented Jul 1, 2024

Enable the mamba block to support variable-length sequence inputs using positional encoding. Passing position indices results in negligible performance loss for the mamba block, and for common variable-length sequence distributions throughput can be improved by 4-6x.

For example, a packed sequence of length 16 consisting of four independent sub-sequences with lengths 3, 5, 6, and 2 has the corresponding metadata:
Cumulative sequence lengths: [0, 3, 8, 14, 16]
Position encoding: [0, 1, 2, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 0, 1]
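
Both pieces of metadata can be derived mechanically from the sub-sequence lengths. A minimal Python sketch (the helper name `pack_metadata` is ours, not part of the PR):

```python
def pack_metadata(lengths):
    """Build cumulative sequence offsets and per-token position indices
    for a batch of sub-sequences packed into one sequence."""
    cu_seqlens = [0]
    position_indices = []
    for n in lengths:
        cu_seqlens.append(cu_seqlens[-1] + n)   # running offset of each sub-sequence
        position_indices.extend(range(n))       # positions restart at 0 per sub-sequence
    return cu_seqlens, position_indices

cu, pos = pack_metadata([3, 5, 6, 2])
# cu  -> [0, 3, 8, 14, 16]
# pos -> [0, 1, 2, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 0, 1]
```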

In the Mamba module, two steps are not element-wise along the sequence: conv1d and selective_scan. Without modification, sub-sequences packed into the same sequence would influence each other through these operators. We therefore modified causal-conv1d and selective_scan: both CUDA operators now use the position encoding to eliminate mutual influence between sub-sequences.
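
As a rough illustration of the conv1d change, here is a pure-Python, single-channel sketch (not the actual CUDA kernel; the masking rule is our reading of the PR description): a causal filter tap is skipped whenever it would reach back past position 0 of the current sub-sequence.

```python
def causal_conv1d_packed(x, w, pos):
    """Causal depthwise conv over packed data, one channel.
    out[t] = sum_k w[k] * x[t-k], but tap k is dropped when k > pos[t],
    so the receptive field never crosses a sub-sequence boundary."""
    out = []
    for t, p in enumerate(pos):
        acc = 0.0
        for k, wk in enumerate(w):
            if k <= p:              # stay within the current sub-sequence
                acc += wk * x[t - k]
        out.append(acc)
    return out

# Two sub-sequences of length 3 and 2; without the mask, out[3] would
# leak x[2] from the previous sub-sequence.
y = causal_conv1d_packed([1, 1, 1, 1, 1], [1, 1], [0, 1, 2, 0, 1])
# y -> [1.0, 2.0, 2.0, 1.0, 2.0]
```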

  • How to use:
  1. Set up the two CUDA operators:
git clone --branch feat/pack_with_position_indices https://github.com/ptxu78/pack_mamba.git
python .../pack_mamba/setup_onlyCUDA.py install
git clone --branch feat/pack_with_position_indices https://github.com/ptxu78/causal-conv1d-pack.git
python .../causal-conv1d-pack/setup_onlyCUDA.py install
  2. Pack the variable-length sequences and input them, along with the corresponding position encoding.
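
The selective_scan side of the change can be illustrated the same way: a toy scalar recurrence (our simplification, not the real selective_scan interface) that resets its state wherever the position index returns to 0, so no hidden state carries across sub-sequence boundaries.

```python
def selective_scan_packed(x, pos, a=0.5, b=1.0):
    """Toy scalar SSM recurrence h_t = a*h_{t-1} + b*x_t over packed data.
    The state is zeroed at every position index 0, i.e. at the start of
    each packed sub-sequence."""
    out, h = [], 0.0
    for xt, p in zip(x, pos):
        if p == 0:
            h = 0.0              # new sub-sequence: reset recurrent state
        h = a * h + b * xt
        out.append(h)
    return out

# Two sub-sequences of length 2: the state built up at t=1 does not
# leak into t=2.
y = selective_scan_packed([1, 1, 1, 1], [0, 1, 0, 1])
# y -> [1.0, 1.5, 1.0, 1.5]
```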

…ions on packed data without interference between token sequences.
…-end pack experiments with the mamba block. Added support for position_indices in conv1d within mamba_inner_fn. The conv1d code can be found at https://github.com/ptxu78/causal-conv1d-pack/tree/feat/pack_with_position_indices.
@ptxu78 ptxu78 closed this Jul 1, 2024
@ptxu78 ptxu78 reopened this Jul 4, 2024
@ScottHoang

This PR attempts to resolve the issue derived from training with packed data?

@ptxu78
Author

ptxu78 commented Jul 25, 2024

This PR attempts to resolve the issue derived from training with packed data?

Yes, this PR allows Mamba to handle packed data more effectively: it significantly increases throughput while ensuring the mathematical equivalence of the training results.
