ggml : add Flash Attention #5021

Merged
merged 145 commits into from
Apr 30, 2024

Commits on Jan 18, 2024

  1. a1c004e

Commits on Jan 19, 2024

  1. fa7ebcc

Commits on Jan 20, 2024

  1. c3cdfff
  2. a9681fe

Commits on Jan 21, 2024

  1. 1173f49
  2. metal : f16 precision

    ggerganov committed Jan 21, 2024
    528da75
  3. metal : reduce branches

    ggerganov committed Jan 21, 2024
    52ae085
  4. b973258
  5. wip : 8 rows per simd group

    ggerganov committed Jan 21, 2024
    8cde449
  6. wip : 4 rows per simd group

    ggerganov committed Jan 21, 2024
    f31955f
  7. a4b6341
  8. 77d08f3
  9. 17720fa

Commits on Jan 25, 2024

  1. 1446a12
  2. d917746
  3. 432ad04
  4. metal : fix comment

    ggerganov committed Jan 25, 2024
    40ea8cd
  5. f9ca5dc
  6. 6fea843

Commits on Jan 28, 2024

  1. b3dd7d9
  2. metal : move output into local memory + optimize

    - the result from each simdgroup now stays in the registers
    - significantly reduced SRAM usage
    - more efficient skipping of -INF blocks
    - avoid simdgroup barrier in hot loop
    - add comments
    ggerganov committed Jan 28, 2024
    77f6976
  3. ecc466a
  4. metal : improve precision

    ggerganov committed Jan 28, 2024
    3a428a1
  5. ggml : fix f16 mad

    ggerganov committed Jan 28, 2024
    8612864
  6. 0ad44ba
  7. metal : minor

    ggerganov committed Jan 28, 2024
    134c81c
  8. metal : support Q > 8

    ggerganov committed Jan 28, 2024
    1db22d7

Commits on Jan 29, 2024

  1. tests : add ATTN tests

    ggerganov committed Jan 29, 2024
    4794821
  2. abeaf0d
  3. tests : more

    ggerganov committed Jan 29, 2024
    c6c1132
  4. 5fcb9c1

Commits on Jan 30, 2024

  1. d073e4f
  2. tests : ifdef

    ggerganov committed Jan 30, 2024
    78df552
  3. 3d03bcb

Commits on Jan 31, 2024

  1. 2ddc9bb
  2. 8ad92dc

Commits on Feb 1, 2024

  1. 910b15b
  2. 2e46013
  3. 5a19a9f
  4. 41d136b
  5. 56e45a2
  6. metal : optimize softmax

    ggerganov committed Feb 1, 2024
    cda5a60
  7. tests : minor fix

    ggerganov committed Feb 1, 2024
    c6769b9
  8. db1f3c4

Commits on Feb 2, 2024

  1. tests : update dims

    ggerganov committed Feb 2, 2024
    12eaa22
  2. b68a112

Commits on Feb 3, 2024

  1. b150abe
  2. cuda : use int instead of int64_t

    Noticeably improves performance (thanks to Johannes)
    ggerganov committed Feb 3, 2024
    7c34655
  3. cuda : make loops use the same loop values

    Thanks Johannes again for the tip
    ggerganov committed Feb 3, 2024
    1f8a592
  4. 92472ea
  5. c51f27c
  6. b958151
  7. a7b4715
  8. 3b1c4e7
  9. cuda : unroll Q*K^T loop

    ggerganov committed Feb 3, 2024
    5b263dd
  10. e04ff39
  11. cuda : simplify softmax

    ggerganov committed Feb 3, 2024
    cfd9732
  12. cuda : fix matrix names

    ggerganov committed Feb 3, 2024
    ef68fac

Commits on Feb 4, 2024

  1. cuda : minor

    ggerganov committed Feb 4, 2024
    1846e92

Commits on Feb 12, 2024

  1. 6875997

Commits on Feb 19, 2024

  1. 31109ca
  2. llama : adapt to F16 KQ_pos

    ggerganov committed Feb 19, 2024
    f249c99

Commits on Mar 3, 2024

  1. 02a645e
  2. 6aefd11

Commits on Mar 4, 2024

  1. e307882
  2. 58c7f61

Commits on Mar 22, 2024

  1. 9495d39
  2. 3a468e6
  3. ggml : fix CPU soft_max

    ggerganov committed Mar 22, 2024
    0953212

Commits on Mar 24, 2024

  1. tests : add hs=256

    ggerganov committed Mar 24, 2024
    e425810

Commits on Mar 27, 2024

  1. 013721d
  2. cuda : fix build

    ggerganov committed Mar 27, 2024
    6be02b5

Commits on Mar 28, 2024

  1. 57c03b7
  2. 3e318e7
  3. 08e69c5

Commits on Apr 2, 2024

  1. 75aa7b4
  2. 16 cols for Phi-2

    JohannesGaessler authored and ggerganov committed Apr 2, 2024
    d59ac67
  3. 81da919
  4. 269374e
  5. cca6d02
  6. no ncols == 64

    JohannesGaessler authored and ggerganov committed Apr 2, 2024
    68d793b
  7. 3f777ac
  8. fix compile warnings

    JohannesGaessler authored and ggerganov committed Apr 2, 2024
    e1ecd3b
  9. fix excessive KQ_b loads

    JohannesGaessler authored and ggerganov committed Apr 2, 2024
    bb0d51a
  10. fix cmake build

    JohannesGaessler authored and ggerganov committed Apr 2, 2024
    c63dfdf
  11. ee19a4a

Commits on Apr 5, 2024

  1. 89961de

Commits on Apr 17, 2024

  1. 2c41180
  2. 599ce84
  3. 4053857
  4. 5668c79

Commits on Apr 18, 2024

  1. 34f93bb
  2. 6a3b842
  3. ef9e159
  4. a5b0e2d
  5. 0bc67dd
  6. 2f538b9
  7. 87968de
  8. 260cdb2
  9. metal : add BS=1 kernel for flash attention (#6508)

    * metal : add BS=1 kernel for flash attention (wip)

    * metal : support more than 1 warps

    * metal : opts

    * metal : opt

    * metal : switch to parallel reduce

    * metal : reduce registers

    * metal : simplify

    * metal : initial FA vec kernel
    ggerganov committed Apr 18, 2024
    105332c
  10. fa9e8c6
  11. c16a7c2
  12. 9ca8698

Commits on Apr 19, 2024

  1. llama : simplify llama_build_kv_store

    ggml-ci
    ggerganov committed Apr 19, 2024
    74d57f9
  2. 1db66c1
  3. e32b281
  4. 703c6e6
  5. metal : clean-up

    ggerganov committed Apr 19, 2024
    97eaece
  6. 1a88565
  7. metal : minor

    ggerganov committed Apr 19, 2024
    bc34616
  8. 29f6ad8
  9. tests : remove benchmarks

    ggml-ci
    ggerganov committed Apr 19, 2024
    5294542
  10. ggml : fix avx512 const correctness

    ggml-ci
    ggerganov committed Apr 19, 2024
    3badef1
  11. ggml : fix soft_max with bias on CPU

    ggml-ci
    ggerganov committed Apr 19, 2024
    871fcb6

Commits on Apr 22, 2024

  1. a39217d
  2. cb76d74
  3. c11d05f
  4. ggml : ggml_soft_max support F16/F32 mask/pos

    ggml-ci
    ggerganov committed Apr 22, 2024
    f725ca9
  5. cuda : uint -> uint32_t

    ggerganov committed Apr 22, 2024
    5408d55
  6. cuda : "constexpr dim3" -> "const dim3"

    ggml-ci
    ggerganov committed Apr 22, 2024
    c70bfd7

Commits on Apr 23, 2024

  1. cuda : try to fix __hgt2_mask

    ggml-ci
    ggerganov committed Apr 23, 2024
    c129369
  2. 3864eea
  3. 78d363b
  4. llama : prep ALiBi support for BERT models

    ggml-ci
    ggerganov committed Apr 23, 2024
    19e8982
  5. llama : fix n_batch requirements

    ggml-ci
    ggerganov committed Apr 23, 2024
    56657e5
  6. cont

    ggerganov committed Apr 23, 2024
    d228bf8
  7. 751591d

Commits on Apr 24, 2024

  1. Merge branch 'master' into gg/flash-attn

    ggml-ci
    ggerganov committed Apr 24, 2024
    8937ec5
  2. llama : disable FA for AMD

    ggerganov committed Apr 24, 2024
    ce281b9

Commits on Apr 25, 2024

  1. 1f77f49
  2. tests : remove TMP_ATTN_BENCH

    ggml-ci
    ggerganov committed Apr 25, 2024
    ff2c64a
  3. Merge branch 'master' into gg/flash-attn

    ggml-ci
    ggerganov committed Apr 25, 2024
    cb3547a
  4. 1fd5bc3
  5. 09d0381
  6. ci : add CUDA save-load-state tests

    ggml-ci
    ggerganov committed Apr 25, 2024
    ac1c6d9
  7. c225609
  8. bab346b
  9. 0fc5c5e
  10. 1e590ac
  11. metal : remove tmp log

    ggerganov committed Apr 25, 2024
    4f4c024
  12. 9e38760

Commits on Apr 29, 2024

  1. Merge branch 'master' into gg/flash-attn

    ggml-ci
    ggerganov committed Apr 29, 2024
    a1616e9
  2. Merge branch 'master' into gg/flash-attn

    ggml-ci
    ggerganov committed Apr 29, 2024
    ca0275c

Commits on Apr 30, 2024

  1. metal : fix max nsg

    ggml-ci
    ggerganov committed Apr 30, 2024
    e180fcd
  2. ci : fix arg order

    ggml-ci
    ggerganov committed Apr 30, 2024
    c240ae2