
[Kernel] Add GPU kernels and enable LLaMA model. #372

Merged: 36 commits into intel:main on Jun 14, 2024

Conversation

@changqi1 (Contributor) commented on May 7, 2024:

How to build and test with a single Intel GPU

# Set up the oneAPI and oneCCL environments
source /opt/intel/oneapi/setvars.sh
source ../3rdparty/oneccl/build/_install/env/setvars.sh

# Build with the Intel C/C++ compilers and GPU support enabled
export CC=icx
export CXX=icpx
mkdir build && cd build
cmake .. -DWITH_GPU=ON
make -j

# Run the example with the GPU engine selected via XFT_ENGINE
OMP_NUM_THREADS=20 mpirun -n 1 -env XFT_ENGINE=GPU:0 numactl -N 0 -m 0 ./example \
        --model /home/xfast/models/llama-2-7b-chat-xft/ \
        --token /home/xfast/models/llama-2-7b-chat-hf/tokenizer.model \
        --dtype fp16 \
        --loop 3 \
        --no_stream \
        --input_len 18 \
        --output_len 8

[INFO] First token time: 69.575 ms
[INFO] Second token time: 61.2469 ms
[INFO] Final output is:
==============================================
Once upon a time, there existed a little girl who liked to have adventures. She lived in a small village surrounded by

@changqi1 added the enhancement (New feature or request) and gpu (Related to GPU) labels on May 7, 2024.
@changqi1 marked this pull request as draft on May 7, 2024 (13:21), then as ready for review on May 14, 2024 (03:41).
Review comments were left on the following files (all threads resolved):
src/common/allocator.h
src/layers/attention.h
src/models/model_factory.h
src/layers/mlp_llama.h
src/utils/matmul_helper.h
src/common/sequence.h
src/layers/layer_norm.cpp
src/layers/rms_norm.cpp
src/models/llama.cpp
tests/ut/attention_kernels_test.cpp

if (postAlg == matmul_kinds::Basic) {
    if (bias != nullptr)
        matmul_pd = new matmul::primitive_desc(*engine, input_md, weight_md, bias_md, output_md);
Contributor: Is the allocated object deleted anywhere, or not?

@changqi1 (Author): It is deleted in the destructor.
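
For context, the exchange above concerns manual ownership of the heap-allocated oneDNN primitive descriptor. Below is a minimal sketch of that pattern, assuming a hypothetical MatMulHelper wrapper; the class, method, and member names are illustrative, not the PR's actual code:

#include <oneapi/dnnl/dnnl.hpp>

// Minimal sketch of the "delete in destructor" ownership pattern discussed
// above. MatMulHelper, createPrimitiveDesc, and the member names are
// illustrative assumptions, not the PR's actual code.
class MatMulHelper {
public:
    void createPrimitiveDesc(const dnnl::engine &engine,
                             const dnnl::memory::desc &input_md,
                             const dnnl::memory::desc &weight_md,
                             const dnnl::memory::desc &bias_md,
                             const dnnl::memory::desc &output_md) {
        // Heap-allocate the primitive descriptor; this object owns the pointer.
        matmul_pd = new dnnl::matmul::primitive_desc(
                engine, input_md, weight_md, bias_md, output_md);
    }

    ~MatMulHelper() {
        // Release the descriptor allocated above (delete on nullptr is a no-op).
        delete matmul_pd;
    }

private:
    dnnl::matmul::primitive_desc *matmul_pd = nullptr;
};

A std::unique_ptr<dnnl::matmul::primitive_desc> member would express the same ownership without a hand-written destructor.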

@changqi1 changed the title from "[Kernel] Add GPU kernels." to "[Kernel] Add GPU kernels and enable LLaMA model." on Jun 14, 2024.
@changqi1 merged commit 80df391 into intel:main on Jun 14, 2024; 1 check passed.