
[Kernel] Add GPU kernels and enable LLaMA model. #372

Merged: 36 commits into intel:main on Jun 14, 2024

Conversation

@changqi1 (Contributor) commented on May 7, 2024:

How to build and test with a single Intel GPU

# Set up the oneAPI and oneCCL environments
source /opt/intel/oneapi/setvars.sh
source ../3rdparty/oneccl/build/_install/env/setvars.sh

# Build with the Intel C/C++ compilers and GPU support enabled
export CC=icx
export CXX=icpx
mkdir build && cd build
cmake .. -DWITH_GPU=ON
make -j

# Run the example with the GPU engine selected via XFT_ENGINE
OMP_NUM_THREADS=20 mpirun -n 1 -env XFT_ENGINE=GPU:0 numactl -N 0 -m 0 ./example \
        --model /home/xfast/models/llama-2-7b-chat-xft/ \
        --token /home/xfast/models/llama-2-7b-chat-hf/tokenizer.model \
        --dtype fp16 \
        --loop 3 \
        --no_stream \
        --input_len 18 \
        --output_len 8

[INFO] First token time: 69.575 ms
[INFO] Second token time: 61.2469 ms
[INFO] Final output is:
==============================================
Once upon a time, there existed a little girl who liked to have adventures. She lived in a small village surrounded by

@changqi1 added the enhancement (New feature or request) and gpu (Related to GPU) labels on May 7, 2024.
@changqi1 marked this pull request as draft on May 7, 2024 (13:21), then as ready for review on May 14, 2024 (03:41).
Review comments were left on the following files (all threads resolved):
src/common/allocator.h
src/layers/attention.h
src/models/model_factory.h
src/layers/mlp_llama.h
src/utils/matmul_helper.h
src/common/sequence.h
src/layers/layer_norm.cpp
src/layers/rms_norm.cpp
src/models/llama.cpp
tests/ut/attention_kernels_test.cpp

if (postAlg == matmul_kinds::Basic) {
    if (bias != nullptr)
        matmul_pd = new matmul::primitive_desc(*engine, input_md, weight_md, bias_md, output_md);
Contributor: Is the allocated object deleted anywhere, or not?

@changqi1 (Author): It is deleted in the destructor.
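
For context, the exchange above concerns manual ownership of the heap-allocated oneDNN primitive descriptor. Below is a minimal sketch of that pattern, assuming a hypothetical MatMulHelper wrapper; the class, method, and member names are illustrative, not the PR's actual code:

#include <oneapi/dnnl/dnnl.hpp>

// Minimal sketch of the "delete in destructor" ownership pattern discussed
// above. MatMulHelper, createPrimitiveDesc, and the member names are
// illustrative assumptions, not the PR's actual code.
class MatMulHelper {
public:
    void createPrimitiveDesc(const dnnl::engine &engine,
                             const dnnl::memory::desc &input_md,
                             const dnnl::memory::desc &weight_md,
                             const dnnl::memory::desc &bias_md,
                             const dnnl::memory::desc &output_md) {
        // Heap-allocate the primitive descriptor; this object owns the pointer.
        matmul_pd = new dnnl::matmul::primitive_desc(
                engine, input_md, weight_md, bias_md, output_md);
    }

    ~MatMulHelper() {
        // Release the descriptor allocated above (delete on nullptr is a no-op).
        delete matmul_pd;
    }

private:
    dnnl::matmul::primitive_desc *matmul_pd = nullptr;
};

A std::unique_ptr<dnnl::matmul::primitive_desc> member would express the same ownership without a hand-written destructor.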

@changqi1 changed the title from "[Kernel] Add GPU kernels." to "[Kernel] Add GPU kernels and enable LLaMA model." on Jun 14, 2024.
@changqi1 merged commit 80df391 into intel:main on Jun 14, 2024; 1 check passed.