add numa to improve cpu inference perf #2330

sywangyi · 2024-07-30T07:08:59Z

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

sywangyi · 2024-07-30T07:32:45Z

15% out-of-box (OOB) improvement in time_per_token on Intel(R) Xeon(R) Platinum 8480L with 1 process (meta-llama/Meta-Llama-3.1-8B-Instruct)
500% out-of-box (OOB) improvement in time_per_token on Intel(R) Xeon(R) Platinum 8480L with tensor parallelism of 2, 2TP (meta-llama/Meta-Llama-3.1-8B-Instruct)
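The PR text does not show the binding logic. The sketch below is only a rough illustration of why per-rank NUMA placement can matter for 2TP: each rank's threads stay on one socket, so they avoid slower cross-socket memory access. It assumes ranks fill NUMA nodes in order and each rank gets a contiguous slice of its node's physical cores. The names `plan_rank_binding`, `apply_binding` and `RankBinding`, and the even-split policy, are hypothetical and do not come from the PR.

```python
import math
import os
from typing import List, NamedTuple


class RankBinding(NamedTuple):
    """Placement for one hypothetical inference rank."""
    node_id: int      # NUMA node whose cores (and, ideally, memory) the rank uses
    cpus: List[int]   # global core indices pinned for this rank
    num_threads: int  # intra-op thread count, e.g. for torch.set_num_threads


def plan_rank_binding(rank: int, world_size: int, num_nodes: int,
                      cores_per_node: int) -> RankBinding:
    """Hypothetical even-split policy.

    Ranks fill NUMA nodes in order, and each rank gets a contiguous,
    non-overlapping slice of its node's cores. ASSUMPTION: physical cores on
    node n are numbered n*cores_per_node .. (n+1)*cores_per_node - 1.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    ranks_per_node = math.ceil(world_size / num_nodes)
    if ranks_per_node > cores_per_node:
        raise ValueError("more ranks per node than cores per node")
    node_id = rank // ranks_per_node   # which NUMA node this rank lands on
    slot = rank % ranks_per_node       # position of the rank within its node
    threads = cores_per_node // ranks_per_node
    first = node_id * cores_per_node + slot * threads
    return RankBinding(node_id, list(range(first, first + threads)), threads)


def apply_binding(binding: RankBinding) -> None:
    """Pin the calling process to the planned cores (Linux only).

    This sets CPU affinity only. It does not bind memory to the node.
    """
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, binding.cpus)
```

Under these assumptions, a two-socket machine with 56 physical cores per socket running 2 ranks would place rank 0 on cores 0 to 55 (node 0) and rank 1 on cores 56 to 111 (node 1). A real implementation should read each node's actual core list (for example from `/sys/devices/system/node/nodeN/cpulist`) rather than assume contiguous numbering. It would also handle memory placement separately, for example with `numactl --membind` or libnuma.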

sywangyi · 2024-07-30T07:39:17Z

@yao-matrix

sywangyi · 2024-07-30T07:56:53Z

@danieldk, please help review this PR. Thanks!

ErikKaum · 2024-08-13T13:12:21Z

Hi @sywangyi 👋

Sorry that it's taken so long for anyone to respond. We'll try to get to this as fast as possible 🙌

server/text_generation_server/models/flash_causal_lm.py

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

add numa to improve cpu inference perf

67c0b5e

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

Narsil merged commit 59922f9 into huggingface:main Aug 13, 2024

rbrugaro reviewed Aug 20, 2024

server/text_generation_server/models/flash_causal_lm.py

yuanwu2017 pushed a commit to yuanwu2017/tgi-gaudi that referenced this pull request Sep 26, 2024

add numa to improve cpu inference perf (huggingface#2330)

7a4d831

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
