How to specify which GPU the model runs inference on? #352
NCCL_P2P_DISABLE=1 CUDA_VISIBLE_DEVICES=2,3 python -m vllm.entrypoints.api_server --tensor-parallel-size 2 --host 127.0.0.1
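The same idea applies when using the Python API instead of the server: CUDA_VISIBLE_DEVICES has to be set before CUDA is initialized, i.e., before vLLM (and therefore torch) is imported. A minimal sketch, where the model name is only a placeholder:

```python
import os

# The mask must be set before CUDA is initialized,
# i.e., before importing vllm (which imports torch).
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"

from vllm import LLM

# "facebook/opt-125m" is just an illustrative model choice.
llm = LLM(model="facebook/opt-125m", tensor_parallel_size=2)
outputs = llm.generate("Hello, my name is")
```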
I have a question about my servers. It seems that when cuda:0 is almost full, it still fails even though I pass CUDA_VISIBLE_DEVICES to select other GPUs?
Oh, I found that the ray::worker processes are still taking the first two GPUs even when I specify the other two.
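(A quick sanity check for whether the mask is actually visible to a process, as a sketch to run before loading the model:)

```python
import os
import torch

# If CUDA_VISIBLE_DEVICES="2,3" took effect, only two devices
# are visible, and they are remapped to indices 0 and 1.
print(os.environ.get("CUDA_VISIBLE_DEVICES"))
print(torch.cuda.device_count())  # expected: 2, not 4
```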
@MasKong Can you elaborate a bit on this?

llm = LLM(model_name, max_model_len=50, tensor_parallel_size=2)
output = llm.generate(text)

You can find the complete issue here
Closing in preference to #3012
Hello, I have 4 GPUs. When I set tensor_parallel_size to 2 and run the service, it takes CUDA:0 and CUDA:1. My question is: if I want to start two workers (i.e., two processes that each deploy the same model), how do I make my second process take CUDA:2 and CUDA:3?
Because right now, if I just start the service without any configuration, it OOMs.
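A minimal sketch of the two-worker setup, assuming each worker is an independent API server process pinned to its own GPU pair via CUDA_VISIBLE_DEVICES (the ports are arbitrary illustrative choices):

```python
import os
import subprocess

# Launch two independent vLLM API servers; each child process
# sees only its own pair of GPUs through CUDA_VISIBLE_DEVICES.
common = ["python", "-m", "vllm.entrypoints.api_server",
          "--tensor-parallel-size", "2", "--host", "127.0.0.1"]

subprocess.Popen(common + ["--port", "8000"],
                 env={**os.environ, "CUDA_VISIBLE_DEVICES": "0,1"})
subprocess.Popen(common + ["--port", "8001"],
                 env={**os.environ, "CUDA_VISIBLE_DEVICES": "2,3"})
```

Because each server is a separate process with its own device mask, the two model replicas cannot contend for the same GPUs, which avoids the OOM from both landing on CUDA:0 and CUDA:1.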