
How to specify which GPU the model runs inference on? #352

Closed · zoubaihan opened this issue Jul 4, 2023 · 5 comments

Comments

@zoubaihan commented Jul 4, 2023

Hello, I have 4 GPUs. When I set tensor_parallel_size to 2 and run the service, it takes CUDA:0 and CUDA:1.

My question is: if I want to start two workers (i.e., two processes serving two copies of the same model), how do I make the second process use CUDA:2 and CUDA:3?

Because right now, if I just start the second service without any config, it OOMs.

@MasKong commented Jul 4, 2023

NCCL_P2P_DISABLE=1 CUDA_VISIBLE_DEVICES=2,3 python -m vllm.entrypoints.api_server --tensor-parallel-size 2 --host 127.0.0.1
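
In other words, restrict which GPUs the process can see before vLLM initializes CUDA. A minimal Python equivalent of the command above, as a sketch (the model name is a placeholder; the environment variables must be set before vllm or torch are imported):

import os

# Expose only physical GPUs 2 and 3 to this process. Inside the process
# they then appear as cuda:0 and cuda:1.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"
# Disable NCCL peer-to-peer transfers, which hang on some multi-GPU hosts.
os.environ["NCCL_P2P_DISABLE"] = "1"

from vllm import LLM  # import only after the environment is set

llm = LLM(model="facebook/opt-125m", tensor_parallel_size=2)
print(llm.generate("Hello, my name is")[0].outputs[0].text)

Each server started this way can be pointed at a different pair of GPUs, so two copies of the same model no longer fight over CUDA:0 and CUDA:1.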

@MM-IR commented Jul 15, 2023

I have a question about my servers: even when cuda:0 is almost full, passing CUDA_VISIBLE_DEVICES still fails to keep the workers off it?

@MM-IR commented Jul 15, 2023

Oh, I find that the ray::worker processes are still taking the first two GPUs even when I specify the other two.
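
That usually happens when the Ray workers started before the restriction took effect, for example because a Ray cluster was left over from an earlier run. A hedged sketch of how to check, assuming Ray is the local parallel backend (as it was for vLLM tensor parallelism at the time):

import os

# Set before Ray or vLLM are imported; locally started Ray workers
# inherit the driver's environment.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"

import ray

ray.shutdown()  # drop any stale cluster this process may have joined
ray.init()      # a fresh local cluster picks up the restriction
print(ray.cluster_resources())  # should report "GPU": 2.0, not 4.0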

@humza-sami

NCCL_P2P_DISABLE=1 CUDA_VISIBLE_DEVICES=2,3 python -m vllm.entrypoints.api_server --tensor-parallel-size 2 --host 127.0.0.1

@MasKong Can you elaborate on this a bit?
This is my simple codebase, and I want to use GPUs 1 and 3.

from vllm import LLM

llm = LLM(model_name, max_model_len=50, tensor_parallel_size=2)
output = llm.generate(text)

You can find the complete issue here.
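
To answer the question as asked: the same environment variable works here, it just has to be set before the LLM is constructed (more precisely, before vllm or torch first touch CUDA). A minimal sketch, assuming model_name and text are defined as in the snippet above:

import os

# Expose only physical GPUs 1 and 3; vLLM then sees them as cuda:0/cuda:1.
os.environ["CUDA_VISIBLE_DEVICES"] = "1,3"

from vllm import LLM

llm = LLM(model_name, max_model_len=50, tensor_parallel_size=2)
output = llm.generate(text)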

@hmellor (Collaborator) commented Mar 20, 2024

Closing in preference to #3012

@hmellor closed this as not planned on Mar 20, 2024.