NVIDIA GPU Support #11
Conversation
@mayankchhabra I'm leaving this in draft mode since merging it would break model downloads. But this is working for me, and I wanted to put it out there in case it's helpful in getting GPU support going for anyone else.
This will likely need to be rebased once work to implement the changes outlined in #8 (comment) is complete.
Absolutely. I can rework this depending on what happens there around model downloading, the storage path, etc. I think it'd make a lot of sense to pass in the model to run and the number of GPU layers to offload as command-line params/env vars, but I'll hold off until #8 is resolved.
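As a rough sketch of what that could look like in `docker-compose.yml` (the variable names `MODEL` and `N_GPU_LAYERS`, and the paths, are illustrative assumptions, not anything this PR has settled on):

```yaml
# Hypothetical sketch: MODEL and N_GPU_LAYERS are illustrative names only.
services:
  llama:
    build: .
    environment:
      - MODEL=/models/model.bin   # path to a pre-downloaded model
      - N_GPU_LAYERS=40           # how many layers to offload to the GPU
    volumes:
      - ./models:/models          # mount the host's model directory
```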
Reopening this. I'll continue to push changes/rebase once some of the more ergonomic PRs are merged (e.g. model downloads).
Thanks for taking this on @edgar971! Now that #19 has been merged, would you like to work on this? Here's another helpful comment from a user running it with CUDA support: #6 (comment) I think the easiest way to get going with CUDA support could be to create separate
We can probably use the same
Can I use N_GPU_LAYERS for the Kubernetes API deployment?
I think a decision to be made here is whether you want to base the whole project on the nvidia/cuda Docker images, or keep separate Dockerfiles for CUDA support. Last I checked, the llama-cpp-python project supports GPU offload, but their GHCR Docker image does not. So you'd either be changing the base image for the whole project, or implementing some kind of conditional to select the correct Dockerfile based on some input/env var.
For the sake of simplicity, I think using separate Dockerfile and docker-compose files for CUDA would be great. We can then add relevant instructions to the readme. A comprehensive refactor down the line can combine everything into a simple run.sh script that uses template docker-compose files based on system config. |
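To make the separate-Dockerfile idea concrete, a `Dockerfile.cuda` could look roughly like this. The base image tag and build flag are assumptions (the cuBLAS flag in particular has changed across llama-cpp-python versions), so treat this as a sketch rather than the final implementation:

```dockerfile
# Sketch of a separate Dockerfile.cuda; image tag and flags are assumptions.
FROM nvidia/cuda:12.1.1-devel-ubuntu22.04

RUN apt-get update && apt-get install -y python3 python3-pip

# Build llama-cpp-python with cuBLAS so layers can be offloaded to the GPU.
# (Older releases use -DLLAMA_CUBLAS=on; newer ones use -DGGML_CUDA=on.)
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip3 install llama-cpp-python

COPY . /app
WORKDIR /app
```

The matching `docker-compose.cuda.yml` would then point `build` at this file, leaving the default CPU images untouched.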
Thanks for helping kickstart the effort on this @edicristofaro! We were able to add CUDA support with #72. Closing this PR now. Cheers! |
I don't see how this will work for Kubernetes deployments... |
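For Kubernetes, GPU access is requested through resource limits rather than Compose's device reservations. A hypothetical Deployment fragment (names and image are illustrative, and it assumes the NVIDIA device plugin is installed on the cluster) might look like:

```yaml
# Hypothetical sketch: requires the NVIDIA device plugin on the cluster.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama-gpu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llama-gpu
  template:
    metadata:
      labels:
        app: llama-gpu
    spec:
      containers:
        - name: llama
          image: llama-gpu:latest    # assumed CUDA-enabled image
          env:
            - name: N_GPU_LAYERS     # assumed env var, as discussed above
              value: "40"
          resources:
            limits:
              nvidia.com/gpu: 1      # schedules the pod onto a GPU node
```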
This is a quick PR to show how NVIDIA GPU support would work. You may not want to merge this since it also removes the model download steps and presumes you already have the models, but it might serve as a good baseline. It also presumes that you've configured Docker to work with GPUs (see here: https://www.docker.com/blog/wsl-2-gpu-support-for-docker-desktop-on-nvidia-gpus/).
My setup is:
I'm able to run the 13B and 70B models, albeit slowly for the latter, with some offload to the GPU.
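For reference, exposing the GPU to the container in Compose generally comes down to a device reservation like the following (a sketch assuming the NVIDIA Container Toolkit is set up on the host):

```yaml
# Sketch: requires the NVIDIA Container Toolkit on the host.
services:
  llama:
    build: .
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all             # or a specific number of GPUs
              capabilities: [gpu]
```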