
NVIDIA GPU Support #11

Closed

Conversation

edicristofaro

This is a quick PR to show how NVIDIA GPU support would work. You may not want to merge this since it also removes the model download steps and presumes you already have the models downloaded, but it might serve as a good baseline. It also presumes that you've configured Docker to work with GPUs (see here: https://www.docker.com/blog/wsl-2-gpu-support-for-docker-desktop-on-nvidia-gpus/).

My setup is:

  • AMD Ryzen 5600x
  • 64 GB RAM
  • NVIDIA GeForce RTX 3080 (10 GB)
  • WSL2 with Ubuntu 20.04.6 LTS
  • NVIDIA-SMI 535.86.10
  • NVIDIA Driver Version: 536.99
  • CUDA Version: 12.2

I'm able to run the 13B and 70B models, albeit slowly for the latter, with some offload to the GPU.
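For reference, the compose-level change boils down to a GPU device reservation, roughly like this (the service and image names here are placeholders, not necessarily what this repo uses):

```yaml
services:
  api:
    image: llama-gpt-api   # placeholder name
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```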

@edicristofaro
Author

@mayankchhabra I'm leaving this in draft mode since merging it would break model downloads, but it's working for me and I wanted to put it out there in case it's helpful in getting GPU support going for anyone else.

@notocwilson

This will likely need to be rebased once work to implement the changes outlined in #8 (comment) is complete.

@edicristofaro
Author

Absolutely. I can rework this depending on what happens there around model downloading, the storage path, etc. I think it'd make a lot of sense to pass in the model to run and the number of GPU layers to offload as command-line params/env vars, but I'll hold off until #8 is resolved.
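Something along these lines is what I have in mind for the run script; the variable names and defaults below are just illustrative, not final:

```sh
#!/bin/sh
# Sketch only: these defaults are hypothetical and would be overridden
# via docker-compose environment entries or -e flags.
MODEL="${MODEL:-/models/llama-2-13b-chat.bin}"
N_GPU_LAYERS="${N_GPU_LAYERS:-0}"   # 0 = no offload (CPU only)

exec python3 -m llama_cpp.server --model "$MODEL" --n_gpu_layers "$N_GPU_LAYERS"
```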

@edgar971
Contributor

I have an open PR #19 to solve #8 😄 .

@edicristofaro
Author

Reopening this. I'll continue to push changes/rebase once some of the more ergonomic PRs are merged (e.g. model downloads).

@mayankchhabra
Member

Thanks for taking this on @edgar971! Now that #19 has been merged, would you like to work on this? Here's another helpful comment from a user running it with CUDA support: #6 (comment)

I think the easiest way to get going with CUDA support could be to create separate docker-compose-7b-cuda.yml, docker-compose-13b-cuda.yml, docker-compose-70b-cuda.yml and api/run-cuda.sh files.

@edgar971
Contributor

> Thanks for taking this on @edgar971! Now that #19 has been merged, would you like to work on this? Here's another helpful comment from a user running it with CUDA support: #6 (comment)
>
> I think the easiest way to get going with CUDA support could be to create separate docker-compose-7b-cuda.yml, docker-compose-13b-cuda.yml, docker-compose-70b-cuda.yml and api/run-cuda.sh files.

We can probably use the same api/run.sh and set the N_GPU_LAYERS env var in the docker-compose file.
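e.g. something like this in the CUDA compose file, where the model path and layer count are just example values:

```yaml
services:
  api:
    environment:
      MODEL: /models/llama-2-13b-chat.bin   # example path
      N_GPU_LAYERS: 35                      # example offload count
```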

@jjgraham

Can I use N_GPU_LAYERS for the Kubernetes API deployment?
Editing api/run.sh did nothing.
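In a Kubernetes Deployment I'd expect that to look roughly like the fragment below (the container name and image are placeholders, and GPU scheduling also needs the NVIDIA device plugin installed on the cluster):

```yaml
# Hypothetical Deployment fragment, not from this repo
spec:
  template:
    spec:
      containers:
        - name: api                    # placeholder name
          image: llama-gpt-api-cuda    # placeholder image
          env:
            - name: N_GPU_LAYERS
              value: "35"
          resources:
            limits:
              nvidia.com/gpu: 1        # requires the NVIDIA device plugin
```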

@edicristofaro
Author

I think a decision to be made here is whether you want to base the whole project on the nvidia/cuda Docker images, or whether you'd prefer separate Dockerfiles for CUDA support. Last I checked, the llama-cpp-python project supports GPU offload, but their GHCR Docker image does not. So you'd either be changing the base image for the whole project, or implementing some kind of conditional to select the correct Dockerfile based on some input/env var.
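A CUDA-specific Dockerfile would look something along these lines; the base image tag and build flags below are assumptions on my part, not something I've tested against this repo:

```dockerfile
# Hypothetical sketch of a CUDA-enabled image
FROM nvidia/cuda:12.2.0-devel-ubuntu22.04

RUN apt-get update && apt-get install -y python3 python3-pip

# Build llama-cpp-python with cuBLAS so layers can be offloaded to the GPU
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip3 install "llama-cpp-python[server]"
```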

@mayankchhabra
Member

For the sake of simplicity, I think using separate Dockerfile and docker-compose files for CUDA would be great. We can then add relevant instructions to the readme. A comprehensive refactor down the line can combine everything into a simple run.sh script that uses template docker-compose files based on system config.
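e.g. that future run.sh could pick a compose file with something as simple as this (sketch only; the non-CUDA filename is my assumption):

```sh
#!/bin/sh
# Sketch: fall back to the CPU compose file when no NVIDIA GPU is visible
if command -v nvidia-smi >/dev/null 2>&1; then
  COMPOSE_FILE="docker-compose-13b-cuda.yml"
else
  COMPOSE_FILE="docker-compose-13b.yml"   # assumed CPU variant name
fi
docker compose -f "$COMPOSE_FILE" up
```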

@mayankchhabra
Member

Thanks for helping kickstart the effort on this @edicristofaro! We were able to add CUDA support with #72. Closing this PR now. Cheers!

@jjgraham

I don't see how this will work for Kubernetes deployments.
Do you have a plan to make GPU support work in Kubernetes?
