Add multi gpus capacity for transcription #114

Merged 33 commits on Jul 5, 2023

@chainyo chainyo commented Jun 22, 2023

This PR:

  • Removes the bad async design and the blocking operations.
  • Enables multi-GPU transcription.
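To illustrate the first point, here is a minimal sketch of one common way to keep a blocking call out of the event loop. It is not the code from this PR: `blocking_transcribe` and `transcribe_endpoint` are hypothetical names, and the `time.sleep` call only stands in for model inference.

```python
import asyncio
import time


def blocking_transcribe(audio_path: str) -> str:
    """Hypothetical stand-in for a blocking, GPU-bound transcription call."""
    time.sleep(0.1)  # simulates model inference
    return f"transcript of {audio_path}"


async def transcribe_endpoint(audio_path: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays free
    # to serve other requests in the meantime.
    return await asyncio.to_thread(blocking_transcribe, audio_path)


async def main() -> list:
    # The two calls overlap instead of running one after the other.
    return await asyncio.gather(
        transcribe_endpoint("a.wav"),
        transcribe_endpoint("b.wav"),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

`asyncio.to_thread` requires Python 3.9 or later; on older versions `loop.run_in_executor` plays the same role.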

WARNING: NEEDS TO BE DONE

  • We should handle the multi-GPU case where a specific Whisper model has to be loaded for extra languages. I suggest introducing some kind of index so that the model is loaded and unloaded on only one GPU, not on all of them.
  • NeMo is the bottleneck at the moment, and it should also use an index so that multiple diarization models can be loaded.
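A rough sketch of what such an index could look like, under stated assumptions: `PerGpuModels` and its `loader` callable are hypothetical and not part of this PR, and freeing GPU memory is assumed to happen when the previous model object is dropped.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class PerGpuModels:
    """Tracks which model is resident on each GPU index (hypothetical helper)."""

    # Hypothetical loader: (model_name, gpu_index) -> loaded model object.
    loader: Callable[[str, int], Any]
    default_model: str
    # gpu_index -> (model_name, model object)
    loaded: Dict[int, Tuple[str, Any]] = field(default_factory=dict)

    def get(self, gpu_index: int, model_name: str) -> Any:
        """Return the requested model on one GPU, swapping it in if needed."""
        current = self.loaded.get(gpu_index)
        if current is None or current[0] != model_name:
            # Only this GPU's slot is replaced; the other GPUs are untouched.
            self.loaded[gpu_index] = (model_name, self.loader(model_name, gpu_index))
        return self.loaded[gpu_index][1]

    def restore_default(self, gpu_index: int) -> Any:
        """Swap the default model back in once the extra-language job is done."""
        return self.get(gpu_index, self.default_model)
```

With this shape, a job that needs an extra-language model calls `get(gpu_index, name)` for the GPU it holds and `restore_default(gpu_index)` afterwards, so the other GPUs keep serving the default model. The same structure could hold one diarization model per index.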

@chainyo chainyo added the api (Everything related to the API implementation) and transcription (Everything related to the transcription part) labels Jun 22, 2023
@chainyo chainyo self-assigned this Jun 22, 2023
@chainyo chainyo marked this pull request as draft June 22, 2023 18:33

chainyo commented Jun 26, 2023

I implemented a GPU index for each service, based on the GPUService class, which handles acquiring and releasing a GPU (so that two jobs never try to use the same GPU at the same moment).
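For readers unfamiliar with the pattern, the acquire/release idea can be sketched as below. This is an illustration only, not the GPUService implementation: `GpuPool`, `job`, and `demo` are hypothetical names, and `asyncio.sleep` stands in for the actual work on the device.

```python
import asyncio
from contextlib import asynccontextmanager


class GpuPool:
    """Hands out GPU indices so that each one serves a single job at a time."""

    def __init__(self, device_indices):
        # Queue of free GPU indices; an empty queue means every GPU is busy.
        self._free = asyncio.Queue()
        for index in device_indices:
            self._free.put_nowait(index)

    @asynccontextmanager
    async def acquire(self):
        index = await self._free.get()  # waits until a GPU is free
        try:
            yield index
        finally:
            self._free.put_nowait(index)  # always hand the GPU back


async def job(pool, name, busy, log):
    async with pool.acquire() as gpu_index:
        assert gpu_index not in busy  # no other job holds this GPU
        busy.add(gpu_index)
        await asyncio.sleep(0.01)  # stands in for inference on that device
        busy.discard(gpu_index)
        log.append((name, gpu_index))


async def demo():
    pool = GpuPool([0, 1])  # created inside the running event loop
    busy, log = set(), []
    await asyncio.gather(*(job(pool, f"job-{i}", busy, log) for i in range(4)))
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Releasing in a `finally` block means a failed job still returns its GPU to the pool.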

I'm currently facing a weird issue with ctranslate2 inference: `RuntimeError: CUDA: invalid arguments`

Aleks and others added 19 commits June 29, 2023 01:58
…d-error-reporting

Updated error payload for svix in cortex endpoint
update the download_audio function to avoid extension problems
Add `audio_duration` key in API response
* add a catch for empty audio file

* rename utterances -> response for coherence

* fix quality
* add vocab feature

* fix youtube endpoint

* update the prompt sentence

* add vocab feature

* fix youtube endpoint

* update the prompt sentence
Upgraded docker image to nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu20.04
@chainyo chainyo marked this pull request as ready for review July 5, 2023 14:23
@chainyo chainyo merged commit 2a17376 into main Jul 5, 2023
1 check failed
@chainyo chainyo deleted the 4gpus branch July 5, 2023 15:30
2 participants