Fix TRT-LLM Multigpu Compatibility #2837

Draft · wants to merge 9 commits into main
Conversation

@nik-mosaic (Contributor) commented Jan 11, 2024

[WIP] What does this PR do?

We need to use Composer to run our evaluation framework on TRT-LLM models, but this currently breaks in the multi-GPU case. These fixes allow Composer to run N independent copies in parallel and feed data in a way that works with multi-GPU TRT-LLM models. Essentially, the changes are (a) not initializing dist and (b) fixing some race conditions in data loading (see the sketch after the TODO list below).

TODO:

  • Replace the commented-out code with parameters we can pass in.
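
A minimal sketch of what (a) and (b) above might look like, assuming a launcher that sets `LOCAL_RANK` for each process; names such as `maybe_init_dist` and `prepare_eval_data` are illustrative and not part of Composer's or TRT-LLM's APIs:

```python
import os
import time

import torch.distributed as torch_dist


def maybe_init_dist(skip_dist_init: bool) -> None:
    # (a) Optionally skip torch.distributed initialization so each process
    # runs its own copy of the model instead of joining a process group.
    if skip_dist_init or torch_dist.is_initialized():
        return
    torch_dist.init_process_group(backend="nccl")


def prepare_eval_data(cache_dir: str) -> None:
    # (b) Avoid a data-loading race: with dist disabled there is no barrier,
    # so rank 0 materializes the shared cache and writes a sentinel file,
    # while the other ranks poll for it before reading.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    done_marker = os.path.join(cache_dir, ".data_ready")
    if local_rank == 0:
        os.makedirs(cache_dir, exist_ok=True)
        # ... download / tokenize the evaluation data here ...
        open(done_marker, "w").close()
    else:
        while not os.path.exists(done_marker):
            time.sleep(1)
```

Because dist is intentionally not initialized, there is no collective barrier to coordinate ranks, so a sentinel file (or a file lock) stands in for it during data preparation.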

@mvpatel2000 (Contributor) commented:

@nik-mosaic is this still relevant?
