PR #13603: NVTX: name threads, CUDA devices and CUDA streams #2337
PR #13603: NVTX: name threads, CUDA devices and CUDA streams
Imported from GitHub PR openxla/xla#13603
This aims to improve the profiling experience: the names assigned to threads, CUDA devices and CUDA streams are shown in the Nsight Systems UI.
Device names:
![Screenshot 2024-06-10 at 14 52 37](https://private-user-images.githubusercontent.com/6459623/338500883-d889d37e-ca2e-4f5e-b5bd-240bbb625b4c.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk3OTIwMDgsIm5iZiI6MTcxOTc5MTcwOCwicGF0aCI6Ii82NDU5NjIzLzMzODUwMDg4My1kODg5ZDM3ZS1jYTJlLTRmNWUtYjViZC0yNDBiYmI2MjViNGMucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDYzMCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA2MzBUMjM1NTA4WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MzExNTQyNTFmZmU5OWI3ODEzNGU5MWYxNzc2ZTU4ZDljYTVlZGY3OTc2OWVhYTg5NmZmZTVlOGNmZWE4MTk5MCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.NGfhl_8kcvSfHBZtBCvj1GcNI4zmQO-o1Ns5003xB1I)
Stream names:
![Screenshot 2024-06-10 at 14 53 25](https://private-user-images.githubusercontent.com/6459623/338500965-4bfc4ffa-8fdf-4b93-b23e-95bf056799f3.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk3OTIwMDgsIm5iZiI6MTcxOTc5MTcwOCwicGF0aCI6Ii82NDU5NjIzLzMzODUwMDk2NS00YmZjNGZmYS04ZmRmLTRiOTMtYjIzZS05NWJmMDU2Nzk5ZjMucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDYzMCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA2MzBUMjM1NTA4WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MGUzZjYyMjI1NTU5M2U4NzU0OGQzOWJiMjg5YWZmYmQ5MDg5OWYxYmU5OTUxOThjOTU5YTU2ZTY1ZjkxYTVjMSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.clRMTNeHYtJsysolEI1nh8wGuDz6tweUafPX07_NkZM)
Thread names:
![Screenshot 2024-06-10 at 14 54 04](https://private-user-images.githubusercontent.com/6459623/338501026-8852ca9e-f2f4-4a45-8334-a18f8ab5ce18.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk3OTIwMDgsIm5iZiI6MTcxOTc5MTcwOCwicGF0aCI6Ii82NDU5NjIzLzMzODUwMTAyNi04ODUyY2E5ZS1mMmY0LTRhNDUtODMzNC1hMThmOGFiNWNlMTgucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDYzMCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA2MzBUMjM1NTA4WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9Yzg0MTQ4ODcxNDJlYTU0ZTVmMjkzODY1YWVkYmM3MmQzNjhmYWUwNTNlZjJmZGUyMTUzYjQzOWY1N2FiYWVhNSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.IDn7vV7i72e1EL8VAUNitZfYftowBhrDuAIc3Jv_a4I)
This also provides a missing link between replica IDs in the HLO and the physical devices in the profile.
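For readers unfamiliar with the NVTX naming calls, here is a minimal sketch of how threads, CUDA devices and CUDA streams can be given names that a profiler such as Nsight Systems displays. It is not the code from this PR. The helper names (`DeviceLabel`, `StreamLabel`, `NameCurrentThread`, `NameDevice`, `NameStream`), the `SKETCH_HAVE_NVTX` macro and the label formats are all hypothetical. The sketch assumes the NVTX v3 headers (`nvtx3/nvToolsExt.h`, `nvtx3/nvToolsExtCudaRt.h`) and the CUDA runtime header are available, and it compiles to no-ops otherwise.

```cpp
#include <cstdint>
#include <string>

// Only use NVTX when both the CUDA runtime and NVTX v3 headers are present.
#if defined(__has_include)
#if __has_include(<cuda_runtime.h>) && __has_include(<nvtx3/nvToolsExtCudaRt.h>)
#include <cuda_runtime.h>
#include <nvtx3/nvToolsExt.h>
#include <nvtx3/nvToolsExtCudaRt.h>
#define SKETCH_HAVE_NVTX 1  // hypothetical macro, local to this sketch
#endif
#endif

#if defined(__linux__)
#include <sys/syscall.h>
#include <unistd.h>
#endif

// Hypothetical label formats, chosen for illustration only; the strings
// used by the actual change are not stated in this description.
inline std::string DeviceLabel(int ordinal) {
  return "GPU device " + std::to_string(ordinal);
}

inline std::string StreamLabel(int ordinal, const std::string& purpose) {
  return purpose + " stream (device " + std::to_string(ordinal) + ")";
}

// Names the calling OS thread. nvtxNameOsThreadA expects the OS-level
// thread ID, which on Linux is obtained via the gettid system call.
inline void NameCurrentThread(const std::string& name) {
#if defined(SKETCH_HAVE_NVTX) && defined(__linux__)
  nvtxNameOsThreadA(static_cast<uint32_t>(syscall(SYS_gettid)), name.c_str());
#else
  (void)name;  // NVTX unavailable: nothing to do
#endif
}

#if defined(SKETCH_HAVE_NVTX)
// Names a CUDA device, identified by its CUDA runtime ordinal.
inline void NameDevice(int ordinal) {
  nvtxNameCudaDeviceA(ordinal, DeviceLabel(ordinal).c_str());
}

// Names an existing CUDA stream that belongs to the given device.
inline void NameStream(cudaStream_t stream, int ordinal,
                       const std::string& purpose) {
  nvtxNameCudaStreamA(stream, StreamLabel(ordinal, purpose).c_str());
}
#endif
```

A caller would typically invoke `NameDevice` once per device, `NameStream` right after each stream is created, and `NameCurrentThread` at the start of each worker thread. When no profiler is attached, the NVTX calls are expected to be cheap no-ops.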
Copybara import of the project:
--
12a02b67bd9db8b3f69ba1e0d00c7881f767f037 by Olli Lupton <olupton@nvidia.com>:
NVTX: name threads, CUDA devices and CUDA streams
--
bdf8dbf7700cbe7ce72070c25ce3d21e2dfeb54f by Olli Lupton <olupton@nvidia.com>:
Add missing header
--
98a80a40add79f108cb89987724c35f82cd727e4 by Olli Lupton <olupton@nvidia.com>:
add stubs
Merging this change closes #13603
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#13603 from olupton:name-devices-streams-and-threads 98a80a40add79f108cb89987724c35f82cd727e4