NVTX: name threads, CUDA devices and CUDA streams #13603

Closed

@olupton (Contributor) commented Jun 11, 2024

This PR uses NVTX to name threads, CUDA devices and CUDA streams, with the aim of improving the profiling experience. These names are shown in the Nsight Systems UI.

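Device names:
![Screenshot 2024-06-10 at 14 52 37](https://github.com/openxla/xla/assets/6459623/d889d37e-ca2e-4f5e-b5bd-240bbb625b4c)

Stream names:
![Screenshot 2024-06-10 at 14 53 25](https://github.com/openxla/xla/assets/6459623/4bfc4ffa-8fdf-4b93-b23e-95bf056799f3)

Thread names:
![Screenshot 2024-06-10 at 14 54 04](https://github.com/openxla/xla/assets/6459623/8852ca9e-f2f4-4a45-8334-a18f8ab5ce18)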

This also provides a missing link between replica IDs in the HLO and the physical devices in the profile.
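
As a rough illustration of the kind of naming described above, here is a minimal C++ sketch. It is not the code from this PR. The `sketch` namespace, the label formats, the wrapper names and the `SKETCH_USE_NVTX` macro are all assumptions made for the example; only `nvtxNameCudaDeviceA`, `nvtxNameCudaStreamA` and `nvtxNameOsThreadA` are real NVTX entry points, and the PR may well use different ones (for example the driver-API variants).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Opt in to the real NVTX calls by defining SKETCH_USE_NVTX (a macro invented
// for this sketch) when building against the CUDA toolkit. Without it the
// wrappers do nothing, so the file builds on machines without CUDA.
#ifdef SKETCH_USE_NVTX
#include <cuda_runtime_api.h>
#include <nvtx3/nvToolsExt.h>
#include <nvtx3/nvToolsExtCudaRt.h>
#endif

namespace sketch {

// Hypothetical label format that ties a device ordinal to a replica ID.
// The names actually chosen in the PR may differ.
inline std::string DeviceLabel(int device_ordinal, int replica_id) {
  assert(device_ordinal >= 0 && replica_id >= 0);
  return "GPU " + std::to_string(device_ordinal) + " (replica " +
         std::to_string(replica_id) + ")";
}

// Hypothetical label format for a stream, e.g. "compute stream, GPU 1".
inline std::string StreamLabel(const std::string& purpose,
                               int device_ordinal) {
  return purpose + " stream, GPU " + std::to_string(device_ordinal);
}

// Attaches a label to a CUDA device. Returns true if NVTX was called.
inline bool NameCudaDevice(int device_ordinal, const std::string& label) {
#ifdef SKETCH_USE_NVTX
  nvtxNameCudaDeviceA(device_ordinal, label.c_str());
  return true;
#else
  (void)device_ordinal;
  (void)label;
  return false;
#endif
}

#ifdef SKETCH_USE_NVTX
// Attaches a label to a CUDA stream.
inline void NameCudaStream(cudaStream_t stream, const std::string& label) {
  nvtxNameCudaStreamA(stream, label.c_str());
}

// Attaches a label to an OS thread, identified by its OS-level thread ID.
inline void NameOsThread(std::uint32_t os_thread_id,
                         const std::string& label) {
  nvtxNameOsThreadA(os_thread_id, label.c_str());
}
#endif

}  // namespace sketch
```

A caller would do something like `sketch::NameCudaDevice(0, sketch::DeviceLabel(0, 3))` once per device at client start-up. Putting the replica ID in the label is one way the link between HLO replica IDs and physical devices could be made visible in the profile.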

@github-actions github-actions bot added the kokoro:force-run Forces CI to rerun label Jun 11, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Forces CI to rerun label Jun 11, 2024
@golechwierowicz (Member) commented:

@cheshire can you take a look? I'm not super familiar with the nsys profiler and cannot tell whether this improves the internal profiling experience.

copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jun 14, 2024
Imported from GitHub PR openxla/xla#13603

This aims to improve the profiling experience. These names are shown in the Nsight Systems UI.

Device names:
![Screenshot 2024-06-10 at 14 52 37](https://github.com/openxla/xla/assets/6459623/d889d37e-ca2e-4f5e-b5bd-240bbb625b4c)

Stream names:
![Screenshot 2024-06-10 at 14 53 25](https://github.com/openxla/xla/assets/6459623/4bfc4ffa-8fdf-4b93-b23e-95bf056799f3)

Thread names:
![Screenshot 2024-06-10 at 14 54 04](https://github.com/openxla/xla/assets/6459623/8852ca9e-f2f4-4a45-8334-a18f8ab5ce18)

This also provides a missing link between replica IDs in the HLO and the physical devices in the profile.
Copybara import of the project:

--
12a02b67bd9db8b3f69ba1e0d00c7881f767f037 by Olli Lupton <olupton@nvidia.com>:

NVTX: name threads, CUDA devices and CUDA streams

--
bdf8dbf7700cbe7ce72070c25ce3d21e2dfeb54f by Olli Lupton <olupton@nvidia.com>:

Add missing header

--
98a80a40add79f108cb89987724c35f82cd727e4 by Olli Lupton <olupton@nvidia.com>:

add stubs

Merging this change closes #13603

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#13603 from olupton:name-devices-streams-and-threads 98a80a40add79f108cb89987724c35f82cd727e4
PiperOrigin-RevId: 643001157
copybara-service bot pushed a commit to google/tsl that referenced this pull request Jun 14, 2024
Imported from GitHub PR openxla/xla#13603

PiperOrigin-RevId: 643290582
copybara-service bot pushed a commit that referenced this pull request Jun 14, 2024
Imported from GitHub PR #13603

PiperOrigin-RevId: 643290582
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jun 14, 2024
Imported from GitHub PR openxla/xla#13603

PiperOrigin-RevId: 643290582
@jbaiocchi (Contributor) left a comment

Thanks, this seems very useful. I'm concerned about portability and the interactions with the thread interface in Env.

xla/pjrt/gpu/se_gpu_pjrt_client.cc (resolved)
third_party/tsl/tsl/profiler/lib/nvtx_utils.cc (outdated, resolved)
xla/pjrt/local_device_state.cc (resolved)
xla/stream_executor/gpu/gpu_stream.cc (outdated, resolved)
third_party/tsl/tsl/profiler/lib/nvtx_utils.cc (outdated, resolved)
xla/pjrt/gpu/se_gpu_pjrt_client.cc (outdated, resolved)
xla/pjrt/gpu/se_gpu_pjrt_client.cc (resolved)
third_party/tsl/tsl/profiler/lib/nvtx_utils.h (resolved)
xla/stream_executor/gpu/gpu_stream.cc (outdated, resolved)
copybara-service bot pushed a commit to google/tsl that referenced this pull request Jun 17, 2024
Imported from GitHub PR openxla/xla#13603

PiperOrigin-RevId: 643290582
copybara-service bot pushed a commit that referenced this pull request Jun 17, 2024
Imported from GitHub PR #13603

PiperOrigin-RevId: 643290582
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jun 17, 2024
Imported from GitHub PR openxla/xla#13603

PiperOrigin-RevId: 643290582
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jun 20, 2024
Imported from GitHub PR openxla/xla#13603

Copybara import of the project:

--
5b3121c58db8aa1b6529f0aeb8573be8bf2cde80 by Olli Lupton <olupton@nvidia.com>:

NVTX: name threads, CUDA devices and CUDA streams

--
d973674de6218fcee88473d85bb43ba345652fdf by Olli Lupton <olupton@nvidia.com>:

Address review comments

--
918cf3e7b87150e9d666b218bbd9aca0cae606a4 by Olli Lupton <olupton@nvidia.com>:

Alternative for @jbaiocchi

--
1d1978437e64c0dac97e97ea4320a6dcb3945296 by Olli Lupton <olupton@nvidia.com>:

Address more review comments

Merging this change closes #13603

PiperOrigin-RevId: 644901234
@@ -116,6 +116,7 @@ cc_library(
         "@tsl//tsl/platform:status",
         "@tsl//tsl/platform:statusor",
         "@tsl//tsl/profiler/lib:connected_traceme",
+        "@tsl//tsl/profiler/lib:nvtx_utils",
Member


Adding this dependency unconditionally here seems to cause a linker error if a target links in both nvtx_utils and nvtx_utils_libtpu. I could avoid the problem by adding the dependency inside an if_cuda guard and guarding the include and the new code, but I'm not sure this is the solution you would choose yourself.
We will revert this for now.
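
The source-side half of the guard mentioned in this comment could look roughly like the C++ sketch below. It is only an illustration under stated assumptions: `SKETCH_CUDA_BUILD` stands in for whatever macro the build defines for CUDA targets, and `profiler_sketch::NameCurrentThread` is a hypothetical function, not the real nvtx_utils API. The idea is that non-CUDA builds never declare or call the symbol, so they do not need to link the library that defines it.

```cpp
#include <cassert>
#include <string>

// SKETCH_CUDA_BUILD is a stand-in macro for this sketch. It is assumed to be
// defined only for targets that also get the guarded build dependency.
#if defined(SKETCH_CUDA_BUILD)
// Only CUDA builds see this declaration, so only they need the library that
// defines it. The function name is hypothetical.
namespace profiler_sketch {
void NameCurrentThread(const std::string& name);
}  // namespace profiler_sketch
#endif

namespace sketch {

#if defined(SKETCH_CUDA_BUILD)
constexpr bool kThreadNamingCompiledIn = true;
#else
constexpr bool kThreadNamingCompiledIn = false;
#endif

// Hypothetical label format for a per-device worker thread.
inline std::string WorkerThreadLabel(int device_ordinal) {
  assert(device_ordinal >= 0);
  return "worker thread, device " + std::to_string(device_ordinal);
}

// Returns true if the label was handed to the profiler library.
inline bool NameWorkerThread(int device_ordinal) {
  const std::string label = WorkerThreadLabel(device_ordinal);
#if defined(SKETCH_CUDA_BUILD)
  profiler_sketch::NameCurrentThread(label);
#endif
  return kThreadNamingCompiledIn;
}

}  // namespace sketch
```

The trade-off is that every call site carries a preprocessor guard. The alternative of shipping a stub library for non-CUDA builds keeps call sites clean, but it is exactly the arrangement that can lead to duplicate definitions when two such libraries end up in one target.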

copybara-service bot pushed a commit to google/tsl that referenced this pull request Jun 20, 2024
Imported from GitHub PR openxla/xla#13603


PiperOrigin-RevId: 644915138
copybara-service bot pushed a commit to google/tsl that referenced this pull request Jun 20, 2024
Imported from GitHub PR openxla/xla#13603


PiperOrigin-RevId: 644957493
This PR was rolled back in 6cd3399!

olupton added a commit to olupton/xla that referenced this pull request Jun 24, 2024
Second attempt at openxla#13603, which was
rolled back.

This aims to improve the profiling experience. These names are shown in the Nsight Systems UI.

Device names:
![Screenshot 2024-06-10 at 14 52 37](https://github.com/openxla/xla/assets/6459623/d889d37e-ca2e-4f5e-b5bd-240bbb625b4c)

Stream names:
![Screenshot 2024-06-10 at 14 53 25](https://github.com/openxla/xla/assets/6459623/4bfc4ffa-8fdf-4b93-b23e-95bf056799f3)

Thread names:
![Screenshot 2024-06-10 at 14 54 04](https://github.com/openxla/xla/assets/6459623/8852ca9e-f2f4-4a45-8334-a18f8ab5ce18)

This also provides a missing link between replica IDs in the HLO and the physical devices in the profile.
olupton added a commit to olupton/xla that referenced this pull request Jun 27, 2024
olupton added a commit to olupton/xla that referenced this pull request Jul 1, 2024
copybara-service bot pushed a commit to google/tsl that referenced this pull request Jul 3, 2024
Imported from GitHub PR openxla/xla#14092

See openxla/xla#13603, which landed and got rolled back.
f75962e80d387f32dc9055cd1fff9029d97f0026 attempts to fix the issue described in openxla/xla#13603 (comment).
Copybara import of the project:

--
c2f947687ecc1ce8844ba7d0b258b5fd1f3b8afd by Olli Lupton <olupton@nvidia.com>:

NVTX: name threads, CUDA devices and CUDA streams

Second attempt at openxla/xla#13603, which was
rolled back.

This aims to improve the profiling experience. These names are shown in the Nsight Systems UI.

Device names:
![Screenshot 2024-06-10 at 14 52 37](https://github.com/openxla/xla/assets/6459623/d889d37e-ca2e-4f5e-b5bd-240bbb625b4c)

Stream names:
![Screenshot 2024-06-10 at 14 53 25](https://github.com/openxla/xla/assets/6459623/4bfc4ffa-8fdf-4b93-b23e-95bf056799f3)

Thread names:
![Screenshot 2024-06-10 at 14 54 04](https://github.com/openxla/xla/assets/6459623/8852ca9e-f2f4-4a45-8334-a18f8ab5ce18)

This also provides a missing link between replica IDs in the HLO and the physical devices in the profile.

--
ac4af75b2f934a1d4fe06d07519b891fbaa7f88a by Olli Lupton <olupton@nvidia.com>:

Work around nvtx_utils_libtpu error

--
2b3407bea90c486fd15cfffba80ba2391b1a4e5c by Olli Lupton <olupton@nvidia.com>:

Set visibility

--
a79d09f9a77c12968459770faf4bd7d0cf5db27a by Olli Lupton <olupton@nvidia.com>:

add missing ifdef

--
7aa0800429fbbf4033f8a4da54d4114d1bd4d228 by Olli Lupton <olupton@nvidia.com>:

Move device/thread naming into separate function

Merging this change closes #14092

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#14092 from olupton:name-devices-streams-and-threads-v2 7aa0800429fbbf4033f8a4da54d4114d1bd4d228
PiperOrigin-RevId: 649062057
copybara-service bot pushed a commit that referenced this pull request Jul 3, 2024
Imported from GitHub PR #14092

PiperOrigin-RevId: 649062057
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jul 3, 2024
Imported from GitHub PR openxla/xla#14092

PiperOrigin-RevId: 649062057
copybara-service bot pushed a commit to google/tsl that referenced this pull request Jul 4, 2024
Imported from GitHub PR openxla/xla#14092

PiperOrigin-RevId: 649062057
copybara-service bot pushed a commit that referenced this pull request Jul 4, 2024
Imported from GitHub PR #14092

PiperOrigin-RevId: 649062057
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jul 4, 2024
Imported from GitHub PR openxla/xla#14092

PiperOrigin-RevId: 649062057
copybara-service bot pushed a commit to google/tsl that referenced this pull request Jul 4, 2024
Imported from GitHub PR openxla/xla#14092

See openxla/xla#13603, which landed and got rolled back.
f75962e80d387f32dc9055cd1fff9029d97f0026 attempts to fix the issue described in openxla/xla#13603 (comment).
Copybara import of the project:

--
c2f947687ecc1ce8844ba7d0b258b5fd1f3b8afd by Olli Lupton <olupton@nvidia.com>:

NVTX: name threads, CUDA devices and CUDA streams

Second attempt at openxla/xla#13603, which was
rolled back.

This aims to improve the profiling experience. These names are shown in the Nsight Systems UI.

Device names:
![Screenshot 2024-06-10 at 14 52 37](https://github.com/openxla/xla/assets/6459623/d889d37e-ca2e-4f5e-b5bd-240bbb625b4c)

Stream names:
![Screenshot 2024-06-10 at 14 53 25](https://github.com/openxla/xla/assets/6459623/4bfc4ffa-8fdf-4b93-b23e-95bf056799f3)

Thread names:
![Screenshot 2024-06-10 at 14 54 04](https://github.com/openxla/xla/assets/6459623/8852ca9e-f2f4-4a45-8334-a18f8ab5ce18)

This also provides a missing link between replica IDs in the HLO and the physical devices in the profile.

--
ac4af75b2f934a1d4fe06d07519b891fbaa7f88a by Olli Lupton <olupton@nvidia.com>:

Work around nvtx_utils_libtpu error

--
2b3407bea90c486fd15cfffba80ba2391b1a4e5c by Olli Lupton <olupton@nvidia.com>:

Set visibility

--
a79d09f9a77c12968459770faf4bd7d0cf5db27a by Olli Lupton <olupton@nvidia.com>:

add missing ifdef

--
7aa0800429fbbf4033f8a4da54d4114d1bd4d228 by Olli Lupton <olupton@nvidia.com>:

Move device/thread naming into separate function

Merging this change closes #14092

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#14092 from olupton:name-devices-streams-and-threads-v2 7aa0800429fbbf4033f8a4da54d4114d1bd4d228
PiperOrigin-RevId: 649062057
copybara-service bot pushed a commit to google/tsl that referenced this pull request Jul 4, 2024
Imported from GitHub PR openxla/xla#14092

See openxla/xla#13603, which landed and got rolled back.
f75962e80d387f32dc9055cd1fff9029d97f0026 attempts to fix the issue described in openxla/xla#13603 (comment).
Copybara import of the project:

--
c2f947687ecc1ce8844ba7d0b258b5fd1f3b8afd by Olli Lupton <olupton@nvidia.com>:

NVTX: name threads, CUDA devices and CUDA streams

Second attempt at openxla/xla#13603, which was
rolled back.

This aims to improve the profiling experience. These names are shown in the Nsight Systems UI.

Device names:
![Screenshot 2024-06-10 at 14 52 37](https://github.com/openxla/xla/assets/6459623/d889d37e-ca2e-4f5e-b5bd-240bbb625b4c)

Stream names:
![Screenshot 2024-06-10 at 14 53 25](https://github.com/openxla/xla/assets/6459623/4bfc4ffa-8fdf-4b93-b23e-95bf056799f3)

Thread names:
![Screenshot 2024-06-10 at 14 54 04](https://github.com/openxla/xla/assets/6459623/8852ca9e-f2f4-4a45-8334-a18f8ab5ce18)

This also provides a missing link between replica IDs in the HLO and the physical devices in the profile.

--
ac4af75b2f934a1d4fe06d07519b891fbaa7f88a by Olli Lupton <olupton@nvidia.com>:

Work around nvtx_utils_libtpu error

--
2b3407bea90c486fd15cfffba80ba2391b1a4e5c by Olli Lupton <olupton@nvidia.com>:

Set visibility

--
a79d09f9a77c12968459770faf4bd7d0cf5db27a by Olli Lupton <olupton@nvidia.com>:

add missing ifdef

--
7aa0800429fbbf4033f8a4da54d4114d1bd4d228 by Olli Lupton <olupton@nvidia.com>:

Move device/thread naming into separate function

Merging this change closes #14092

PiperOrigin-RevId: 649377094
copybara-service bot pushed a commit that referenced this pull request Jul 4, 2024
Imported from GitHub PR #14092

See #13603, which landed and got rolled back.
f75962e attempts to fix the issue described in #13603 (comment).
Copybara import of the project:

--
c2f9476 by Olli Lupton <olupton@nvidia.com>:

NVTX: name threads, CUDA devices and CUDA streams

Second attempt at #13603, which was
rolled back.

This aims to improve the profiling experience. These names are shown in the Nsight Systems UI.

Device names:
![Screenshot 2024-06-10 at 14 52 37](https://github.com/openxla/xla/assets/6459623/d889d37e-ca2e-4f5e-b5bd-240bbb625b4c)

Stream names:
![Screenshot 2024-06-10 at 14 53 25](https://github.com/openxla/xla/assets/6459623/4bfc4ffa-8fdf-4b93-b23e-95bf056799f3)

Thread names:
![Screenshot 2024-06-10 at 14 54 04](https://github.com/openxla/xla/assets/6459623/8852ca9e-f2f4-4a45-8334-a18f8ab5ce18)

This also provides a missing link between replica IDs in the HLO and the physical devices in the profile.

--
ac4af75 by Olli Lupton <olupton@nvidia.com>:

Work around nvtx_utils_libtpu error

--
2b3407b by Olli Lupton <olupton@nvidia.com>:

Set visibility

--
a79d09f by Olli Lupton <olupton@nvidia.com>:

add missing ifdef

--
7aa0800 by Olli Lupton <olupton@nvidia.com>:

Move device/thread naming into separate function

Merging this change closes #14092

COPYBARA_INTEGRATE_REVIEW=#14092 from olupton:name-devices-streams-and-threads-v2 7aa0800
PiperOrigin-RevId: 649377094
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jul 4, 2024
Imported from GitHub PR openxla/xla#14092

See openxla/xla#13603, which landed and got rolled back.
f75962e80d387f32dc9055cd1fff9029d97f0026 attempts to fix the issue described in openxla/xla#13603 (comment).
Copybara import of the project:

--
c2f947687ecc1ce8844ba7d0b258b5fd1f3b8afd by Olli Lupton <olupton@nvidia.com>:

NVTX: name threads, CUDA devices and CUDA streams

Second attempt at openxla/xla#13603, which was
rolled back.

This aims to improve the profiling experience. These names are shown in the Nsight Systems UI.

Device names:
![Screenshot 2024-06-10 at 14 52 37](https://github.com/openxla/xla/assets/6459623/d889d37e-ca2e-4f5e-b5bd-240bbb625b4c)

Stream names:
![Screenshot 2024-06-10 at 14 53 25](https://github.com/openxla/xla/assets/6459623/4bfc4ffa-8fdf-4b93-b23e-95bf056799f3)

Thread names:
![Screenshot 2024-06-10 at 14 54 04](https://github.com/openxla/xla/assets/6459623/8852ca9e-f2f4-4a45-8334-a18f8ab5ce18)

This also provides a missing link between replica IDs in the HLO and the physical devices in the profile.

--
ac4af75b2f934a1d4fe06d07519b891fbaa7f88a by Olli Lupton <olupton@nvidia.com>:

Work around nvtx_utils_libtpu error

--
2b3407bea90c486fd15cfffba80ba2391b1a4e5c by Olli Lupton <olupton@nvidia.com>:

Set visibility

--
a79d09f9a77c12968459770faf4bd7d0cf5db27a by Olli Lupton <olupton@nvidia.com>:

add missing ifdef

--
7aa0800429fbbf4033f8a4da54d4114d1bd4d228 by Olli Lupton <olupton@nvidia.com>:

Move device/thread naming into separate function

Merging this change closes #14092

PiperOrigin-RevId: 649377094