docs: Add section on viewing topology #8638

Merged
tara-det-ai merged 3 commits into main from docs/Add-section-on-viewing-topology
Jan 9, 2024

Conversation

tara-det-ai
Member

Ticket

TECHWR-369

Description

As an MLE, I would like to have a macro understanding of how the GPUs and nodes in my cluster are distributed within Determined and which slots on which GPUs are occupied, enabling me to know if my job will run and/or if there are sufficient resources for it to do so.

To visualize each node, the number of slots available, and which slots are active vs used, visit the Topology section of the resource pool's details page.

@cla-bot cla-bot bot added the cla-signed label Jan 3, 2024

netlify bot commented Jan 3, 2024

Deploy Preview for determined-ui ready!

Name Link
🔨 Latest commit 1a02296
🔍 Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/659d727a3dae0400081cad57
😎 Deploy Preview https://deploy-preview-8638--determined-ui.netlify.app/

@determined-ci determined-ci added the documentation Improvements or additions to documentation label Jan 3, 2024
@determined-ci determined-ci requested a review from a team January 3, 2024 23:52
**************************

To view a resource pool's node and GPU distribution, as well as check which GPUs are currently in
use, start by ensuring there's an active experiment running. Then, follow these steps:
Contributor

Does there have to be an active experiment running? Having one ensures that some compute resource is active and available, but if you're on-prem, or if your autoscaler hasn't scaled down your instances yet, you can still view the topology whether or not tasks are running.

Member Author

Resolved by removing the phrase about ensuring there's an active experiment running.


#. View the Topology.

Under the **compute-pool** section, select the **Active slots** hyperlink to access the topology
Contributor

You don't need to select anything. Whether you're in Active slots, Queued slots, or any other view within the resource pool, the topology will persist.

Member Author

Having two ways to view the topology might be confusing to users: they can select a resource pool, but there is also a hyperlink. I think the hyperlink is obvious, while selecting a resource pool is less obvious.

Member Author

Resolved by avoiding mention of the hyperlink.

Viewing Cluster Topology
**************************

To view a resource pool's node and GPU distribution, as well as check which GPUs are currently in
Contributor

Technically, it doesn't tell you which GPUs are in use, just how many.

@tara-det-ai tara-det-ai force-pushed the docs/Add-section-on-viewing-topology branch from 97ed121 to b0be950 Compare January 8, 2024 17:40
## Description

TECHWR-369

As an MLE, I would like to have a macro understanding of how the GPUs and nodes in my cluster are distributed within Determined and which slots on which GPUs are occupied, enabling me to know if my job will run and/or if there are sufficient resources for it to do so.

To visualize each node, the number of slots available, and which slots are active vs used, visit the Topology section of the resource pool's details page.
There is also a hyperlink, but the docs will avoid mentioning it in favor of just selecting a resource pool to view its details.
@tara-det-ai tara-det-ai force-pushed the docs/Add-section-on-viewing-topology branch from 9076f28 to 1a02296 Compare January 9, 2024 16:21
@tara-det-ai tara-det-ai enabled auto-merge (squash) January 9, 2024 16:21
@tara-det-ai tara-det-ai merged commit 24cfbb8 into main Jan 9, 2024
70 of 82 checks passed
@tara-det-ai tara-det-ai deleted the docs/Add-section-on-viewing-topology branch January 9, 2024 16:29
@dannysauer dannysauer modified the milestone: 0.27.1 Feb 6, 2024
Labels
cla-signed documentation Improvements or additions to documentation

4 participants