docs: [FE-270] add PBS known issue - Cluster tab does not display GPU information (#8719)
jagadeesh545 authored Jan 20, 2024
1 parent 6d744f7 commit 190af1d
Showing 2 changed files with 6 additions and 0 deletions.
4 changes: 4 additions & 0 deletions docs/setup-cluster/slurm/slurm-known-issues.rst
@@ -381,6 +381,10 @@ Some constraints are due to differences in behavior between Docker and Singulari
PBS Known Issues
******************

- If the ``Cluster`` tab in the WebUI does not display GPU information, PBS may be
  misconfigured. See the :ref:`Ensure the ngpus resource is defined with the correct
  values <pbs-ngpus-config>` section to verify that PBS is properly configured.

- Jobs are treated as successful even in the presence of a failure when PBS job history is not
  enabled. Without job history enabled, the launcher is unable to obtain the exit status of jobs
  and therefore they are all reported as successful. This will prevent failed jobs from
2 changes: 2 additions & 0 deletions docs/setup-cluster/slurm/slurm-requirements.rst
@@ -227,6 +227,8 @@ interacts with PBS, we recommend the following steps:
  configure ``CUDA_VISIBLE_DEVICES`` or set the ``pbs.slots_per_node`` setting in your experiment
  configuration file to indicate the desired number of GPU slots for Determined.

.. _pbs-ngpus-config:

- Ensure the ``ngpus`` resource is defined with the correct values.

  To ensure the successful operation of Determined, define the ``ngpus`` resource value for each
