Fix broken links (#8825)
* Fix broken links

* reverting the change to on prem requirements

creating a separate PR for this

* Fix broken links

* zapier

* resolve feedback

* resolve issues

* resolve issues

* try to remove attributions by renaming it

* restore link to swagger REST API
tara-det-ai authored Feb 21, 2024
1 parent bccdf0c commit dc3e41e
Showing 45 changed files with 188 additions and 214 deletions.
2 changes: 1 addition & 1 deletion docs/get-started/architecture/introduction.rst
@@ -859,5 +859,5 @@ Reference
---------

- YAML: https://learnxinyminutes.com/docs/yaml/
- Validate YAML: http://www.yamllint.com/
- Validate YAML: https://www.yamllint.com/
- Convert YAML to JSON: https://www.json2yaml.com/convert-yaml-to-json
2 changes: 1 addition & 1 deletion docs/get-started/example-solutions/_index.rst
@@ -6,7 +6,7 @@

Start with an example machine learning model converted to Determined's APIs. Code examples are in
the ``examples/`` subdirectory of the `Determined GitHub repo
<https://github.com/determined-ai/determined/tree/master/examples>`__. Download links are below.
<https://github.com/determined-ai/determined/tree/main/examples>`__. Download links are below.

For more examples, visit the `determined-examples repo
<https://github.com/determined-ai/determined-examples/>`__.
5 changes: 2 additions & 3 deletions docs/integrations/notification/zapier.rst
@@ -19,9 +19,8 @@ The steps to set up Zapier webhook are:
Creating a Zap with Webhook
*****************************

First, you need to create a Zap with webhook. Visit the `Zapier Website
<https://zapier.com/app/zaps>`_, signup if you haven't already, and click on the **Create Zap**
button.
First, you need to create a Zap with a webhook. Visit `Zapier <https://zapier.com/>`_, sign up if
you haven't already, and click on the **Create Zap** button.

Select **Webhooks by Zapier** as the trigger and **Catch Raw Hook** as the event. Use **Catch Raw
Hook** instead of **Catch Hook** because headers are needed to verify each webhook request.
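
The verification itself amounts to recomputing an HMAC over the raw request body and comparing it
with the signature header. A minimal sketch, assuming an HMAC-SHA256 scheme and an illustrative
header name (check the webhook security settings of your cluster for the actual scheme and header):

.. code:: python

   # Illustrative sketch of signature verification inside a "Code by Zapier" step.
   # The signing scheme and header name are assumptions, not the exact Determined ones.
   import hashlib
   import hmac

   def verify(raw_body: bytes, signature: str, signing_key: str) -> bool:
       expected = hmac.new(signing_key.encode(), raw_body, hashlib.sha256).hexdigest()
       return hmac.compare_digest(expected, signature)
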
4 changes: 2 additions & 2 deletions docs/integrations/prometheus/_index.rst
@@ -26,7 +26,7 @@ can be enabled in the master configuration file.
Reference
***********

`Grafana <https://grafana.com/docs/grafana/latest/installation/>`__
`Grafana <https://grafana.com/docs/grafana/latest/setup-grafana/installation/>`__

`Prometheus <https://prometheus.io/docs/prometheus/latest/installation/>`__

@@ -156,7 +156,7 @@ source. After the Grafana server is running and the Web UI is accessible, follow
running Prometheus server address. By default, this is the machine address on port 9090.

#. After the Prometheus data source connects, import the `Determined Hardware Metrics dashboard JSON
<https://github.com/determined-ai/works-with-determined/blob/master/observability/grafana/determined-hardware-grafana.json>`__
<https://github.com/determined-ai/works-with-determined/blob/main/observability/grafana/determined-hardware-grafana.json>`__
file in **Grafana** -> **Create** -> **Import** -> **Import using panel JSON**.

*********
9 changes: 4 additions & 5 deletions docs/manage/elasticsearch-logging-backend.rst
@@ -5,15 +5,14 @@
##############################

Use this guide as a reference when considering a shift from the default logging backend to
`Elasticsearch <https://www.elastic.co/what-is/elasticsearch>`__ for optimized log storage and
analysis.
`Elasticsearch <https://www.elastic.co/elasticsearch>`__ for optimized log storage and analysis.

We'll discuss the limitations of the default logging backend and provide tips and guidelines for
migrating to Elasticsearch including how to tune Elasticsearch to work best with Determined.

`Elasticsearch <https://www.elastic.co/what-is/elasticsearch>`__ is a search engine commonly used
for storing application logs for search and analytics. Determined supports using Elasticsearch as
the storage backend for task logs. Configuring Determined to use Elasticsearch is simple; however,
`Elasticsearch <https://www.elastic.co/elasticsearch>`__ is a search engine commonly used for
storing application logs for search and analytics. Determined supports using Elasticsearch as the
storage backend for task logs. Configuring Determined to use Elasticsearch is simple; however,
managing an Elasticsearch cluster at scale is an involved task, so this guide is recommended for
users who have hit the limitations of the default logging backend.

8 changes: 4 additions & 4 deletions docs/manage/troubleshooting.rst
@@ -38,9 +38,9 @@ Make sure you back up the database and temporarily shut down the master before p

To fix this error message, locate the up migration with a suffix of ``.up.sql`` and a prefix
matching the long number in the error message in `this directory
<https://github.com/determined-ai/determined/tree/master/master/static/migrations>_` and carefully
run the SQL within the file manually against the database used by Determined. For convenience, all
the information needed to connect except the password can be found with:
<https://github.com/determined-ai/determined/tree/main/master/static/migrations>_` and carefully run
the SQL within the file manually against the database used by Determined. For convenience, all the
information needed to connect except the password can be found with:

.. code::
@@ -53,7 +53,7 @@ If this proceeds successfully, then mark the migration as successful by running
UPDATE schema_migrations SET dirty = false;
And restart the master. Otherwise, please seek assistance in the community `Slack
<https://join.slack.com/t/determined-community/shared_invite/zt-1f4hj60z5-JMHb~wSr2xksLZVBN61g_Q>`__.
<https://determined-community.slack.com/join/shared_invite/zt-1f4hj60z5-JMHb~wSr2xksLZVBN61g_Q>`__.
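
If you prefer to drive the same fix from a script, a minimal sketch (assuming ``psycopg2`` is
available; the connection details and the migration file name below are placeholders):

.. code:: python

   # Sketch: apply the matching .up.sql migration, then clear the dirty flag.
   import psycopg2

   conn = psycopg2.connect(host="DB_HOST", dbname="determined", user="postgres", password="...")
   with conn, conn.cursor() as cur:
       with open("20240101000000_example.up.sql") as f:  # hypothetical migration file name
           cur.execute(f.read())
       cur.execute("UPDATE schema_migrations SET dirty = false;")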

.. _validate-nvidia-container-toolkit:

10 changes: 5 additions & 5 deletions docs/model-dev-guide/api-guides/apis-howto/api-core-ug-basic.rst
@@ -36,7 +36,7 @@ This user guide shows you how to get started using the Core API.

Access the tutorial files via the :download:`core_api.tgz </examples/core_api.tgz>` download or
directly from the `Github repository
<https://github.com/determined-ai/determined/tree/master/examples/tutorials/core_api>`_.
<https://github.com/determined-ai/determined/tree/main/examples/tutorials/core_api>`_.

*****************
Getting Started
@@ -132,7 +132,7 @@ with only a few new lines of code.

The complete ``1_metrics.py`` and ``1_metrics.yaml`` listings used in this example can be found in
the :download:`core_api.tgz </examples/core_api.tgz>` download or in the `Github repository
<https://github.com/determined-ai/determined/tree/master/examples/tutorials/core_api>`_.
<https://github.com/determined-ai/determined/tree/main/examples/tutorials/core_api>`_.
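
For orientation, the metrics step reduces to a pattern like the following sketch, where
``train_one_batch`` and ``validate`` stand in for real training code:

.. code:: python

   import determined as det

   with det.core.init() as core_context:
       for steps_completed in range(100):
           loss = train_one_batch()  # placeholder for the actual training step
           core_context.train.report_training_metrics(
               steps_completed=steps_completed, metrics={"loss": loss}
           )
       core_context.train.report_validation_metrics(
           steps_completed=100, metrics={"accuracy": validate()}  # placeholder metric
       )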

.. _core-checkpoints:

@@ -212,7 +212,7 @@ trial ID in the checkpoint and use it to distinguish the two types of continues.

The complete ``2_checkpoints.py`` and ``2_checkpoints.yaml`` listings used in this example can be
found in the :download:`core_api.tgz </examples/core_api.tgz>` download or in the `Github repository
<https://github.com/determined-ai/determined/tree/master/examples/tutorials/core_api>`_.
<https://github.com/determined-ai/determined/tree/main/examples/tutorials/core_api>`_.
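
The checkpointing step follows roughly this pattern (a sketch; the metadata and the ``state`` file
format are illustrative):

.. code:: python

   import pathlib

   import determined as det

   info = det.get_cluster_info()
   steps_completed = 10
   with det.core.init() as core_context:
       with core_context.checkpoint.store_path({"steps_completed": steps_completed}) as (
           path,
           storage_id,
       ):
           # Record the trial ID so a later run can distinguish a pause/resume
           # from an experiment that continues from this checkpoint.
           (pathlib.Path(path) / "state").write_text(f"{steps_completed},{info.trial.trial_id}")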

.. _core-hpsearch:

@@ -292,7 +292,7 @@ runs a train-validate-report loop:
The complete ``3_hpsearch.py`` and ``3_hpsearch.yaml`` listings used in this example can be found in
the :download:`core_api.tgz </examples/core_api.tgz>` download or in the `Github repository
<https://github.com/determined-ai/determined/tree/master/examples/tutorials/core_api>`_.
<https://github.com/determined-ai/determined/tree/main/examples/tutorials/core_api>`_.
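
The search loop itself looks roughly like this sketch, with ``train_one_batch`` and ``validate``
standing in for real code:

.. code:: python

   import determined as det

   with det.core.init() as core_context:
       hparams = det.get_cluster_info().trial.hparams
       for op in core_context.searcher.operations():
           for steps_completed in range(1, op.length + 1):
               train_one_batch(hparams)         # placeholder training step
               op.report_progress(steps_completed)
           op.report_completed(validate())      # placeholder value of the searcher metric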

.. _core-distributed:

@@ -420,4 +420,4 @@ considerations are:
The complete ``4_distributed.py`` and ``3_hpsearch.yaml`` listings used in this example can be found
in the :download:`core_api.tgz </examples/core_api.tgz>` download or in the `Github repository
<https://github.com/determined-ai/determined/tree/master/examples/tutorials/core_api>`_.
<https://github.com/determined-ai/determined/tree/main/examples/tutorials/core_api>`_.
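
The distributed setup reduces to building a distributed context before calling ``det.core.init()``
(a sketch, assuming the experiment is launched with Determined's ``torch_distributed`` launcher):

.. code:: python

   import determined as det

   distributed = det.core.DistributedContext.from_torch_distributed()
   with det.core.init(distributed=distributed) as core_context:
       if core_context.distributed.rank == 0:
           print("only the chief worker reports metrics and uploads checkpoints")
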
4 changes: 2 additions & 2 deletions docs/model-dev-guide/api-guides/apis-howto/api-core-ug.rst
@@ -66,7 +66,7 @@ Create a new directory.

Access the tutorial files via the :download:`core_api_pytorch_mnist.tgz
</examples/core_api_pytorch_mnist.tgz>` download link or directly from the `Github repository
<https://github.com/determined-ai/determined/tree/master/examples/tutorials/core_api_pytorch_mnist>`_.
<https://github.com/determined-ai/determined/tree/main/examples/tutorials/core_api_pytorch_mnist>`_.
These scripts have already been modified to fit the steps outlined in this tutorial.

In this initial step, we’ll run our experiment using the ``model_def.py`` script and its
@@ -521,7 +521,7 @@ skipping batch 1, warming up on batch 2, profiling batches 3 and 4, then repeati
files will be uploaded to the experiment's TensorBoard path and can be viewed under the "PyTorch
Profiler" tab in the Determined Tensorboard UI.

See `PyTorch Profiler <https://github.com/pytorch/kineto/tree/master/tb_plugin>`_ documentation for
See `PyTorch Profiler <https://github.com/pytorch/kineto/tree/main/tb_plugin>`_ documentation for
details.

.. code:: python
14 changes: 7 additions & 7 deletions docs/model-dev-guide/api-guides/apis-howto/api-pytorch-ug.rst
@@ -268,7 +268,7 @@ finding the common code snippet: ``for batch in dataloader``. In Determined,
:meth:`~determined.pytorch.PyTorchTrial.train_batch` also works with one batch at a time.

Take `this script implemented with native PyTorch
<https://github.com/pytorch/examples/blob/master/imagenet/main.py>`_ as an example. It has the
<https://github.com/pytorch/examples/blob/main/imagenet/main.py>`_ as an example. It has the
following code for the training loop.

.. code:: python
@@ -410,8 +410,8 @@ training") is easy if you follow a few rules.
- Even if you are going to ultimately return an IterableDataset, it is best to use PyTorch's
Sampler class as the basis for choosing the order of records. Operations on Samplers are quick
and cheap, while operations on data afterwards are expensive. For more details, see the
discussion of random vs sequential access `here <https://yogadl.readthedocs.io>`_. If you don't
have a custom sampler, start with a simple one:
discussion of random vs sequential access `here <https://yogadl.readthedocs.io/en/latest/>`_. If
you don't have a custom sampler, start with a simple one:

.. code:: python
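
   # Illustrative sketch, not the original listing: a simple sequential sampler
   # over a map-style dataset (``my_dataset`` is a placeholder).
   from torch.utils.data import SequentialSampler

   sampler = SequentialSampler(my_dataset)
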
@@ -568,8 +568,8 @@ Remove Pinned GPUs
Determined handles scheduling jobs on available slots. However, you need to let the Determined
library handle choosing the GPUs.

Take `this script <https://github.com/pytorch/examples/blob/master/imagenet/main.py>`_ as an
example. It has the following code to configure the GPU:
Take `this script <https://github.com/pytorch/examples/blob/main/imagenet/main.py>`_ as an example.
It has the following code to configure the GPU:

.. code:: python
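
   # Illustrative sketch, not the original listing: the kind of GPU pinning such a
   # script typically does (``args.gpu`` comes from its argument parser) and which
   # should be removed when moving to Determined.
   import torch

   torch.cuda.set_device(args.gpu)
   model = model.cuda(args.gpu)
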
@@ -585,8 +585,8 @@ To run distributed training outside Determined, you need to have code that handl
launching processes, moving models to pinned GPUs, sharding data, and reducing metrics. You need to
remove this code so that it does not conflict with the Determined library.

Take `this script <https://github.com/pytorch/examples/blob/master/imagenet/main.py>`_ as an
example. It has the following code to initialize the process group:
Take `this script <https://github.com/pytorch/examples/blob/main/imagenet/main.py>`_ as an example.
It has the following code to initialize the process group:

.. code:: python
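
   # Illustrative sketch, not the original listing: hand-rolled process-group
   # initialization of this kind (the ``args`` values come from the script's
   # argument parser) conflicts with Determined and should be removed.
   import torch.distributed as dist

   dist.init_process_group(
       backend=args.dist_backend,
       init_method=args.dist_url,
       world_size=args.world_size,
       rank=args.rank,
   )
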
@@ -15,9 +15,9 @@ In this guide, you'll learn how to use the DeepSpeed API.
| :ref:`deepspeed-reference` |
+-----------------------------------------------------------------------+

`DeepSpeed <https://deepspeed.ai/>`_ is a Microsoft library that supports large-scale, distributed
learning with sharded optimizer state training and pipeline parallelism. Determined supports
DeepSpeed with the :class:`~determined.pytorch.deepspeed.DeepSpeedTrial` API.
`DeepSpeed <https://www.deepspeed.ai/>`_ is a Microsoft library that supports large-scale,
distributed learning with sharded optimizer state training and pipeline parallelism. Determined
supports DeepSpeed with the :class:`~determined.pytorch.deepspeed.DeepSpeedTrial` API.
:class:`~determined.pytorch.deepspeed.DeepSpeedTrial` provides a way to use an automated training
loop with DeepSpeed.

@@ -23,7 +23,7 @@ engine passed to :meth:`~determined.pytorch.deepspeed.DeepSpeedTrialContext.wrap

For more advanced cases where model engines have different model parallel topologies, contact
support on the Determined `community Slack
<https://join.slack.com/t/determined-community/shared_invite/zt-1f4hj60z5-JMHb~wSr2xksLZVBN61g_Q>`_.
<https://determined-community.slack.com/join/shared_invite/zt-1f4hj60z5-JMHb~wSr2xksLZVBN61g_Q>`_.

*****************
Custom Reducers
@@ -126,7 +126,7 @@ instance requires no further changes to your code.

For a complete example of how to use DeepSpeed Autotune with ``DeepSpeedTrial``, visit the
`Determined GitHub Repo
<https://github.com/determined-ai/determined/tree/master/examples/deepspeed_autotune/torchvision/deepspeed_trial>`__
<https://github.com/determined-ai/determined/tree/main/examples/deepspeed_autotune/torchvision/deepspeed_trial>`__
and navigate to ``examples/deepspeed_autotune/torchvision/deepspeed_trial`` .

.. note::
@@ -164,7 +164,7 @@ so there is no need to remove the context manager after the ``dsat`` trials have

For a complete example of how to use DeepSpeed Autotune with Core API, visit the `Determined GitHub
Repo
<https://github.com/determined-ai/determined/tree/master/examples/deepspeed_autotune/torchvision/core_api>`__
<https://github.com/determined-ai/determined/tree/main/examples/deepspeed_autotune/torchvision/core_api>`__
and navigate to ``examples/deepspeed_autotune/torchvision/core_api`` .

Hugging Face Trainer
@@ -215,8 +215,8 @@ relevant code:
``dsat_reporting_context`` context manager.

To find examples that use DeepSpeed Autotune with Hugging Face Trainer, visit the `Determined GitHub
Repo <https://github.com/determined-ai/determined/tree/master/examples/hf_trainer_api>`__ and
navigate to ``examples/hf_trainer_api``.
Repo <https://github.com/determined-ai/determined/tree/main/examples/hf_trainer_api>`__ and navigate
to ``examples/hf_trainer_api``.

******************
Advanced Options
@@ -67,7 +67,7 @@ DeepSpeed training initialization consists of two steps:
#. Create the DeepSpeed model engine.

Refer to the `DeepSpeed Getting Started guide
<https://www.deepspeed.ai/getting-started/#writing-deepspeed-models/>`_ for more information.
<https://www.deepspeed.ai/getting-started/#writing-deepspeed-models>`_ for more information.

Outside of Determined, this is typically done in the following way:

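A typical sketch, assuming the standard ``deepspeed.initialize`` entry point:

.. code:: python

   import deepspeed

   # Illustrative only: build the model engine by hand outside of Determined.
   model_engine, optimizer, _, lr_scheduler = deepspeed.initialize(
       args=args,  # parsed command-line arguments (placeholder)
       model=model,
       model_parameters=model.parameters(),
   )
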
@@ -318,7 +318,7 @@ method.
passed directly into ``torch.profiler.profile``. Stepping the profiler will be handled automatically
during the training loop.

See the `PyTorch profiler plugin <https://github.com/pytorch/kineto/tree/master/tb_plugin>`_ for
See the `PyTorch profiler plugin <https://github.com/pytorch/kineto/tree/main/tb_plugin>`_ for
details.

The snippet below will profile GPU and CPU usage, skipping batch 1, warming up on batch 2, and
4 changes: 0 additions & 4 deletions docs/model-dev-guide/api-guides/batch-process-api-ug.rst
@@ -169,10 +169,6 @@ You have the option to associate your batch inference run with the
:class:`~determined.experimental.model.ModelVersion` employed during the run. This allows you to
compile custom metrics for that specific object, which can then be analyzed at a later stage.

The ``inference_example.py`` file in the `CIFAR10 Pytorch Example
<https://github.com/determined-ai/determined/tree/main/examples/computer_vision/cifar10_pytorch>`__
is an example.

Connect the :class:`~determined.experimental.checkpoint.Checkpoint` or
:class:`~determined.experimental.model.ModelVersion` to the inference run.

6 changes: 3 additions & 3 deletions docs/model-dev-guide/debug-models.rst
@@ -117,9 +117,9 @@ This step assumes you have a working local environment for training. If you do n
If your per-method checks in :ref:`Step 2 <step2>` passed but local test mode fails, your
``Trial`` subclass might not be implemented correctly. Double-check the documentation. It is also
possible that you have found a bug or an invalid assumption in the Determined software and should
`file a GitHub issue <https://github.com/determined-ai/determined/issues/new>`__ or contact
`file a GitHub issue <https://github.com/determined-ai/determined/issues>`__ or contact
Determined on `Slack
<https://join.slack.com/t/determined-community/shared_invite/zt-1f4hj60z5-JMHb~wSr2xksLZVBN61g_Q>`__.
<https://determined-community.slack.com/join/shared_invite/zt-1f4hj60z5-JMHb~wSr2xksLZVBN61g_Q>`__.

.. _step4:

@@ -298,7 +298,7 @@ interactive environment, it is submitted to the cluster and managed by Determine
has errors. Review the :ref:`experiment configuration <experiment-config-reference>`.

If you are unable to identify the cause of the problem, contact Determined `community support
<https://join.slack.com/t/determined-community/shared_invite/zt-1f4hj60z5-JMHb~wSr2xksLZVBN61g_Q>`__!
<https://determined-community.slack.com/join/shared_invite/zt-1f4hj60z5-JMHb~wSr2xksLZVBN61g_Q>`__!

.. _step8:

20 changes: 0 additions & 20 deletions docs/model-dev-guide/dtrain/dtrain-implement.rst
@@ -226,23 +226,3 @@ important details regarding ``slots_per_trial`` and the scheduler's behavior:
``slots_per_trial`` is set so that it can be scheduled within these constraints. You can also use
the CLI command ``det task list`` to check if any other tasks are using GPUs and preventing your
experiment from using all the GPUs on a machine.

***********************
Distributed Inference
***********************

PyTorch users have the option to use the existing distributed training workflow with PyTorchTrial to
accelerate their inference workloads. This workflow is not yet officially supported, therefore,
users must specify certain training-specific artifacts that are not used for inference. To run a
distributed batch inference job, create a new PyTorchTrial and follow these steps:

- Load the trained model and build the inference dataset using ``build_validation_data_loader()``.
- Specify the inference step using ``evaluate_batch()`` or ``evaluate_full_dataset()``.
- Register a dummy ``optimizer``.
- Specify a ``build_training_data_loader()`` that returns a dummy dataloader.
- Specify a no-op ``train_batch()`` that returns an empty map of metrics.

Once the new PyTorchTrial object is created, use the experiment configuration to distribute
inference in the same way as training. `cifar10_pytorch_inference
<https://github.com/determined-ai/determined/blob/master/examples/computer_vision/cifar10_pytorch_inference/>`_
serves as an example of distributed batch inference.
9 changes: 5 additions & 4 deletions docs/model-dev-guide/dtrain/dtrain-introduction.rst
@@ -8,10 +8,11 @@
How Determined Distributed Training Works
*******************************************

Determined employs data parallelism in its approach to distributed training. Data parallelism for
deep learning consists of a set of workers, where each worker is assigned to a unique compute
accelerator such as a GPU or a TPU. Each worker maintains a copy of the model parameters (weights
that are being trained), which is synchronized across all the workers at the start of training.
Determined employs data or model parallelism in its approach to distributed training. Data
parallelism for deep learning consists of a set of workers, where each worker is assigned to a
unique compute accelerator such as a GPU or a TPU. Each worker maintains a copy of the model
parameters (weights that are being trained), which is synchronized across all the workers at the
start of training.

.. image:: /assets/images/_dtrain-loop-dark.png
:class: only-dark
@@ -7,7 +7,7 @@ object in the Trial base class. This :class:`~determined.TrialContext` object ex
:func:`~determined.TrialContext.get_hparam` method that takes the hyperparameter name. For example,
to inject the value of the ``dropout_probability`` hyperparameter defined in the experiment
configuration into the constructor of a PyTorch `Dropout
<https://pytorch.org/docs/stable/nn.html#dropout>`_ layer:
<https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html>`_ layer:

.. code:: python
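
   # Illustrative sketch, not the original listing (``context`` is the TrialContext):
   self.dropout = torch.nn.Dropout(context.get_hparam("dropout_probability"))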