docs: Clarify startuphook (#9517)
tara-det-ai authored Jun 17, 2024
1 parent 63a4163 commit 1630c45
Showing 4 changed files with 79 additions and 63 deletions.
32 changes: 0 additions & 32 deletions docs/model-dev-guide/api-guides/apis-howto/_index.rst
@@ -56,38 +56,6 @@ Prefer to use an Example Model?
If you'd like to build off of an existing model that already runs on Determined, visit our
:ref:`example-solutions` to see if the model you'd like to train is already available.

********************
TensorFlow Support
********************

TensorFlow Core Models
======================

Determined has support for TensorFlow models that use the :ref:`Keras <api-keras-ug>` API. For
models that use the low-level TensorFlow Core APIs, we recommend wrapping your model in Keras, as
recommended by the official `TensorFlow <https://www.tensorflow.org/guide/basics#training_loops>`_
documentation.

TensorFlow 1 vs 2
=================

Determined supports both TensorFlow 1 and 2. The version of TensorFlow that is used for a particular
experiment is controlled by the container image that has been configured for that experiment.
Determined provides prebuilt Docker images that include TensorFlow 2+, 1.15, and 2.8, respectively:

- ``determinedai/tensorflow-ngc-dev:e960eae``
- ``determinedai/environments:cuda-10.2-pytorch-1.7-tf-1.15-gpu-0.21.2``
- ``determinedai/environments:cuda-11.2-tf-2.8-gpu-0.29.1``

We also provide lightweight CPU-only counterparts:

- ``determinedai/environments:py-3.8-tf-2.8-cpu-0.29.1``

To change the container image used for an experiment, specify :ref:`environment.image
<exp-environment-image>` in the experiment configuration file. Please see :ref:`container-images`
for more details about configuring training environments and a more complete list of prebuilt Docker
images.

******************
AMD ROCm Support
******************
4 changes: 4 additions & 0 deletions docs/model-dev-guide/prepare-container/_index.rst
@@ -13,10 +13,14 @@ Find resources and operations for preparing your container environment.
| :ref:`custom-env` | How to set environment variables, use a startup hook, and use custom |
| | and default Docker images. |
+-------------------------------+----------------------------------------------------------------------+
| :ref:`tensorflow-support` | How to use TensorFlow Core models with Keras, support for TensorFlow |
| | 1 and 2, and how to configure container images. |
+-------------------------------+----------------------------------------------------------------------+

.. toctree::
:maxdepth: 1
:hidden:

Set Environment Images <set-environment-images>
Customize Your Environment <custom-env>
TensorFlow Support <tensorflow-support>
72 changes: 41 additions & 31 deletions docs/model-dev-guide/prepare-container/custom-env.rst
@@ -4,7 +4,7 @@
Customize Your Environment
############################

Determined launches workloads using Docker containers. By default, workloads execute inside a
Determined launches workloads using Docker containers. By default, workloads run inside a
Determined-provided container that includes common deep learning libraries and frameworks.

If your model code has additional dependencies, the easiest way to install them is to specify a
@@ -34,10 +34,10 @@ format is a list of ``NAME=VALUE`` strings. For example:
- C=${B}
Variables are set sequentially, which affects variables that depend on the expansion of other
variables. In the example, names ``A``, ``B``, and ``C`` each have the value ``hello_world`` in the
variables. In the example, ``A``, ``B``, and ``C`` each have the value ``hello_world`` in the
container.

Proxy variables set in this way take precedent over variables set in the :ref:`agent configuration
Proxy variables set in this way take precedence over variables set in the :ref:`agent configuration
<agent-config-reference>`.

You can also set variables for each accelerator type, separately:
@@ -59,16 +59,25 @@ You can also set variables for each accelerator type, separately:
Startup Hooks
***************

If a ``startup-hook.sh`` file exists in the top level of your model definition directory, this file
is automatically run with every Docker container startup. This occurs before any Python interpreters
are launched or deep learning operations are performed. The startup hook can be used to customize
the container environment, install additional dependencies, and download data sets among other shell
script commands.
If a ``startup-hook.sh`` file exists in the top level of your model definition directory (for
experiments), or context directory (for shells, notebooks, and TensorBoards), it is automatically
run with every Docker container startup before any Python interpreters are launched or deep learning
operations are performed. The startup hook can customize the container environment, install
additional dependencies, and download datasets, among other shell script commands.

.. note::

``startup-hook.sh`` does not apply to ``det cmd``. It applies to experiments, notebooks, shells,
and TensorBoards, but not commands.

For shells, notebooks, and TensorBoards, make sure to supply the context directory using the
``--context`` or ``-c`` option. You can also use the ``--include`` option, though it may require
more directory management.

Startup hooks are not cached and run before the start of every workload, so expensive or
long-running operations in a startup hook can result in poor performance.

This example startup hook installs the ``wget`` utility and the ``pandas`` Python package:
Example startup hook to install the ``wget`` utility and the ``pandas`` Python package:

.. code:: bash
@@ -116,32 +125,30 @@ Default Images
NGC Version
===========

By default, a suitable NGC container version is used in our images. Users can select a different
By default, a suitable NGC container version is used in our images. You can select a different
version of NGC containers to build images from. Versions are listed on the `NVIDIA Frameworks site
<https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html>`__. Once a suitable
version is selected, users can rebuild these images by cloning the `MLDE environments repo
<https://github.com/determined-ai/environments>`__ and modifying either NGC_PYTORCH_VERSION or
NGC_TENSORFLOW_VERSION variables in the MakeFile, then running `make build-pytorch-ngc` or `make
build-tensorflow-ngc` respectively.
<https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html>`__. To build custom
images, clone the `MLDE environments repo <https://github.com/determined-ai/environments>`__,
modify the ``NGC_PYTORCH_VERSION`` or ``NGC_TENSORFLOW_VERSION`` variable in the Makefile, and run
``make build-pytorch-ngc`` or ``make build-tensorflow-ngc``, respectively.

.. _custom-docker-images:

Custom Images
=============

While the official images contain all the dependencies needed for basic deep learning workloads,
many workloads have additional dependencies. If the extra dependencies are quick to install, you
might consider using a :ref:`startup hook <startup-hooks>`. Where installing dependencies using
``startup-hook.sh`` takes too long, it is recommended that you build your own Docker image and
publish to a Docker registry, such as `Docker Hub <https://hub.docker.com/>`__.
many workloads have additional dependencies. If the extra dependencies are quick to install, use a
:ref:`startup hook <startup-hooks>`. If installing dependencies using ``startup-hook.sh`` takes too
long, build your own Docker image and publish it to a Docker registry, such as `Docker Hub
<https://hub.docker.com/>`__.

.. warning::

Do NOT install TensorFlow, PyTorch, Horovod, or Apex packages, which conflict with
Determined-installed packages.

It is recommended that custom images use one of the official Determined images as a base image,
using the ``FROM`` instruction.
Use one of the official Determined images as a base image in the ``FROM`` instruction.

Example Dockerfile that installs custom ``conda``-, ``pip``-, and ``apt``-based dependencies:

@@ -162,8 +169,8 @@ Example Dockerfile that installs custom ``conda``-, ``pip``-, and ``apt``-based
conda activate base && \
pip install --requirement /tmp/pip_requirements.txt
Assuming that this image is published to a public repository on Docker Hub, use the following
declaration format to configure an experiment, command, or notebook:
Assuming this image is published to a public repository on Docker Hub, configure an experiment,
command, or notebook with:

.. code:: yaml
@@ -173,8 +180,7 @@ declaration format to configure an experiment, command, or notebook:
where ``my-user-name`` is your Docker Hub user, ``my-repo-name`` is the name of the Docker Hub
repository, and ``my-tag`` is the image tag to use, such as ``latest``.

If you publish your image to a private Docker Hub repository, you can specify the credentials needed
to access the repository:
For a private Docker Hub repository, specify the credentials:

.. code:: yaml
@@ -184,8 +190,7 @@ to access the repository:
username: my-user-name
password: my-password
If you publish the image to a private `Docker Registry <https://docs.docker.com/registry/>`__,
specify the registry path as part of the ``image`` field:
For a private `Docker Registry <https://docs.docker.com/registry/>`__, specify the registry path:

.. code:: yaml
@@ -195,9 +200,9 @@ specify the registry path as part of the ``image`` field:
Images are fetched using HTTPS by default. An HTTPS proxy can be configured using the
``https_proxy`` field in the :ref:`agent configuration <agent-config-reference>`.

The custom image and credentials can be set as the defaults for all tasks launched in Determined,
using the ``image`` and ``registry_auth`` fields in the :ref:`master configuration
<master-config-reference>`. Make sure to restart the master for this to take effect.
Set the custom image and credentials as the defaults for all tasks launched in Determined using the
``image`` and ``registry_auth`` fields in the :ref:`master configuration <master-config-reference>`.
Restart the master for these changes to take effect.

.. _virtual-env:

@@ -226,7 +231,7 @@ To ensure that a virtual environment is activated every time a new interactive t
created, in JupyterLab or using Determined Shell, update ``~/.bashrc`` with the scripts to activate
the virtual environment you want.

This example switches to a virtual environment using a :ref:`startup hook <startup-hooks>`:
Example using a :ref:`startup hook <startup-hooks>` to switch to a virtual environment:

.. code:: bash
@@ -236,3 +241,8 @@ This example switches to a virtual environment using a :ref:`startup hook <start
# Do that for every new interactive terminal session
echo 'eval "$(conda shell.bash hook)" && conda activate myenv' >> ~/.bashrc
.. note::

``startup-hook.sh`` does not apply to ``det cmd``. It applies to experiments, notebooks, shells,
and TensorBoards, but not commands.
34 changes: 34 additions & 0 deletions docs/model-dev-guide/prepare-container/tensorflow-support.rst
@@ -0,0 +1,34 @@
.. _tensorflow-support:

####################
TensorFlow Support
####################

************************
TensorFlow Core Models
************************

Determined supports TensorFlow models using the :ref:`Keras <api-keras-ug>` API. For models that
use low-level TensorFlow Core APIs, we recommend wrapping your model in Keras as suggested by the
official `TensorFlow <https://www.tensorflow.org/guide/basics#training_loops>`_ documentation.

*******************
TensorFlow 1 vs 2
*******************

Determined supports both TensorFlow 1 and 2. The version of TensorFlow used for a particular
experiment is controlled by the configured container image. Determined provides prebuilt Docker
images that include TensorFlow 2+, 1.15, and 2.8, respectively:

- ``determinedai/tensorflow-ngc-dev:e960eae``
- ``determinedai/environments:cuda-10.2-pytorch-1.7-tf-1.15-gpu-0.21.2``
- ``determinedai/environments:cuda-11.2-tf-2.8-gpu-0.29.1``

Lightweight CPU-only counterparts are also available:

- ``determinedai/environments:py-3.8-tf-2.8-cpu-0.29.1``

To change the container image used for an experiment, specify :ref:`environment.image
<exp-environment-image>` in the experiment configuration file. Please see :ref:`container-images`
for more details about configuring training environments and a more complete list of prebuilt Docker
images.
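
A minimal sketch of the corresponding experiment configuration fragment, assuming the prebuilt
TensorFlow 2.8 GPU image listed above. Only the ``environment.image`` field is shown, and all other
fields of the experiment configuration are omitted:

.. code:: yaml

   # Illustrative fragment only; substitute the image that matches the
   # TensorFlow version your experiment needs.
   environment:
     image: determinedai/environments:cuda-11.2-tf-2.8-gpu-0.29.1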
