chore: update AMIs - Nvidia minor version bump (#8945)

hamidzr authored Mar 5, 2024
1 parent e108ed7 commit 6ecd81e

Showing 33 changed files with 229 additions and 221 deletions.

14 changes: 7 additions & 7 deletions .circleci/real_config.yml
@@ -37,7 +37,7 @@ parameters:
# be referenced by --ee testing.
default-pt-gpu-image:
type: string
default: determinedai/environments:cuda-11.3-pytorch-1.12-gpu-f66cbce
default: determinedai/environments:cuda-11.3-pytorch-1.12-gpu-2196775
# Some python, go, and react dependencies are cached by circleci via `save_cache`/`restore_cache`.
# If the dependencies stay the same, but the circleci code that would produce them is changed,
# it may be necessary to invalidate the cache by incrementing this value.
@@ -223,7 +223,7 @@ commands:
- when:
condition: <<parameters.tf2>>
steps:
- run: docker pull determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-f66cbce
- run: docker pull determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-2196775

login-docker:
parameters:
@@ -1849,7 +1849,7 @@ jobs:

test-unit-harness-gpu:
docker:
- image: determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.27.1
- image: determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.29.1
resource_class: determined-ai/container-runner-gpu
steps:
- run: mkdir -p ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
@@ -1870,7 +1870,7 @@ jobs:

test-unit-harness-pytorch2-gpu:
docker:
- image: determinedai/environments:cuda-11.8-pytorch-2.0-gpu-f66cbce
- image: determinedai/environments:cuda-11.8-pytorch-2.0-gpu-2196775
resource_class: determined-ai/container-runner-gpu
steps:
- run: mkdir -p ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
@@ -1891,7 +1891,7 @@ jobs:

test-unit-harness-pytorch2-cpu:
docker:
- image: determinedai/environments:py-3.10-pytorch-2.0-cpu-f66cbce
- image: determinedai/environments:py-3.10-pytorch-2.0-cpu-2196775
steps:
- run: mkdir -p ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
- checkout
@@ -1912,7 +1912,7 @@ jobs:

test-unit-harness-gpu-parallel:
docker:
- image: determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.27.1
- image: determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.29.1
resource_class: determined-ai/container-runner-multi-gpu
steps:
- run: mkdir -p ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
@@ -2498,7 +2498,7 @@ jobs:
type: string
default: "1"
environment-image:
default: determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.27.1
default: determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.29.1
type: string
accel-node-taints:
type: string
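Every image change in the `.circleci/real_config.yml` hunks above, and in the docs and test hunks that follow, is the same mechanical swap of the trailing build tag: `f66cbce` becomes `2196775` and `0.27.1` becomes `0.29.1`. Images whose tags are not part of the bump, such as the ROCm ones (`0.26.4`, `622d512`), keep their tags. The Python sketch below shows that swap. It assumes the build tag is always the part after the last `-` of the image tag. `bump_image` and `TAG_BUMPS` are hypothetical names and are not part of the Determined codebase.

```python
from __future__ import annotations

# Old -> new build tags as they appear in this commit's image references.
TAG_BUMPS = {"f66cbce": "2196775", "0.27.1": "0.29.1"}


def bump_image(ref: str, bumps: dict[str, str] = TAG_BUMPS) -> str:
    """Swap the trailing build tag of an image reference if it is listed in `bumps`."""
    repo, sep, tag = ref.rpartition(":")
    if not sep:
        return ref  # no tag present; leave the reference unchanged
    prefix, dash, build = tag.rpartition("-")
    if dash and build in bumps:
        return f"{repo}:{prefix}-{bumps[build]}"
    return ref  # build tag not in the mapping (e.g. the ROCm images): unchanged


if __name__ == "__main__":
    print(bump_image("determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.27.1"))
    # prints determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.29.1
```

The mapping is explicit, so tags that were deliberately left alone are never rewritten by accident.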

8 changes: 4 additions & 4 deletions docs/model-dev-guide/api-guides/apis-howto/_index.rst
@@ -76,14 +76,14 @@ experiment is controlled by the container image that has been configured for tha
Determined provides prebuilt Docker images that include TensorFlow 2.11, 1.15, and 2.8,
respectively:

- ``determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.27.1`` (default)
- ``determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.29.1`` (default)
- ``determinedai/environments:cuda-10.2-pytorch-1.7-tf-1.15-gpu-0.21.2``
- ``determinedai/environments:cuda-11.2-tf-2.8-gpu-0.27.1``
- ``determinedai/environments:cuda-11.2-tf-2.8-gpu-0.29.1``

We also provide lightweight CPU-only counterparts:

- ``determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-0.27.1``
- ``determinedai/environments:py-3.8-tf-2.8-cpu-0.27.1``
- ``determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-0.29.1``
- ``determinedai/environments:py-3.8-tf-2.8-cpu-0.29.1``

To change the container image used for an experiment, specify :ref:`environment.image
<exp-environment-image>` in the experiment configuration file. Please see :ref:`container-images`

8 changes: 4 additions & 4 deletions docs/model-dev-guide/prepare-container/custom-env.rst
@@ -101,9 +101,9 @@ Default Images
+-------------+-------------------------------------------------------------------------------+
| Environment | File Name |
+=============+===============================================================================+
| CPUs | ``determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-0.27.1`` |
| CPUs | ``determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-0.29.1`` |
+-------------+-------------------------------------------------------------------------------+
| NVIDIA GPUs | ``determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.27.1`` |
| NVIDIA GPUs | ``determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.29.1`` |
+-------------+-------------------------------------------------------------------------------+
| AMD GPUs | ``determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-0.26.4`` |
+-------------+-------------------------------------------------------------------------------+
@@ -132,7 +132,7 @@ Example Dockerfile that installs custom ``conda``-, ``pip``-, and ``apt``-based
.. code:: bash
# Determined Image
FROM determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.27.1
FROM determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.29.1
# Custom Configuration
RUN apt-get update && \
@@ -195,7 +195,7 @@ environments using :ref:`custom images <custom-docker-images>`:
.. code:: bash
# Determined Image
FROM determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-0.27.1
FROM determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-0.29.1
# Create a virtual environment
RUN conda create -n myenv python=3.8

4 changes: 2 additions & 2 deletions docs/reference/deploy/helm-config-reference.rst
@@ -194,11 +194,11 @@

- ``cpuImage``: Sets the default Docker image for all non-GPU tasks. If a Docker image is
specified in the :ref:`experiment config <exp-environment-image>` this default is overriden.
Defaults to: ``determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-0.27.1``.
Defaults to: ``determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-0.29.1``.

- ``gpuImage``: Sets the default Docker image for all GPU tasks. If a Docker image is specified
in the :ref:`experiment config <exp-environment-image>` this default is overriden. Defaults
to: ``determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.27.1``.
to: ``determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.29.1``.

- ``logPolicies``: Sets log policies for trials. For details, visit :ref:`log_policies
<experiment-config-min-validation-period>`.

4 changes: 2 additions & 2 deletions docs/reference/deploy/master-config-reference.rst
@@ -89,9 +89,9 @@ configure different container images for NVIDIA GPU tasks using the ``cuda`` key
Determined 0.17.6), CPU tasks using ``cpu`` key, and ROCm (AMD GPU) tasks using the ``rocm`` key.
Default values:

- ``determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.27.1`` for NVIDIA GPUs.
- ``determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.29.1`` for NVIDIA GPUs.
- ``determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-0.26.4`` for ROCm.
- ``determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-0.27.1`` for CPUs.
- ``determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-0.29.1`` for CPUs.

``environment_variables``
=========================

4 changes: 2 additions & 2 deletions docs/reference/experiment-config-reference.rst
@@ -1299,8 +1299,8 @@ Optional. The Docker image to use when executing the workload. This image must b
container images for NVIDIA GPU tasks using ``cuda`` key (``gpu`` prior to 0.17.6), CPU tasks using
``cpu`` key, and ROCm (AMD GPU) tasks using ``rocm`` key. Default values:

- ``determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.27.1`` for NVIDIA GPUs.
- ``determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-0.27.1`` for CPUs.
- ``determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.29.1`` for NVIDIA GPUs.
- ``determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-0.29.1`` for CPUs.
- ``determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-0.26.4`` for ROCm.

When the cluster is configured with :ref:`resource_manager.type: slurm

4 changes: 2 additions & 2 deletions docs/reference/job-config-reference.rst
@@ -45,9 +45,9 @@ The following configuration settings are supported:
different container images for NVIDIA GPU tasks using ``cuda`` key (``gpu`` prior to 0.17.6),
CPU tasks using ``cpu`` key, and ROCm (AMD GPU) tasks using ``rocm`` key. Default values:

- ``determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.27.1`` for NVIDIA GPUs.
- ``determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.29.1`` for NVIDIA GPUs.
- ``determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-0.26.4`` for ROCm.
- ``determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-0.27.1`` for CPUs.
- ``determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-0.29.1`` for CPUs.

- ``force_pull_image``: Forcibly pull the image from the Docker registry and bypass the Docker
cache. Defaults to ``false``.
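The master, experiment, and job configuration references above all describe the same selection rule. `image` may be a single string for every task, or a map with `cpu`, `cuda` (`gpu` before 0.17.6), and `rocm` keys. If neither applies, the listed defaults are used. The Python sketch below encodes that rule under stated assumptions. `select_image` is a hypothetical helper. The fallback to the default when a device type is missing from a partial map is also an assumption; these hunks do not confirm it.

```python
from __future__ import annotations

from typing import Mapping, Union

# Default images listed in this commit for NVIDIA GPUs, CPUs, and ROCm.
DEFAULT_IMAGES = {
    "cuda": "determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.29.1",
    "cpu": "determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-0.29.1",
    "rocm": "determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-0.26.4",
}

# An `image` setting: one string, a per-device map, or absent.
ImageSetting = Union[str, Mapping[str, str], None]


def select_image(setting: ImageSetting, device: str,
                 defaults: Mapping[str, str] = DEFAULT_IMAGES) -> str:
    """Pick the image for a task on `device` ("cpu", "cuda", or "rocm")."""
    if isinstance(setting, str):
        return setting  # one image for every device type
    if setting:
        if device in setting:
            return setting[device]
        if device == "cuda" and "gpu" in setting:
            return setting["gpu"]  # legacy key used before 0.17.6
    return defaults[device]  # assumed fallback to the cluster default
```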

4 changes: 2 additions & 2 deletions docs/setup-cluster/deploy-cluster/slurm/singularity.rst
@@ -26,9 +26,9 @@ by default in this version of Determined are described below.
+-------------+--------------------------------------------------------------------------+
| Environment | File Name |
+=============+==========================================================================+
| CPUs | ``determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-f66cbce`` |
| CPUs | ``determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-2196775`` |
+-------------+--------------------------------------------------------------------------+
| NVIDIA GPUs | ``determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-f66cbce`` |
| NVIDIA GPUs | ``determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-2196775`` |
+-------------+--------------------------------------------------------------------------+
| AMD GPUs | ``determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-622d512`` |
+-------------+--------------------------------------------------------------------------+

4 changes: 2 additions & 2 deletions docs/setup-cluster/gcp/install-gcp.rst
@@ -406,5 +406,5 @@ This command line will spin up a cluster of up to 2 A100s in the ``us-central1-c
--compute-agent-instance-type a2-highgpu-1g --gpu-num 1 \
--gpu-type nvidia-tesla-a100 \
--region us-central1 --zone us-central1-c \
--gpu-env-image determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.27.1 \
--cpu-env-image determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-0.27.1
--gpu-env-image determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-0.29.1 \
--cpu-env-image determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-0.29.1

4 changes: 2 additions & 2 deletions docs/setup-cluster/slurm/singularity.rst
@@ -26,9 +26,9 @@ by default in this version of Determined are described below.
+-------------+--------------------------------------------------------------------------+
| Environment | File Name |
+=============+==========================================================================+
| CPUs | ``determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-f66cbce`` |
| CPUs | ``determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-2196775`` |
+-------------+--------------------------------------------------------------------------+
| NVIDIA GPUs | ``determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-f66cbce`` |
| NVIDIA GPUs | ``determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-2196775`` |
+-------------+--------------------------------------------------------------------------+
| AMD GPUs | ``determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-622d512`` |
+-------------+--------------------------------------------------------------------------+

2 changes: 1 addition & 1 deletion docs/setup-cluster/slurm/slurm-requirements.rst
@@ -510,7 +510,7 @@ platform. There may be additional per-user configuration that is required.

.. code:: bash
image=determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-f66cbce
image=determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-2196775
cd /shared/enroot/images
enroot import docker://$image
enroot create /shared/enroot/images/${image//[\/:]/\+}.sqsh
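The `enroot create` line above builds the squashfs path with the bash expansion `${image//[\/:]/\+}`, which replaces every `/` and `:` in the image reference with `+`. The Python sketch below reproduces only that string transformation, to show which file the command expects for the new image. The snippet assumes `enroot import` wrote a file under that name. `enroot_sqsh_path` is a hypothetical helper, and its default directory comes from the `cd` line in the snippet.

```python
import re

IMAGES_DIR = "/shared/enroot/images"  # directory used by the `cd` line in the snippet


def enroot_sqsh_path(image: str, images_dir: str = IMAGES_DIR) -> str:
    # Equivalent of ${image//[\/:]/\+}: replace every "/" and ":" with "+".
    return f"{images_dir}/{re.sub(r'[/:]', '+', image)}.sqsh"


if __name__ == "__main__":
    print(enroot_sqsh_path("determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-2196775"))
    # prints /shared/enroot/images/determinedai+environments+cuda-11.3-pytorch-1.12-tf-2.11-gpu-2196775.sqsh
```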

12 changes: 6 additions & 6 deletions e2e_tests/tests/config.py
@@ -14,12 +14,12 @@
MAX_TRIAL_BUILD_SECS = 90


DEFAULT_TF2_CPU_IMAGE = "determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-f66cbce"
DEFAULT_TF2_GPU_IMAGE = "determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-f66cbce"
DEFAULT_PT_CPU_IMAGE = "determinedai/environments:py-3.9-pytorch-1.12-cpu-f66cbce"
DEFAULT_PT_GPU_IMAGE = "determinedai/environments:cuda-11.3-pytorch-1.12-gpu-f66cbce"
DEFAULT_PT2_CPU_IMAGE = "determinedai/environments:py-3.10-pytorch-2.0-cpu-f66cbce"
DEFAULT_PT2_GPU_IMAGE = "determinedai/environments:cuda-11.8-pytorch-2.0-gpu-f66cbce"
DEFAULT_TF2_CPU_IMAGE = "determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-2196775"
DEFAULT_TF2_GPU_IMAGE = "determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-2196775"
DEFAULT_PT_CPU_IMAGE = "determinedai/environments:py-3.9-pytorch-1.12-cpu-2196775"
DEFAULT_PT_GPU_IMAGE = "determinedai/environments:cuda-11.3-pytorch-1.12-gpu-2196775"
DEFAULT_PT2_CPU_IMAGE = "determinedai/environments:py-3.10-pytorch-2.0-cpu-2196775"
DEFAULT_PT2_GPU_IMAGE = "determinedai/environments:cuda-11.8-pytorch-2.0-gpu-2196775"

TF2_CPU_IMAGE = os.environ.get("TF2_CPU_IMAGE") or DEFAULT_TF2_CPU_IMAGE
TF2_GPU_IMAGE = os.environ.get("TF2_GPU_IMAGE") or DEFAULT_TF2_GPU_IMAGE
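`config.py` lets each pinned default image be overridden through an environment variable, such as `TF2_CPU_IMAGE`, using the pattern `os.environ.get(...) or DEFAULT_...`. Because the code uses `or` rather than a `get` default, a variable set to an empty string also falls back to the pinned default. The sketch below illustrates that resolution rule. `resolve_image` and its `env` parameter are hypothetical and exist only to make the behaviour easy to test without touching the real environment.

```python
import os
from typing import Mapping, Optional

DEFAULT_TF2_CPU_IMAGE = "determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-2196775"


def resolve_image(var: str, default: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return env[var] when it is set and non-empty, otherwise `default`."""
    env = os.environ if env is None else env
    # `or` treats "" like "unset"; os.environ.get(var, default) would return "".
    return env.get(var) or default


if __name__ == "__main__":
    print(resolve_image("TF2_CPU_IMAGE", DEFAULT_TF2_CPU_IMAGE, {"TF2_CPU_IMAGE": ""}))
    # prints determinedai/environments:py-3.9-pytorch-1.12-tf-2.11-cpu-2196775
```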

20 changes: 10 additions & 10 deletions harness/determined/deploy/aws/templates/efs.yaml
@@ -3,35 +3,35 @@ Mappings:
RegionMap:
ap-northeast-1:
Master: ami-00910ef9457f0df47
Agent: ami-0bffb351742e4d778
Agent: ami-0a06577b7707632f8
# TODO(DET-4258) Uncomment these when we fully support all P3 regions.
# ap-northeast-2:
# Master: ami-035e3e44dc41db6a2
# Agent: ami-0429ee62fc06a8019
# Agent: ami-00d8faa0b84ddbd88
# ap-southeast-1:
# Master: ami-0fd1ee6c8b656f020
# Agent: ami-0aa9f5e7e1a9933f2
# Agent: ami-0435fcad25753a1a6
# ap-southeast-2:
# Master: ami-0b62ecd3babd1c548
# Agent: ami-01ae41ecfbb042eee
# Agent: ami-0067e868a8d4f8a1e
eu-central-1:
Master: ami-0abbe417ed83c0b29
Agent: ami-02a2a8e7cb59b8ca6
Agent: ami-0ec870b46499de494
eu-west-1:
Master: ami-0e3f7dd2dc743e48a
Agent: ami-04d4dd517927955bb
Agent: ami-013c92fb90c7a7971
# eu-west-2:
# Master: ami-0d78429fb6af30994
# Agent: ami-00f4c4dc577d844bd
# Agent: ami-0a5b2c87970d66d5b
us-east-1:
Master: ami-0172070f66a8ebe63
Agent: ami-0cd3eb2e7394e6135
Agent: ami-061ff7cfe905dfbd5
us-east-2:
Master: ami-0bafa3699418551cd
Agent: ami-092cd38207aded760
Agent: ami-072947196f5aaa3a5
us-west-2:
Master: ami-0ceeab680f529cc36
Agent: ami-0ab46b9ac642fc297
Agent: ami-0d9494a6bae62f03e

Parameters:
VpcCIDR:
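Only the `Agent` AMIs change in `efs.yaml`; `fsx.yaml` and `lore.yaml` receive the same changes. The `Master` AMIs stay the same, and the regions commented out under `TODO(DET-4258)` have no entry at all. CloudFormation templates usually read a table like this with `Fn::FindInMap`, keyed on the stack's region. The hunk does not show the lookup itself, so that usage is an assumption here. The Python sketch below performs the equivalent lookup over the updated active regions. `REGION_MAP` and `find_in_map` are hypothetical names.

```python
# Master and updated Agent AMIs for the regions active in efs.yaml after this commit.
REGION_MAP = {
    "ap-northeast-1": {"Master": "ami-00910ef9457f0df47", "Agent": "ami-0a06577b7707632f8"},
    "eu-central-1": {"Master": "ami-0abbe417ed83c0b29", "Agent": "ami-0ec870b46499de494"},
    "eu-west-1": {"Master": "ami-0e3f7dd2dc743e48a", "Agent": "ami-013c92fb90c7a7971"},
    "us-east-1": {"Master": "ami-0172070f66a8ebe63", "Agent": "ami-061ff7cfe905dfbd5"},
    "us-east-2": {"Master": "ami-0bafa3699418551cd", "Agent": "ami-072947196f5aaa3a5"},
    "us-west-2": {"Master": "ami-0ceeab680f529cc36", "Agent": "ami-0d9494a6bae62f03e"},
}


def find_in_map(mapping: dict, top_key: str, second_key: str) -> str:
    """Two-level lookup in the spirit of Fn::FindInMap.

    A missing region or key raises KeyError here; CloudFormation reports an error instead.
    """
    return mapping[top_key][second_key]
```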

20 changes: 10 additions & 10 deletions harness/determined/deploy/aws/templates/fsx.yaml
@@ -3,35 +3,35 @@ Mappings:
RegionMap:
ap-northeast-1:
Master: ami-00910ef9457f0df47
Agent: ami-0bffb351742e4d778
Agent: ami-0a06577b7707632f8
# TODO(DET-4258) Uncomment these when we fully support all P3 regions.
# ap-northeast-2:
# Master: ami-035e3e44dc41db6a2
# Agent: ami-0429ee62fc06a8019
# Agent: ami-00d8faa0b84ddbd88
# ap-southeast-1:
# Master: ami-0fd1ee6c8b656f020
# Agent: ami-0aa9f5e7e1a9933f2
# Agent: ami-0435fcad25753a1a6
# ap-southeast-2:
# Master: ami-0b62ecd3babd1c548
# Agent: ami-01ae41ecfbb042eee
# Agent: ami-0067e868a8d4f8a1e
eu-central-1:
Master: ami-0abbe417ed83c0b29
Agent: ami-02a2a8e7cb59b8ca6
Agent: ami-0ec870b46499de494
eu-west-1:
Master: ami-0e3f7dd2dc743e48a
Agent: ami-04d4dd517927955bb
Agent: ami-013c92fb90c7a7971
# eu-west-2:
# Master: ami-0d78429fb6af30994
# Agent: ami-00f4c4dc577d844bd
# Agent: ami-0a5b2c87970d66d5b
us-east-1:
Master: ami-0172070f66a8ebe63
Agent: ami-0cd3eb2e7394e6135
Agent: ami-061ff7cfe905dfbd5
us-east-2:
Master: ami-0bafa3699418551cd
Agent: ami-092cd38207aded760
Agent: ami-072947196f5aaa3a5
us-west-2:
Master: ami-0ceeab680f529cc36
Agent: ami-0ab46b9ac642fc297
Agent: ami-0d9494a6bae62f03e

Parameters:
VpcCIDR:

4 changes: 2 additions & 2 deletions harness/determined/deploy/aws/templates/govcloud.yaml
@@ -5,10 +5,10 @@ Mappings:
RegionMap:
us-gov-east-1:
Master: ami-04ef693ebcf519dc3
Agent: ami-065d670b7d0d648bf
Agent: ami-0b20a31b3607df583
us-gov-west-1:
Master: ami-08bd15d820a3c087e
Agent: ami-058ffd48428aaeaeb
Agent: ami-09e52c95db3076b2a
Parameters:
Keypair:
Type: AWS::EC2::KeyPair::KeyName

20 changes: 10 additions & 10 deletions harness/determined/deploy/aws/templates/lore.yaml
@@ -3,35 +3,35 @@ Mappings:
RegionMap:
ap-northeast-1:
Master: ami-00910ef9457f0df47
Agent: ami-0bffb351742e4d778
Agent: ami-0a06577b7707632f8
# TODO(DET-4258) Uncomment these when we fully support all P3 regions.
# ap-northeast-2:
# Master: ami-035e3e44dc41db6a2
# Agent: ami-0429ee62fc06a8019
# Agent: ami-00d8faa0b84ddbd88
# ap-southeast-1:
# Master: ami-0fd1ee6c8b656f020
# Agent: ami-0aa9f5e7e1a9933f2
# Agent: ami-0435fcad25753a1a6
# ap-southeast-2:
# Master: ami-0b62ecd3babd1c548
# Agent: ami-01ae41ecfbb042eee
# Agent: ami-0067e868a8d4f8a1e
eu-central-1:
Master: ami-0abbe417ed83c0b29
Agent: ami-02a2a8e7cb59b8ca6
Agent: ami-0ec870b46499de494
eu-west-1:
Master: ami-0e3f7dd2dc743e48a
Agent: ami-04d4dd517927955bb
Agent: ami-013c92fb90c7a7971
# eu-west-2:
# Master: ami-0d78429fb6af30994
# Agent: ami-00f4c4dc577d844bd
# Agent: ami-0a5b2c87970d66d5b
us-east-1:
Master: ami-0172070f66a8ebe63
Agent: ami-0cd3eb2e7394e6135
Agent: ami-061ff7cfe905dfbd5
us-east-2:
Master: ami-0bafa3699418551cd
Agent: ami-092cd38207aded760
Agent: ami-072947196f5aaa3a5
us-west-2:
Master: ami-0ceeab680f529cc36
Agent: ami-0ab46b9ac642fc297
Agent: ami-0d9494a6bae62f03e

Parameters:
VpcCIDR: