upgrade to PTL 1.7 (NVIDIA#4672)
* upgrade to PTL 1.7

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* min version

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* replace progress_bar_refresh_rate with enable_progress_bar; the progress bar is a callback now

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* progressbar

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* replace args removed in PTL 1.7, fix CPU tests, remove the older p-tuning script

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* revert ssl test fixes

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* override trainer property and fix numba grad check

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* NLPDDPPlugin -> NLPDDPStrategy

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* style fix

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* set max_steps default as -1

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix max_steps in notebooks

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update trainer config

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix speech2label jenkins

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix speech2text jenkins

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* DDPPlugin -> DDPStrategy

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* remove provided strategy keys from trainer config nlp

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* check other examples

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* override LightningModule .cuda call to maintain PyTorch default behavior (see the sketch after this commit message)

Signed-off-by: ericharper <complex451@gmail.com>

* revert gpt eval jenkins test

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* overwrite cuda class to PTL

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* review feedback

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* remove checkpoint callback from main config

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* patch fix for intent slot classification test

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* style fix

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Signed-off-by: ericharper <complex451@gmail.com>
Co-authored-by: ericharper <complex451@gmail.com>
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
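
The .cuda entries above refer to keeping PyTorch's default device selection when .cuda() is called with no argument. Below is a minimal illustrative sketch of such an override, not the exact NeMo implementation; it assumes that PTL's device mixin would otherwise resolve a bare .cuda() call to device index 0 rather than to the current CUDA device, and the model class name is hypothetical.

    import torch
    from pytorch_lightning import LightningModule

    class ExampleModel(LightningModule):  # hypothetical class, for illustration only
        def cuda(self, device=None):
            # Resolve a bare .cuda() call to the *current* CUDA device,
            # matching torch.nn.Module.cuda() rather than always using index 0.
            if device is None:
                device = torch.device("cuda", torch.cuda.current_device())
            elif isinstance(device, int):
                device = torch.device("cuda", index=device)
            return super().cuda(device=device)
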
2 people authored and Hainan Xu committed Nov 29, 2022
1 parent 6519b02 commit 838674a
Showing 126 changed files with 314 additions and 437 deletions.
4 changes: 2 additions & 2 deletions Jenkinsfile
@@ -597,7 +597,7 @@ pipeline {
trainer.devices=[0] \
trainer.accelerator="gpu" \
trainer.max_epochs=1 \
-+trainer.max_steps=1 \
+trainer.max_steps=1 \
+trainer.num_sanity_val_steps=1 \
exp_manager.exp_dir=examples/asr/speech_to_text_results'
sh 'rm -rf examples/asr/speech_to_text_results'
@@ -612,7 +612,7 @@ pipeline {
trainer.devices=[1] \
trainer.accelerator="gpu" \
trainer.max_epochs=1 \
-+trainer.max_steps=1 \
+trainer.max_steps=1 \
+trainer.num_sanity_val_steps=1 \
model.preprocessor._target_=nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor \
~model.preprocessor.window_size \
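
Note on the Hydra overrides above: a "+key=value" override adds a key that does not yet exist in the config, while a plain "key=value" override changes an existing key. Because max_steps now ships in the trainer configs with a default of -1, the test commands drop the "+" prefix for trainer.max_steps, while num_sanity_val_steps still needs it.
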
2 changes: 1 addition & 1 deletion docs/source/core/core.rst
@@ -399,7 +399,7 @@ configuration for a Novograd optimizer with Cosine Annealing learning rate sched
name: CosineAnnealing
# Optional arguments
-max_steps: null # computed at runtime or explicitly set here
+max_steps: -1 # computed at runtime or explicitly set here
monitor: val_loss
reduce_on_plateau: false
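
With max_steps set to -1, the scheduler's step budget is derived at runtime from the epoch budget and the size of the training dataloader. The helper below is an illustrative sketch of that kind of computation; the function name and arguments are hypothetical, not NeMo's actual API:

    import math

    def effective_max_steps(max_epochs, num_samples, batch_size,
                            num_devices=1, accumulate_grad_batches=1):
        # Steps per epoch shrink as batches are sharded across devices and
        # accumulated; multiply by the epoch budget to get total optimizer steps.
        steps_per_epoch = math.ceil(
            num_samples / (batch_size * num_devices * accumulate_grad_batches)
        )
        return steps_per_epoch * max_epochs

    # e.g. 100k samples, batch size 32, 8 GPUs, 100 epochs
    print(effective_max_steps(100, 100_000, 32, num_devices=8))
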
10 changes: 5 additions & 5 deletions docs/source/nlp/megatron.rst
@@ -30,15 +30,15 @@ the same features as other NeMo Models.
Training
^^^^^^^^

-All of the necessary logic to train model parallel models in NeMo with PyTorch Lightning is contained in the ``NLPDDPPlugin``.
-The ``NLPDDPPlugin`` subclasses the PyTorch Lightning training type plugin ``DDPPlugin``.
-See `plugins <https://pytorch-lightning.readthedocs.io/en/latest/extensions/plugins.html>`_ for more information on PyTorch Lightning Plugins.
+All of the necessary logic to train model parallel models in NeMo with PyTorch Lightning is contained in the ``NLPDDPStrategy``.
+The ``NLPDDPStrategy`` subclasses the PyTorch Lightning strategy type ``DDPStrategy``.
+See `strategies <https://pytorch-lightning.readthedocs.io/en/latest/extensions/strategy.html>`_ for more information on PyTorch Lightning Strategies

To enable model parallel training in NeMo:

.. code-block:: python
-trainer = Trainer(plugins=[NLPDDPPlugin()], **cfg.trainer)
+trainer = Trainer(strategy=NLPDDPStrategy(), **cfg.trainer)
Megatron-LM checkpoints have a specific format. One checkpoint is saved for each model parallel rank:

@@ -157,7 +157,7 @@ Since model parallel models always require more than one GPU, the ``Trainer`` is

.. code-block:: python
-trainer = pl.Trainer(plugins=[NLPDDPPlugin()], **cfg.trainer)
+trainer = pl.Trainer(strategy=NLPDDPStrategy(), **cfg.trainer)
model = TextClassificationModel.restore_from(cfg.model.nemo_path, trainer=trainer)
model.setup_test_data(test_data_config=cfg.model.test_ds)
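
Putting the documentation changes above together, a full restore-and-test flow under PTL 1.7 might look like the following sketch. The NLPDDPStrategy import path and the Hydra config layout (cfg.trainer, cfg.model.nemo_path, cfg.model.test_ds) are assumptions based on the snippets in this diff:

    import pytorch_lightning as pl
    from omegaconf import DictConfig

    from nemo.collections.nlp.models import TextClassificationModel
    from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy

    def restore_and_test(cfg: DictConfig):
        # The strategy argument replaces the old plugins=[NLPDDPPlugin()] usage.
        trainer = pl.Trainer(strategy=NLPDDPStrategy(), **cfg.trainer)
        model = TextClassificationModel.restore_from(cfg.model.nemo_path, trainer=trainer)
        model.setup_test_data(test_data_config=cfg.model.test_ds)
        trainer.test(model)
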
2 changes: 1 addition & 1 deletion examples/asr/conf/asr_adapters/asr_adaptation.yaml
@@ -164,7 +164,7 @@ trainer:
gradient_clip_val: null
precision: 32 # Should be set to 16 for O1 and O2 to enable the AMP.
log_every_n_steps: 10 # Interval of logging.
-progress_bar_refresh_rate: 10
+enable_progress_bar: True
resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it
check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs
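
Since progress_bar_refresh_rate is no longer a Trainer argument in PTL 1.7, a custom refresh interval has to go through the progress-bar callback instead; enable_progress_bar only toggles the bar on or off. A minimal sketch, mirroring the refresh rate of 10 from the old config above:

    import pytorch_lightning as pl
    from pytorch_lightning.callbacks import TQDMProgressBar

    trainer = pl.Trainer(
        enable_progress_bar=True,                      # replaces progress_bar_refresh_rate as a flag
        callbacks=[TQDMProgressBar(refresh_rate=10)],  # refresh rate now lives on the callback
        max_steps=-1,                                  # -1 is the new default, meaning no step limit
        max_epochs=100,
    )
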
2 changes: 1 addition & 1 deletion examples/asr/conf/carnelinet/carnelinet_384.yaml
@@ -238,7 +238,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 100
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
2 changes: 1 addition & 1 deletion examples/asr/conf/citrinet/citrinet_1024.yaml
@@ -448,7 +448,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 100
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
2 changes: 1 addition & 1 deletion examples/asr/conf/citrinet/citrinet_384.yaml
@@ -403,7 +403,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 100
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
2 changes: 1 addition & 1 deletion examples/asr/conf/citrinet/citrinet_512.yaml
@@ -402,7 +402,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 100
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
2 changes: 1 addition & 1 deletion examples/asr/conf/citrinet/config_bpe.yaml
@@ -165,7 +165,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 5
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
2 changes: 1 addition & 1 deletion examples/asr/conf/config.yaml
@@ -168,7 +168,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 5
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
4 changes: 2 additions & 2 deletions examples/asr/conf/conformer/conformer_ctc_bpe.yaml
@@ -165,15 +165,15 @@ trainer:
devices: -1 # number of GPUs, -1 would use all available GPUs
num_nodes: 1
max_epochs: 1000
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
accelerator: auto
strategy: ddp
accumulate_grad_batches: 1
gradient_clip_val: 0.0
precision: 32 # Should be set to 16 for O1 and O2 to enable the AMP.
log_every_n_steps: 10 # Interval of logging.
-progress_bar_refresh_rate: 10
+enable_progress_bar: True
resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it
check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs
4 changes: 2 additions & 2 deletions examples/asr/conf/conformer/conformer_ctc_char.yaml
@@ -140,15 +140,15 @@ trainer:
devices: -1 # number of GPUs, -1 would use all available GPUs
num_nodes: 1
max_epochs: 1000
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
accelerator: auto
strategy: ddp
accumulate_grad_batches: 1
gradient_clip_val: 0.0
precision: 32 # Should be set to 16 for O1 and O2 to enable the AMP.
log_every_n_steps: 10 # Interval of logging.
-progress_bar_refresh_rate: 10
+enable_progress_bar: True
resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it
check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs
4 changes: 2 additions & 2 deletions examples/asr/conf/conformer/conformer_transducer_bpe.yaml
@@ -215,15 +215,15 @@ trainer:
devices: -1 # number of GPUs, -1 would use all available GPUs
num_nodes: 1
max_epochs: 500
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
accelerator: auto
strategy: ddp
accumulate_grad_batches: 1
gradient_clip_val: 0.0
precision: 32 # Should be set to 16 for O1 and O2 to enable the AMP.
log_every_n_steps: 10 # Interval of logging.
-progress_bar_refresh_rate: 10
+enable_progress_bar: True
resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it
check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs
4 changes: 2 additions & 2 deletions examples/asr/conf/conformer/conformer_transducer_char.yaml
@@ -210,15 +210,15 @@ trainer:
devices: -1 # number of GPUs, -1 would use all available GPUs
num_nodes: 1
max_epochs: 500
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
accelerator: auto
strategy: ddp
accumulate_grad_batches: 1
gradient_clip_val: 0.0
precision: 32 # Should be set to 16 for O1 and O2 to enable the AMP.
log_every_n_steps: 10 # Interval of logging.
-progress_bar_refresh_rate: 10
+enable_progress_bar: True
resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it
check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs
@@ -166,15 +166,15 @@ trainer:
devices: -1 # number of GPUs, -1 would use all available GPUs
num_nodes: 1
max_epochs: 1000
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
accelerator: auto
strategy: ddp
accumulate_grad_batches: 1
gradient_clip_val: 0.0
precision: 32 # Should be set to 16 for O1 and O2 to enable the AMP.
log_every_n_steps: 10 # Interval of logging.
-progress_bar_refresh_rate: 10
+enable_progress_bar: True
resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it
check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs
@@ -216,15 +216,15 @@ trainer:
devices: -1 # number of GPUs, -1 would use all available GPUs
num_nodes: 1
max_epochs: 1000
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
accelerator: auto
strategy: ddp
accumulate_grad_batches: 1
gradient_clip_val: 0.0
precision: 32 # Should be set to 16 for O1 and O2 to enable the AMP.
log_every_n_steps: 10 # Interval of logging.
-progress_bar_refresh_rate: 10
+enable_progress_bar: True
resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it
check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs
@@ -153,15 +153,15 @@ trainer:
devices: -1 # number of GPUs, -1 would use all available GPUs
num_nodes: 1
max_epochs: 1000
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
accelerator: auto
strategy: ddp
accumulate_grad_batches: 1
gradient_clip_val: 1.0
precision: 32 # Should be set to 16 for O1 and O2 to enable the AMP.
log_every_n_steps: 10 # Interval of logging.
-progress_bar_refresh_rate: 10
+enable_progress_bar: True
resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it
check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs
@@ -213,15 +213,15 @@ trainer:
devices: -1 # number of GPUs, -1 would use all available GPUs
num_nodes: 1
max_epochs: 1000
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
accelerator: auto
strategy: ddp
accumulate_grad_batches: 1
gradient_clip_val: 0.0
precision: 32 # Should be set to 16 for O1 and O2 to enable the AMP.
log_every_n_steps: 10 # Interval of logging.
-progress_bar_refresh_rate: 10
+enable_progress_bar: True
resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it
check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs
2 changes: 1 addition & 1 deletion examples/asr/conf/contextnet_rnnt/config_rnnt.yaml
@@ -235,7 +235,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 5
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
2 changes: 1 addition & 1 deletion examples/asr/conf/contextnet_rnnt/config_rnnt_bpe.yaml
@@ -235,7 +235,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 5
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
2 changes: 1 addition & 1 deletion examples/asr/conf/contextnet_rnnt/contextnet_rnnt.yaml
@@ -474,7 +474,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 100
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1 # Should be set via SLURM variable `SLURM_JOB_NUM_NODES`
accelerator: gpu
strategy: ddp
@@ -476,7 +476,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 100
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1 # Should be set via SLURM variable `SLURM_JOB_NUM_NODES`
accelerator: gpu
strategy: ddp
@@ -481,7 +481,7 @@ model:
trainer:
devices: -1 # number of GPUs, -1 would use all available GPUs
max_epochs: 100
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1 # Should be set via SLURM variable `SLURM_JOB_NUM_NODES`
accelerator: auto
strategy: ddp
2 changes: 1 addition & 1 deletion examples/asr/conf/jasper/jasper_10x5dr.yaml
@@ -190,7 +190,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 5
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
4 changes: 2 additions & 2 deletions examples/asr/conf/lstm/lstm_ctc_bpe.yaml
@@ -123,15 +123,15 @@ trainer:
devices: -1 # number of GPUs, -1 would use all available GPUs
num_nodes: 1
max_epochs: 500
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
accelerator: gpu
strategy: ddp
accumulate_grad_batches: 1
gradient_clip_val: 0.3
precision: 32 # Should be set to 16 for O1 and O2 to enable the AMP.
log_every_n_steps: 10 # Interval of logging.
-progress_bar_refresh_rate: 10
+enable_progress_bar: True
resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it
check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs
4 changes: 2 additions & 2 deletions examples/asr/conf/lstm/lstm_transducer_bpe.yaml
@@ -186,15 +186,15 @@ trainer:
devices: -1 # number of GPUs, -1 would use all available GPUs
num_nodes: 1
max_epochs: 500
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
accelerator: auto
strategy: ddp
accumulate_grad_batches: 1
gradient_clip_val: 0.3
precision: 32 # Should be set to 16 for O1 and O2 to enable the AMP.
log_every_n_steps: 10 # Interval of logging.
-progress_bar_refresh_rate: 10
+enable_progress_bar: True
resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it
check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs
2 changes: 1 addition & 1 deletion examples/asr/conf/marblenet/marblenet_3x2x64.yaml
@@ -165,7 +165,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 150
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
2 changes: 1 addition & 1 deletion examples/asr/conf/matchboxnet/matchboxnet_3x1x64_v1.yaml
@@ -177,7 +177,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 200
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
2 changes: 1 addition & 1 deletion examples/asr/conf/matchboxnet/matchboxnet_3x1x64_v2.yaml
@@ -177,7 +177,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 200
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
2 changes: 1 addition & 1 deletion examples/asr/conf/quartznet/quartznet_15x5.yaml
@@ -261,7 +261,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 5
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
2 changes: 1 addition & 1 deletion examples/asr/conf/quartznet/quartznet_15x5_aug.yaml
@@ -267,7 +267,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 5
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp