upgrade to PTL 1.7 (NVIDIA#4672)
* upgrade to PTL 1.7

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* min version

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* replace progress_bar_refresh_rate with enable_progress_bar; the progress bar is a callback now

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* progressbar

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* replace args removed in PTL 1.7, fix CPU tests, remove the older p-tuning script

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* revert ssl test fixes

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* override trainer property and fix numba grad check

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* NLPDDPPlugin -> NLPDDPStrategy

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* style fix

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* set max_steps default as -1

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix max_steps in notebooks

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update trainer config

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix speech2label jenkins

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix speech2text jenkins

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* DDPPlugin -> DDPStrategy

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* remove provided strategy keys from trainer config nlp

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* check other examples

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* override LightningModule .cuda call to maintain PyTorch default behavior (see the sketch after this commit message)

Signed-off-by: ericharper <complex451@gmail.com>

* revert gpt eval jenkins test

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* overwrite cuda class to PTL

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* review feedback

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* remove checkpoint callback from main config

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* patch fix for intent slot classification test

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* style fix

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Signed-off-by: ericharper <complex451@gmail.com>
Co-authored-by: ericharper <complex451@gmail.com>
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
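
The .cuda entries above refer to keeping PyTorch's default device selection when .cuda() is called with no argument. Below is a minimal illustrative sketch of such an override, not the exact NeMo implementation; it assumes that PTL's device mixin would otherwise resolve a bare .cuda() call to device index 0 rather than to the current CUDA device, and the model class name is hypothetical.

    import torch
    from pytorch_lightning import LightningModule

    class ExampleModel(LightningModule):  # hypothetical class, for illustration only
        def cuda(self, device=None):
            # Resolve a bare .cuda() call to the *current* CUDA device,
            # matching torch.nn.Module.cuda() rather than always using index 0.
            if device is None:
                device = torch.device("cuda", torch.cuda.current_device())
            elif isinstance(device, int):
                device = torch.device("cuda", index=device)
            return super().cuda(device=device)
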
2 people authored and Hainan Xu committed Nov 29, 2022
1 parent 6519b02 commit 838674a
Showing 126 changed files with 314 additions and 437 deletions.
4 changes: 2 additions & 2 deletions Jenkinsfile
@@ -597,7 +597,7 @@ pipeline {
trainer.devices=[0] \
trainer.accelerator="gpu" \
trainer.max_epochs=1 \
-+trainer.max_steps=1 \
+trainer.max_steps=1 \
+trainer.num_sanity_val_steps=1 \
exp_manager.exp_dir=examples/asr/speech_to_text_results'
sh 'rm -rf examples/asr/speech_to_text_results'
@@ -612,7 +612,7 @@ pipeline {
trainer.devices=[1] \
trainer.accelerator="gpu" \
trainer.max_epochs=1 \
-+trainer.max_steps=1 \
+trainer.max_steps=1 \
+trainer.num_sanity_val_steps=1 \
model.preprocessor._target_=nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor \
~model.preprocessor.window_size \
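
Note on the Hydra overrides above: a "+key=value" override adds a key that does not yet exist in the config, while a plain "key=value" override changes an existing key. Because max_steps now ships in the trainer configs with a default of -1, the test commands drop the "+" prefix for trainer.max_steps, while num_sanity_val_steps still needs it.
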
2 changes: 1 addition & 1 deletion docs/source/core/core.rst
@@ -399,7 +399,7 @@ configuration for a Novograd optimizer with Cosine Annealing learning rate sched
name: CosineAnnealing
# Optional arguments
-max_steps: null # computed at runtime or explicitly set here
+max_steps: -1 # computed at runtime or explicitly set here
monitor: val_loss
reduce_on_plateau: false
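
With max_steps set to -1, the scheduler's step budget is derived at runtime from the epoch budget and the size of the training dataloader. The helper below is an illustrative sketch of that kind of computation; the function name and arguments are hypothetical, not NeMo's actual API:

    import math

    def effective_max_steps(max_epochs, num_samples, batch_size,
                            num_devices=1, accumulate_grad_batches=1):
        # Steps per epoch shrink as batches are sharded across devices and
        # accumulated; multiply by the epoch budget to get total optimizer steps.
        steps_per_epoch = math.ceil(
            num_samples / (batch_size * num_devices * accumulate_grad_batches)
        )
        return steps_per_epoch * max_epochs

    # e.g. 100k samples, batch size 32, 8 GPUs, 100 epochs
    print(effective_max_steps(100, 100_000, 32, num_devices=8))
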
10 changes: 5 additions & 5 deletions docs/source/nlp/megatron.rst
@@ -30,15 +30,15 @@ the same features as other NeMo Models.
Training
^^^^^^^^

-All of the necessary logic to train model parallel models in NeMo with PyTorch Lightning is contained in the ``NLPDDPPlugin``.
-The ``NLPDDPPlugin`` subclasses the PyTorch Lightning training type plugin ``DDPPlugin``.
-See `plugins <https://pytorch-lightning.readthedocs.io/en/latest/extensions/plugins.html>`_ for more information on PyTorch Lightning Plugins.
+All of the necessary logic to train model parallel models in NeMo with PyTorch Lightning is contained in the ``NLPDDPStrategy``.
+The ``NLPDDPStrategy`` subclasses the PyTorch Lightning strategy type ``DDPStrategy``.
+See `strategies <https://pytorch-lightning.readthedocs.io/en/latest/extensions/strategy.html>`_ for more information on PyTorch Lightning Strategies

To enable model parallel training in NeMo:

.. code-block:: python
-trainer = Trainer(plugins=[NLPDDPPlugin()], **cfg.trainer)
+trainer = Trainer(strategy=NLPDDPStrategy(), **cfg.trainer)
Megatron-LM checkpoints have a specific format. One checkpoint is saved for each model parallel rank:

@@ -157,7 +157,7 @@ Since model parallel models always require more than one GPU, the ``Trainer`` is

.. code-block:: python
-trainer = pl.Trainer(plugins=[NLPDDPPlugin()], **cfg.trainer)
+trainer = pl.Trainer(strategy=NLPDDPStrategy(), **cfg.trainer)
model = TextClassificationModel.restore_from(cfg.model.nemo_path, trainer=trainer)
model.setup_test_data(test_data_config=cfg.model.test_ds)
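
Putting the documentation changes above together, a full restore-and-test flow under PTL 1.7 might look like the following sketch. The NLPDDPStrategy import path and the Hydra config layout (cfg.trainer, cfg.model.nemo_path, cfg.model.test_ds) are assumptions based on the snippets in this diff:

    import pytorch_lightning as pl
    from omegaconf import DictConfig

    from nemo.collections.nlp.models import TextClassificationModel
    from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy

    def restore_and_test(cfg: DictConfig):
        # The strategy argument replaces the old plugins=[NLPDDPPlugin()] usage.
        trainer = pl.Trainer(strategy=NLPDDPStrategy(), **cfg.trainer)
        model = TextClassificationModel.restore_from(cfg.model.nemo_path, trainer=trainer)
        model.setup_test_data(test_data_config=cfg.model.test_ds)
        trainer.test(model)
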
2 changes: 1 addition & 1 deletion examples/asr/conf/asr_adapters/asr_adaptation.yaml
@@ -164,7 +164,7 @@ trainer:
gradient_clip_val: null
precision: 32 # Should be set to 16 for O1 and O2 to enable the AMP.
log_every_n_steps: 10 # Interval of logging.
-progress_bar_refresh_rate: 10
+enable_progress_bar: True
resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it
check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs
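
Since progress_bar_refresh_rate is no longer a Trainer argument in PTL 1.7, a custom refresh interval has to go through the progress-bar callback instead; enable_progress_bar only toggles the bar on or off. A minimal sketch, mirroring the refresh rate of 10 from the old config above:

    import pytorch_lightning as pl
    from pytorch_lightning.callbacks import TQDMProgressBar

    trainer = pl.Trainer(
        enable_progress_bar=True,                      # replaces progress_bar_refresh_rate as a flag
        callbacks=[TQDMProgressBar(refresh_rate=10)],  # refresh rate now lives on the callback
        max_steps=-1,                                  # -1 is the new default, meaning no step limit
        max_epochs=100,
    )
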
2 changes: 1 addition & 1 deletion examples/asr/conf/carnelinet/carnelinet_384.yaml
@@ -238,7 +238,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 100
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
2 changes: 1 addition & 1 deletion examples/asr/conf/citrinet/citrinet_1024.yaml
@@ -448,7 +448,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 100
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
2 changes: 1 addition & 1 deletion examples/asr/conf/citrinet/citrinet_384.yaml
@@ -403,7 +403,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 100
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
2 changes: 1 addition & 1 deletion examples/asr/conf/citrinet/citrinet_512.yaml
@@ -402,7 +402,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 100
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
2 changes: 1 addition & 1 deletion examples/asr/conf/citrinet/config_bpe.yaml
@@ -165,7 +165,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 5
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
2 changes: 1 addition & 1 deletion examples/asr/conf/config.yaml
@@ -168,7 +168,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 5
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
4 changes: 2 additions & 2 deletions examples/asr/conf/conformer/conformer_ctc_bpe.yaml
@@ -165,15 +165,15 @@ trainer:
devices: -1 # number of GPUs, -1 would use all available GPUs
num_nodes: 1
max_epochs: 1000
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
accelerator: auto
strategy: ddp
accumulate_grad_batches: 1
gradient_clip_val: 0.0
precision: 32 # Should be set to 16 for O1 and O2 to enable the AMP.
log_every_n_steps: 10 # Interval of logging.
-progress_bar_refresh_rate: 10
+enable_progress_bar: True
resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it
check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs
4 changes: 2 additions & 2 deletions examples/asr/conf/conformer/conformer_ctc_char.yaml
@@ -140,15 +140,15 @@ trainer:
devices: -1 # number of GPUs, -1 would use all available GPUs
num_nodes: 1
max_epochs: 1000
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
accelerator: auto
strategy: ddp
accumulate_grad_batches: 1
gradient_clip_val: 0.0
precision: 32 # Should be set to 16 for O1 and O2 to enable the AMP.
log_every_n_steps: 10 # Interval of logging.
-progress_bar_refresh_rate: 10
+enable_progress_bar: True
resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it
check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs
4 changes: 2 additions & 2 deletions examples/asr/conf/conformer/conformer_transducer_bpe.yaml
@@ -215,15 +215,15 @@ trainer:
devices: -1 # number of GPUs, -1 would use all available GPUs
num_nodes: 1
max_epochs: 500
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
accelerator: auto
strategy: ddp
accumulate_grad_batches: 1
gradient_clip_val: 0.0
precision: 32 # Should be set to 16 for O1 and O2 to enable the AMP.
log_every_n_steps: 10 # Interval of logging.
-progress_bar_refresh_rate: 10
+enable_progress_bar: True
resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it
check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs
4 changes: 2 additions & 2 deletions examples/asr/conf/conformer/conformer_transducer_char.yaml
@@ -210,15 +210,15 @@ trainer:
devices: -1 # number of GPUs, -1 would use all available GPUs
num_nodes: 1
max_epochs: 500
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
accelerator: auto
strategy: ddp
accumulate_grad_batches: 1
gradient_clip_val: 0.0
precision: 32 # Should be set to 16 for O1 and O2 to enable the AMP.
log_every_n_steps: 10 # Interval of logging.
-progress_bar_refresh_rate: 10
+enable_progress_bar: True
resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it
check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs
@@ -166,15 +166,15 @@ trainer:
devices: -1 # number of GPUs, -1 would use all available GPUs
num_nodes: 1
max_epochs: 1000
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
accelerator: auto
strategy: ddp
accumulate_grad_batches: 1
gradient_clip_val: 0.0
precision: 32 # Should be set to 16 for O1 and O2 to enable the AMP.
log_every_n_steps: 10 # Interval of logging.
-progress_bar_refresh_rate: 10
+enable_progress_bar: True
resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it
check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs
@@ -216,15 +216,15 @@ trainer:
devices: -1 # number of GPUs, -1 would use all available GPUs
num_nodes: 1
max_epochs: 1000
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
accelerator: auto
strategy: ddp
accumulate_grad_batches: 1
gradient_clip_val: 0.0
precision: 32 # Should be set to 16 for O1 and O2 to enable the AMP.
log_every_n_steps: 10 # Interval of logging.
-progress_bar_refresh_rate: 10
+enable_progress_bar: True
resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it
check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs
@@ -153,15 +153,15 @@ trainer:
devices: -1 # number of GPUs, -1 would use all available GPUs
num_nodes: 1
max_epochs: 1000
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
accelerator: auto
strategy: ddp
accumulate_grad_batches: 1
gradient_clip_val: 1.0
precision: 32 # Should be set to 16 for O1 and O2 to enable the AMP.
log_every_n_steps: 10 # Interval of logging.
-progress_bar_refresh_rate: 10
+enable_progress_bar: True
resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it
check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs
@@ -213,15 +213,15 @@ trainer:
devices: -1 # number of GPUs, -1 would use all available GPUs
num_nodes: 1
max_epochs: 1000
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
accelerator: auto
strategy: ddp
accumulate_grad_batches: 1
gradient_clip_val: 0.0
precision: 32 # Should be set to 16 for O1 and O2 to enable the AMP.
log_every_n_steps: 10 # Interval of logging.
-progress_bar_refresh_rate: 10
+enable_progress_bar: True
resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it
check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs
2 changes: 1 addition & 1 deletion examples/asr/conf/contextnet_rnnt/config_rnnt.yaml
@@ -235,7 +235,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 5
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
2 changes: 1 addition & 1 deletion examples/asr/conf/contextnet_rnnt/config_rnnt_bpe.yaml
@@ -235,7 +235,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 5
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
2 changes: 1 addition & 1 deletion examples/asr/conf/contextnet_rnnt/contextnet_rnnt.yaml
@@ -474,7 +474,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 100
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1 # Should be set via SLURM variable `SLURM_JOB_NUM_NODES`
accelerator: gpu
strategy: ddp
@@ -476,7 +476,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 100
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1 # Should be set via SLURM variable `SLURM_JOB_NUM_NODES`
accelerator: gpu
strategy: ddp
@@ -481,7 +481,7 @@ model:
trainer:
devices: -1 # number of GPUs, -1 would use all available GPUs
max_epochs: 100
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1 # Should be set via SLURM variable `SLURM_JOB_NUM_NODES`
accelerator: auto
strategy: ddp
2 changes: 1 addition & 1 deletion examples/asr/conf/jasper/jasper_10x5dr.yaml
@@ -190,7 +190,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 5
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
4 changes: 2 additions & 2 deletions examples/asr/conf/lstm/lstm_ctc_bpe.yaml
@@ -123,15 +123,15 @@ trainer:
devices: -1 # number of GPUs, -1 would use all available GPUs
num_nodes: 1
max_epochs: 500
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
accelerator: gpu
strategy: ddp
accumulate_grad_batches: 1
gradient_clip_val: 0.3
precision: 32 # Should be set to 16 for O1 and O2 to enable the AMP.
log_every_n_steps: 10 # Interval of logging.
-progress_bar_refresh_rate: 10
+enable_progress_bar: True
resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it
check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs
4 changes: 2 additions & 2 deletions examples/asr/conf/lstm/lstm_transducer_bpe.yaml
@@ -186,15 +186,15 @@ trainer:
devices: -1 # number of GPUs, -1 would use all available GPUs
num_nodes: 1
max_epochs: 500
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
accelerator: auto
strategy: ddp
accumulate_grad_batches: 1
gradient_clip_val: 0.3
precision: 32 # Should be set to 16 for O1 and O2 to enable the AMP.
log_every_n_steps: 10 # Interval of logging.
-progress_bar_refresh_rate: 10
+enable_progress_bar: True
resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it
check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs
2 changes: 1 addition & 1 deletion examples/asr/conf/marblenet/marblenet_3x2x64.yaml
@@ -165,7 +165,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 150
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
2 changes: 1 addition & 1 deletion examples/asr/conf/matchboxnet/matchboxnet_3x1x64_v1.yaml
@@ -177,7 +177,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 200
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
2 changes: 1 addition & 1 deletion examples/asr/conf/matchboxnet/matchboxnet_3x1x64_v2.yaml
@@ -177,7 +177,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 200
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
2 changes: 1 addition & 1 deletion examples/asr/conf/quartznet/quartznet_15x5.yaml
@@ -261,7 +261,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 5
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp
2 changes: 1 addition & 1 deletion examples/asr/conf/quartznet/quartznet_15x5_aug.yaml
@@ -267,7 +267,7 @@ model:
trainer:
devices: 1 # number of gpus
max_epochs: 5
-max_steps: null # computed at runtime if not set
+max_steps: -1 # computed at runtime if not set
num_nodes: 1
accelerator: gpu
strategy: ddp