Added docs on LR & DDP (#1414)
Co-authored-by: Louis-Dupont <35190946+Louis-Dupont@users.noreply.github.com>
BloodAxe and Louis-Dupont committed Aug 28, 2023
1 parent fd87ce0 commit b85e5c4
Showing 1 changed file with 20 additions and 0 deletions: documentation/source/device.md
@@ -212,6 +212,26 @@ class DDPTop1Accuracy(torchmetrics.Metric):
5. The `compute()` method then calculates the metric value according to your implementation. In this example, every process will return the same result: `0.6` (180 correct predictions out of 300 total predictions).
6. Finally, calling `reset()` clears the internal state of the metric, making it ready to accumulate new data at the start of the next epoch (see the usage sketch below).
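
To make this lifecycle concrete, here is a minimal usage sketch; `model`, `val_loader`, and `num_epochs` are placeholder names, and `DDPTop1Accuracy` is the metric class defined above:

```python
metric = DDPTop1Accuracy()

for epoch in range(num_epochs):
    for images, target in val_loader:
        preds = model(images)          # (batch, num_classes) logits
        metric.update(preds, target)   # accumulate local counts on this process
    accuracy = metric.compute()       # states are reduced across DDP processes,
                                      # so every process sees the same value
    metric.reset()                    # clear internal state for the next epoch
```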

### C. When using DDP, you may want to scale the learning rate

Using N GPUs in DDP mode effectively increases the total batch size by a factor of N,
and it has been [shown](https://arxiv.org/abs/1706.02677) that the learning rate may need to be scaled accordingly.
The rule of thumb (the linear scaling rule) is that if the batch size is increased by a factor of N (or N nodes are used in DDP),
the learning rate should also be increased by a factor of N.
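
As an illustration, here is a minimal sketch of the rule; the helper and its argument names are hypothetical, not part of the super-gradients API:

```python
# Hypothetical helper implementing the linear scaling rule from the paper above.
def scale_lr(base_lr: float, base_batch_size: int,
             per_gpu_batch_size: int, num_gpus: int) -> float:
    effective_batch_size = per_gpu_batch_size * num_gpus
    return base_lr * effective_batch_size / base_batch_size

# A recipe tuned with lr=0.01 at batch size 64 on a single GPU,
# now run with the same per-GPU batch size on 8 GPUs:
print(scale_lr(0.01, base_batch_size=64, per_gpu_batch_size=64, num_gpus=8))  # -> 0.08
```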

However, the situation is somewhat different for adaptive optimizers like Adam.
Adaptive optimizers automatically adjust the effective step size of each parameter based on historical
gradient information. They inherently adapt to the scale of the gradients and do not require manual
learning-rate adjustments in the same way as fixed-learning-rate methods like SGD.
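
For reference, Adam's update rule ([Kingma & Ba, 2014](https://arxiv.org/abs/1412.6980)) divides the learning rate by a running estimate of the gradient magnitude, which is what makes the effective step size partly self-normalizing:

$$\theta_{t+1} = \theta_t - \eta \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$

where $\hat{m}_t$ and $\hat{v}_t$ are bias-corrected exponential moving averages of the gradient and its elementwise square.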

That being said, we still recommend trying out different learning rates to see their impact on the final metrics.
You can run such experiments manually, or use Hydra sweep syntax to launch runs with several learning rates at once:

```bash
python -m super_gradients.train_from_recipe -m --config-name=coco2017_yolo_nas_s training_hyperparams.initial_lr=1e-3,5e-3,1e-4
```
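
Here, the `-m` flag enables Hydra's multirun mode, which launches one training run per comma-separated `initial_lr` value.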

---

## How to set training mode with recipes?
