simplify examples structure (#1247)
* simplify examples structure

* update changelog

* fix imports

* rename example

* rename scripts

* changelog
Borda committed Apr 3, 2020
1 parent 16f4cc9 commit 22bedf9
Showing 20 changed files with 138 additions and 84 deletions.
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -8,7 +8,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

 ### Added

 - Added parity test between a vanilla MNIST model and lightning model ([#1284](https://github.com/PyTorchLightning/pytorch-lightning/pull/1284))
+- Added parity test between a vanilla RNN model and lightning model ([#1351](https://github.com/PyTorchLightning/pytorch-lightning/pull/1351))
 - Added Reinforcement Learning - Deep Q-network (DQN) lightning example ([#1232](https://github.com/PyTorchLightning/pytorch-lightning/pull/1232))
 - Added support for hierarchical `dict` ([#1152](https://github.com/PyTorchLightning/pytorch-lightning/pull/1152))
 - Added `TrainsLogger` class ([#1122](https://github.com/PyTorchLightning/pytorch-lightning/pull/1122))

@@ -40,6 +41,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

 - Give warnings for unimplemented required lightning methods ([#1317](https://github.com/PyTorchLightning/pytorch-lightning/pull/1317))
 - Enhanced load_from_checkpoint to also forward params to the model ([#1307](https://github.com/PyTorchLightning/pytorch-lightning/pull/1307))
 - Made `evaluate` method private >> `Trainer._evaluate(...)`. ([#1260](https://github.com/PyTorchLightning/pytorch-lightning/pull/1260))
+- Simplify the PL examples structure (shallower and more readable) ([#1247](https://github.com/PyTorchLightning/pytorch-lightning/pull/1247))

 ### Deprecated
71 changes: 62 additions & 9 deletions pl_examples/README.md
@@ -1,14 +1,67 @@
 # Examples
-This folder has 4 sections:
+This folder has 3 sections:

-### Basic examples
-These show the most common use of Lightning for either CPU or GPU training.
+## Basic Examples
+Use these examples to test how lightning works.

-### Domain templates
-These are templates to show common approaches such as GANs and RL.
+#### Test on CPU
+```bash
+python cpu_template.py
+```

-### Full examples
-Contains examples demonstrating ImageNet training, Semantic Segmentation, etc.
+---
+#### Train on a single GPU
+```bash
+python gpu_template.py --gpus 1
+```

-### Multi-node examples
-These show how to run jobs on a GPU cluster using lightning.
+---
+#### DataParallel (dp)
+Train on multiple GPUs using DataParallel.
+
+```bash
+python gpu_template.py --gpus 2 --distributed_backend dp
+```
+
+---
+#### DistributedDataParallel (ddp)
+
+Train on multiple GPUs using DistributedDataParallel.
+```bash
+python gpu_template.py --gpus 2 --distributed_backend ddp
+```
+
+---
+#### DistributedDataParallel + DP (ddp2)
+
+Train on multiple GPUs using DistributedDataParallel + DataParallel.
+On a single node, uses all GPUs for one model, then shares gradient information
+across nodes.
+```bash
+python gpu_template.py --gpus 2 --distributed_backend ddp2
+```
+
+## Multi-node example
+
+This demo launches a job using 2 GPUs on 2 different nodes (4 GPUs total).
+To run this demo do the following:
+
+1. Log into the jumphost node of your SLURM-managed cluster.
+2. Create a conda environment with Lightning and a GPU PyTorch version.
+3. Choose a script to submit.
+
+### DDP
+Submit this job to run with DistributedDataParallel (2 nodes, 2 GPUs each):
+```bash
+sbatch ddp_job_submit.sh YourEnv
+```
+
+### DDP2
+Submit this job to run with a different implementation of DistributedDataParallel.
+In this version, each node acts like DataParallel but syncs across nodes like DDP.
+```bash
+sbatch ddp2_job_submit.sh YourEnv
+```
+
+## Domain templates
+These are templates to show common approaches such as GANs and RL.
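For orientation, the `--gpus` and `--distributed_backend` flags above map directly onto `pl.Trainer` arguments. Below is a minimal sketch of how a script like `gpu_template.py` is typically wired together; the argument parsing and model construction shown here are illustrative assumptions, not the template's exact code.

```python
# Minimal sketch (assumed wiring, not the template's exact code) of how
# gpu_template.py's CLI flags feed into the Lightning Trainer.
from argparse import ArgumentParser

import pytorch_lightning as pl
from pl_examples.models.lightning_template import LightningTemplateModel


def main():
    parser = ArgumentParser()
    parser.add_argument('--gpus', type=int, default=1)
    parser.add_argument('--distributed_backend', type=str, default=None,
                        help="one of: dp, ddp, ddp2")
    args = parser.parse_args()

    # the template model takes its hyperparameters as a Namespace (assumed signature)
    model = LightningTemplateModel(args)

    trainer = pl.Trainer(
        gpus=args.gpus,
        distributed_backend=args.distributed_backend,
    )
    trainer.fit(model)


if __name__ == '__main__':
    main()
```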
2 changes: 1 addition & 1 deletion pl_examples/__init__.py
@@ -140,7 +140,7 @@ def optimize_on_cluster(hyperparams):
 """
 
-from .basic_examples.lightning_module_template import LightningTemplateModel
+from pl_examples.models.lightning_template import LightningTemplateModel
 
 __all__ = [
     'LightningTemplateModel'
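Because `__all__` still exports `LightningTemplateModel`, code that imports it from the package root keeps working after the move; only the internal module path changed:

```python
# Unchanged public import; now resolved via pl_examples.models.lightning_template.
from pl_examples import LightningTemplateModel
```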
27 changes: 25 additions & 2 deletions pl_examples/basic_examples/README.md
@@ -1,4 +1,4 @@
-# Basic Examples
+## Basic Examples
 Use these examples to test how lightning works.

 #### Test on CPU
@@ -36,4 +36,27 @@ On a single node, uses all GPUs for 1 model. Then shares gradient information
 across nodes.
 ```bash
 python gpu_template.py --gpus 2 --distributed_backend ddp2
-```
+```
+
+
+# Multi-node example
+
+This demo launches a job using 2 GPUs on 2 different nodes (4 GPUs total).
+To run this demo do the following:
+
+1. Log into the jumphost node of your SLURM-managed cluster.
+2. Create a conda environment with Lightning and a GPU PyTorch version.
+3. Choose a script to submit.
+
+#### DDP
+Submit this job to run with DistributedDataParallel (2 nodes, 2 GPUs each):
+```bash
+sbatch ddp_job_submit.sh YourEnv
+```
+
+#### DDP2
+Submit this job to run with a different implementation of DistributedDataParallel.
+In this version, each node acts like DataParallel but syncs across nodes like DDP.
+```bash
+sbatch ddp2_job_submit.sh YourEnv
+```
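The two submit scripts differ only in which backend they ask Lightning for. Here is a hedged sketch of the multi-node Trainer configuration they presumably drive, using the 2-nodes × 2-GPUs setup from this README (argument names per the Trainer API of this release; treat them as assumptions):

```python
# Sketch of the multi-node settings behind ddp_job_submit.sh / ddp2_job_submit.sh
# (assumed configuration: 2 SLURM nodes with 2 GPUs each, as in the README).
import pytorch_lightning as pl

trainer = pl.Trainer(
    gpus=2,                      # GPUs per node
    num_nodes=2,                 # nodes requested from SLURM
    distributed_backend='ddp',   # swap in 'ddp2' for the DataParallel-per-node variant
)
```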
2 changes: 1 addition & 1 deletion pl_examples/basic_examples/cpu_template.py
@@ -8,7 +8,7 @@
 import torch
 
 import pytorch_lightning as pl
-from pl_examples.basic_examples.lightning_module_template import LightningTemplateModel
+from pl_examples.models.lightning_template import LightningTemplateModel
 
 SEED = 2334
 torch.manual_seed(SEED)
2 changes: 1 addition & 1 deletion pl_examples/basic_examples/gpu_template.py
@@ -8,7 +8,7 @@
 import torch
 
 import pytorch_lightning as pl
-from pl_examples.basic_examples.lightning_module_template import LightningTemplateModel
+from pl_examples.models.lightning_template import LightningTemplateModel
 
 SEED = 2334
 torch.manual_seed(SEED)
@@ -8,7 +8,7 @@
 import torch
 
 import pytorch_lightning as pl
-from pl_examples.basic_examples.lightning_module_template import LightningTemplateModel
+from pl_examples.models.lightning_template import LightningTemplateModel
 
 SEED = 2334
 torch.manual_seed(SEED)
@@ -8,7 +8,7 @@
 import torch
 
 import pytorch_lightning as pl
-from pl_examples.basic_examples.lightning_module_template import LightningTemplateModel
+from pl_examples.models.lightning_template import LightningTemplateModel
 
 SEED = 2334
 torch.manual_seed(SEED)
@@ -1,6 +1,6 @@
 """
 To run this template just do:
-python gan.py
+python generative_adversarial_net.py
 After a few epochs, launch TensorBoard to see the images being generated at every batch:
File renamed without changes.
@@ -6,10 +6,10 @@
 import torch.nn.functional as F
 import torchvision.transforms as transforms
 from PIL import Image
-from models.unet.model import UNet
 from torch.utils.data import DataLoader, Dataset
 
 import pytorch_lightning as pl
+from pl_examples.models.unet import UNet
 
 
 class KITTI(Dataset):
Empty file.

This file was deleted.

File renamed without changes.
File renamed without changes.
@@ -3,6 +3,47 @@
 import torch.nn.functional as F
 
 
+class UNet(nn.Module):
+    """
+    Architecture based on U-Net: Convolutional Networks for Biomedical Image Segmentation
+    Link - https://arxiv.org/abs/1505.04597
+
+    Parameters:
+        num_classes (int): Number of output classes required (default 19 for KITTI dataset)
+        bilinear (bool): Whether to use bilinear interpolation or transposed
+            convolutions for upsampling.
+    """
+
+    def __init__(self, num_classes=19, bilinear=False):
+        super().__init__()
+        self.layer1 = DoubleConv(3, 64)
+        self.layer2 = Down(64, 128)
+        self.layer3 = Down(128, 256)
+        self.layer4 = Down(256, 512)
+        self.layer5 = Down(512, 1024)
+
+        self.layer6 = Up(1024, 512, bilinear=bilinear)
+        self.layer7 = Up(512, 256, bilinear=bilinear)
+        self.layer8 = Up(256, 128, bilinear=bilinear)
+        self.layer9 = Up(128, 64, bilinear=bilinear)
+
+        self.layer10 = nn.Conv2d(64, num_classes, kernel_size=1)
+
+    def forward(self, x):
+        x1 = self.layer1(x)
+        x2 = self.layer2(x1)
+        x3 = self.layer3(x2)
+        x4 = self.layer4(x3)
+        x5 = self.layer5(x4)
+
+        x6 = self.layer6(x5, x4)
+        x6 = self.layer7(x6, x3)
+        x6 = self.layer8(x6, x2)
+        x6 = self.layer9(x6, x1)
+
+        return self.layer10(x6)
+
+
 class DoubleConv(nn.Module):
     """
     Double Convolution and BN and ReLU
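A quick smoke test of the relocated `UNet` under its new import path; the input shape is an illustrative choice (height and width just need to be divisible by 16, since the encoder downsamples four times):

```python
# Forward-pass smoke test for pl_examples.models.unet.UNet.
import torch

from pl_examples.models.unet import UNet

net = UNet(num_classes=19, bilinear=False)
x = torch.randn(1, 3, 96, 320)   # (batch, RGB channels, height, width) -- illustrative size
out = net(x)
print(out.shape)                 # expected: torch.Size([1, 19, 96, 320])
```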
21 changes: 0 additions & 21 deletions pl_examples/multi_node_examples/README.md

This file was deleted.

Empty file.
