simplify examples structure #1247

Merged · 6 commits · Apr 3, 2020
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -8,7 +8,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

### Added

- Added parity test between a vanilla MNIST model and lightning model ([#1284](https://github.com/PyTorchLightning/pytorch-lightning/pull/1284))
- Added parity test between a vanilla RNN model and lightning model ([#1351](https://github.com/PyTorchLightning/pytorch-lightning/pull/1351))
- Added Reinforcement Learning - Deep Q-network (DQN) lightning example ([#1232](https://github.com/PyTorchLightning/pytorch-lightning/pull/1232))
- Added support for hierarchical `dict` ([#1152](https://github.com/PyTorchLightning/pytorch-lightning/pull/1152))
- Added `TrainsLogger` class ([#1122](https://github.com/PyTorchLightning/pytorch-lightning/pull/1122))
@@ -41,6 +42,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Give warnings for unimplemented required lightning methods ([#1317](https://github.com/PyTorchLightning/pytorch-lightning/pull/1317))
- Enhanced load_from_checkpoint to also forward params to the model ([#1307](https://github.com/PyTorchLightning/pytorch-lightning/pull/1307))
- Made `evaluate` method private >> `Trainer._evaluate(...)`. ([#1260](https://github.com/PyTorchLightning/pytorch-lightning/pull/1260))
- Simplify the PL examples structure (shallower and more readable) ([#1247](https://github.com/PyTorchLightning/pytorch-lightning/pull/1247))

### Deprecated

71 changes: 62 additions & 9 deletions pl_examples/README.md
@@ -1,14 +1,67 @@
# Examples
-This folder has 4 sections:
+This folder has 3 sections:

-### Basic examples
-These show the most common use of Lightning for either CPU or GPU training.
+## Basic Examples
+Use these examples to test how lightning works.

-### Domain templates
-These are templates to show common approaches such as GANs and RL.
+#### Test on CPU
```bash
python cpu_template.py
```

-### Full examples
-Contains examples demonstrating ImageNet training, Semantic Segmentation, etc.
+---
+#### Train on a single GPU
```bash
python gpu_template.py --gpus 1
```
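
Both templates follow the same pattern: define a `LightningModule`, hand it to a `Trainer`, and call `fit`. Below is a minimal, self-contained sketch of that pattern, assuming the 0.7-era API; the class name `LitSketch` and the random dataset are illustrative stand-ins, not code from this repo.

```python
import torch
from torch.nn import functional as F
from torch.utils.data import DataLoader, TensorDataset

import pytorch_lightning as pl


class LitSketch(pl.LightningModule):
    """Tiny illustrative stand-in for LightningTemplateModel."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        return {'loss': loss}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

    def train_dataloader(self):
        # Random data keeps the sketch self-contained; the real templates use MNIST.
        x = torch.randn(64, 32)
        y = torch.randint(0, 2, (64,))
        return DataLoader(TensorDataset(x, y), batch_size=16)


if __name__ == '__main__':
    trainer = pl.Trainer(max_epochs=1)  # runs on CPU by default; pass gpus=1 for GPU
    trainer.fit(LitSketch())
```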

-### Multi-node examples
-These show how to run jobs on a GPU cluster using lightning.
+---
#### DataParallel (dp)
Train on multiple GPUs using DataParallel.

```bash
python gpu_template.py --gpus 2 --distributed_backend dp
```

---
#### DistributedDataParallel (ddp)

Train on multiple GPUs using DistributedDataParallel.
```bash
python gpu_template.py --gpus 2 --distributed_backend ddp
```

---
#### DistributedDataParallel+DP (ddp2)

Train on multiple GPUs using DistributedDataParallel + DataParallel.
Within each node, all GPUs work on a single model copy (like DataParallel); gradient
information is then shared across nodes (like DistributedDataParallel).
```bash
python gpu_template.py --gpus 2 --distributed_backend ddp2
```
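
The three backends above differ only in how the `Trainer` is configured. A hedged summary in code, using the 0.7-era flag names (`distributed_backend` was renamed in later releases):

```python
import pytorch_lightning as pl

# dp: one process drives all GPUs; each batch is scattered across them and gathered back.
trainer_dp = pl.Trainer(gpus=2, distributed_backend='dp')

# ddp: one process per GPU; gradients are all-reduced after every backward pass.
trainer_ddp = pl.Trainer(gpus=2, distributed_backend='ddp')

# ddp2: DataParallel within each node, DDP-style gradient sync across nodes.
trainer_ddp2 = pl.Trainer(gpus=2, num_nodes=2, distributed_backend='ddp2')
```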

## Multi-node example

This demo launches a job using 2 GPUs on 2 different nodes (4 GPUs total).
To run this demo, do the following:

1. Log in to the jump host of your SLURM-managed cluster.
2. Create a conda environment with Lightning and a GPU build of PyTorch.
3. Choose a script to submit; a sketch of the `Trainer` configuration these scripts run follows the DDP2 example below.

### DDP
Submit this job to run with DistributedDataParallel (2 nodes, 2 GPUs each):
```bash
sbatch ddp_job_submit.sh YourEnv
```

### DDP2
Submit this job to run with a different implementation of DistributedDataParallel.
In this version, each node acts like DataParallel but syncs across nodes like DDP.
```bash
sbatch ddp2_job_submit.sh YourEnv
```
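
Both submission scripts ultimately launch an ordinary Lightning script; SLURM exposes the node and rank information through environment variables, which Lightning reads automatically. As a sketch (0.7-era flag names, matching the 2-node, 2-GPU request above), the submitted script would configure its `Trainer` roughly like this:

```python
import pytorch_lightning as pl


def make_trainer(backend='ddp'):
    # gpus and num_nodes must match the resources requested in the sbatch script.
    return pl.Trainer(
        gpus=2,                       # GPUs per node
        num_nodes=2,                  # nodes in the SLURM job
        distributed_backend=backend,  # 'ddp', or 'ddp2' for the second script
    )
```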

## Domain templates
These are templates to show common approaches such as GANs and RL.
2 changes: 1 addition & 1 deletion pl_examples/__init__.py
@@ -140,7 +140,7 @@ def optimize_on_cluster(hyperparams):

"""

-from .basic_examples.lightning_module_template import LightningTemplateModel
+from pl_examples.models.lightning_template import LightningTemplateModel

__all__ = [
'LightningTemplateModel'
27 changes: 25 additions & 2 deletions pl_examples/basic_examples/README.md
@@ -1,4 +1,4 @@
-# Basic Examples
+## Basic Examples
Use these examples to test how lightning works.

#### Test on CPU
@@ -36,4 +36,27 @@ On a single node, uses all GPUs for 1 model. Then shares gradient information
across nodes.
```bash
python gpu_template.py --gpus 2 --distributed_backend ddp2
```


# Multi-node example

This demo launches a job using 2 GPUs on 2 different nodes (4 GPUs total).
To run this demo, do the following:

1. Log in to the jump host of your SLURM-managed cluster.
2. Create a conda environment with Lightning and a GPU build of PyTorch.
3. Choose a script to submit.

#### DDP
Submit this job to run with DistributedDataParallel (2 nodes, 2 GPUs each):
```bash
sbatch ddp_job_submit.sh YourEnv
```

#### DDP2
Submit this job to run with a different implementation of DistributedDataParallel.
In this version, each node acts like DataParallel but syncs across nodes like DDP.
```bash
sbatch ddp2_job_submit.sh YourEnv
```
2 changes: 1 addition & 1 deletion pl_examples/basic_examples/cpu_template.py
@@ -8,7 +8,7 @@
import torch

import pytorch_lightning as pl
-from pl_examples.basic_examples.lightning_module_template import LightningTemplateModel
+from pl_examples.models.lightning_template import LightningTemplateModel

SEED = 2334
torch.manual_seed(SEED)
2 changes: 1 addition & 1 deletion pl_examples/basic_examples/gpu_template.py
@@ -8,7 +8,7 @@
import torch

import pytorch_lightning as pl
-from pl_examples.basic_examples.lightning_module_template import LightningTemplateModel
+from pl_examples.models.lightning_template import LightningTemplateModel

SEED = 2334
torch.manual_seed(SEED)
@@ -8,7 +8,7 @@
import torch

import pytorch_lightning as pl
-from pl_examples.basic_examples.lightning_module_template import LightningTemplateModel
+from pl_examples.models.lightning_template import LightningTemplateModel

SEED = 2334
torch.manual_seed(SEED)
@@ -8,7 +8,7 @@
import torch

import pytorch_lightning as pl
-from pl_examples.basic_examples.lightning_module_template import LightningTemplateModel
+from pl_examples.models.lightning_template import LightningTemplateModel

SEED = 2334
torch.manual_seed(SEED)
@@ -1,6 +1,6 @@
"""
To run this template just do:
-python gan.py
+python generative_adversarial_net.py

After a few epochs, launch TensorBoard to see the images being generated at every batch:

@@ -6,10 +6,10 @@
import torch.nn.functional as F
import torchvision.transforms as transforms
from PIL import Image
-from models.unet.model import UNet
from torch.utils.data import DataLoader, Dataset

import pytorch_lightning as pl
+from pl_examples.models.unet import UNet


class KITTI(Dataset):
Empty file.

This file was deleted.

@@ -3,6 +3,47 @@
import torch.nn.functional as F


class UNet(nn.Module):
"""
Architecture based on U-Net: Convolutional Networks for Biomedical Image Segmentation
Link - https://arxiv.org/abs/1505.04597

Parameters:
num_classes (int): Number of output classes required (default 19 for KITTI dataset)
bilinear (bool): Whether to use bilinear interpolation or transposed
convolutions for upsampling.
"""

def __init__(self, num_classes=19, bilinear=False):
super().__init__()
self.layer1 = DoubleConv(3, 64)
self.layer2 = Down(64, 128)
self.layer3 = Down(128, 256)
self.layer4 = Down(256, 512)
self.layer5 = Down(512, 1024)

self.layer6 = Up(1024, 512, bilinear=bilinear)
self.layer7 = Up(512, 256, bilinear=bilinear)
self.layer8 = Up(256, 128, bilinear=bilinear)
self.layer9 = Up(128, 64, bilinear=bilinear)

self.layer10 = nn.Conv2d(64, num_classes, kernel_size=1)

def forward(self, x):
x1 = self.layer1(x)
x2 = self.layer2(x1)
x3 = self.layer3(x2)
x4 = self.layer4(x3)
x5 = self.layer5(x4)

x6 = self.layer6(x5, x4)
x6 = self.layer7(x6, x3)
x6 = self.layer8(x6, x2)
x6 = self.layer9(x6, x1)

return self.layer10(x6)


class DoubleConv(nn.Module):
"""
Double Convolution and BN and ReLU
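
The diff truncates the `Down`, `Up`, and `DoubleConv` helpers, but the forward pass above already shows the encoder/decoder wiring. A quick shape check, assuming the full module from this PR is importable and that each convolution block pads so spatial size is preserved; four `Down` blocks halve H and W four times, so inputs should be divisible by 2^4 = 16:

```python
import torch

from pl_examples.models.unet import UNet

net = UNet(num_classes=19, bilinear=False)
x = torch.randn(1, 3, 64, 64)   # (batch, RGB channels, height, width)
out = net(x)
print(out.shape)                # expected: torch.Size([1, 19, 64, 64])
```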
21 changes: 0 additions & 21 deletions pl_examples/multi_node_examples/README.md

This file was deleted.

Empty file.