Structured results (train loop only. val loop separate PR) (PR 2/5) #2615

Merged
merged 174 commits into from
Jul 20, 2020
Changes from 153 commits
4d2b081
r
williamFalcon Jul 15, 2020
4513eb3
r
williamFalcon Jul 15, 2020
5cc01ff
r
williamFalcon Jul 15, 2020
c747f80
patched optimizer closure with sr
williamFalcon Jul 15, 2020
ed9b4f8
patched optimizer closure with sr
williamFalcon Jul 15, 2020
3f98d18
patched optimizer closure with sr
williamFalcon Jul 15, 2020
8352a56
added train step structured result
williamFalcon Jul 15, 2020
7d453d4
added train step structured result
williamFalcon Jul 15, 2020
23403ce
added train step structured result
williamFalcon Jul 15, 2020
9bc77ac
added train step structured result
williamFalcon Jul 15, 2020
9cdaf8f
added train step structured result
williamFalcon Jul 16, 2020
9309b9e
added train step structured result
williamFalcon Jul 16, 2020
8241130
added train step structured result
williamFalcon Jul 16, 2020
ceeedc2
added train step structured result
williamFalcon Jul 16, 2020
6bbe6d8
added train step structured result
williamFalcon Jul 16, 2020
c56ea84
added train step structured result
williamFalcon Jul 17, 2020
9df0e16
added train step structured result
williamFalcon Jul 18, 2020
331fe55
added train step structured result
williamFalcon Jul 18, 2020
0c8afc0
added train step structured result
williamFalcon Jul 18, 2020
7f8d72d
added train step structured result
williamFalcon Jul 18, 2020
8254f8e
added train step structured result
williamFalcon Jul 18, 2020
7c8a32e
added train step structured result
williamFalcon Jul 18, 2020
6927313
added train step structured result
williamFalcon Jul 18, 2020
1c78a5b
added train step structured result
williamFalcon Jul 18, 2020
5c67538
added train step structured result
williamFalcon Jul 18, 2020
870259c
added train step structured result
williamFalcon Jul 18, 2020
4a36ea5
added autoreduce for train step
williamFalcon Jul 18, 2020
4837cf4
added auto reduce on train
williamFalcon Jul 18, 2020
6378985
added auto reduce on train
williamFalcon Jul 18, 2020
f7f654a
added auto reduce on train
williamFalcon Jul 18, 2020
73fd54b
added auto reduce on train
williamFalcon Jul 18, 2020
b3f38c2
added auto reduce on train
williamFalcon Jul 18, 2020
6e64ba9
added auto reduce on train
williamFalcon Jul 18, 2020
1b24903
added hooks
williamFalcon Jul 18, 2020
eae4d6b
added hooks
williamFalcon Jul 18, 2020
6102c8a
added hooks
williamFalcon Jul 19, 2020
ccd08ed
added hooks
williamFalcon Jul 19, 2020
804e9c8
added hooks
williamFalcon Jul 19, 2020
2736c70
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
e09bcfc
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
55eb02c
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
b13d62b
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
7006138
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
715a634
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
c26a92e
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
e59b04c
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
a90a719
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
43b3724
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
a45b808
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
c50c74e
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
758b5d8
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
9812318
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
c1af222
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
07c4f42
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
28f2c40
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
15c8f55
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
f8209b2
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
882437e
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
874f4a2
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
968b17e
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
176b884
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
7b4be6a
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
92c0323
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
8bb3b19
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
1c69301
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
4a6f193
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
44d9a0a
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
26c8d3c
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
f71c797
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
6bce9d0
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
e3226f3
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
18630d2
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
4a9659c
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
90939a8
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
33cd21b
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
8d32a7a
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
f4a0a6f
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
78d335d
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
0faf912
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
e0ce316
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
947dc70
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
1bd96c1
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
886a094
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
e891e4b
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
9870739
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
50ddc5a
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
bb9dce7
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
df2b590
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
68ab130
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
d846ff3
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
e97722a
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
36319cd
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
06ecec5
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
3a6c132
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
98e11e3
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
7782abe
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
2d4eccf
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
3bbd01f
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
77c28b0
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
0f18073
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
a0dd29b
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
36c10b5
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
e113e2c
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
f7d2841
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
94ea112
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
f5b4259
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
b0f6590
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
5e1882b
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
96f9689
finished tests for structured results on train epoch
williamFalcon Jul 19, 2020
5cd90fe
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
4ebd847
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
4c3b03a
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
7886bcb
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
3c2f53c
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
0f3807f
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
14db086
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
7452cd5
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
21ffdf2
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
ee31889
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
5942821
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
2692014
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
042bcb6
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
8a449e6
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
e7d1585
cache
Borda Jul 20, 2020
1d34947
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
bfde914
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
5397ca6
Merge branch 'st' of https://github.com/PyTorchLightning/pytorch-ligh…
williamFalcon Jul 20, 2020
2e7b68d
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
71712d8
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
d93845e
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
1ec8992
Update pytorch_lightning/callbacks/early_stopping.py
williamFalcon Jul 20, 2020
e272a59
Update pytorch_lightning/callbacks/early_stopping.py
williamFalcon Jul 20, 2020
6c8f2e5
Update pytorch_lightning/callbacks/early_stopping.py
williamFalcon Jul 20, 2020
4ce032f
Update pytorch_lightning/callbacks/model_checkpoint.py
Borda Jul 20, 2020
b7ea0cc
Update pytorch_lightning/core/step_result.py
Borda Jul 20, 2020
3e7af00
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
5323d59
Merge branch 'st' of https://github.com/PyTorchLightning/pytorch-ligh…
williamFalcon Jul 20, 2020
b4ad5c2
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
a2c2401
Apply suggestions from code review
Borda Jul 20, 2020
7102cef
Apply suggestions from code review
Borda Jul 20, 2020
12ef3b0
Apply suggestions from code review
Borda Jul 20, 2020
2116a61
simple
Borda Jul 20, 2020
fd5445d
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
6a63fe0
simple
Borda Jul 20, 2020
d650daf
simple
Borda Jul 20, 2020
d5db845
Merge branch 'st' of https://github.com/PyTorchLightning/pytorch-ligh…
williamFalcon Jul 20, 2020
6abb73a
revert
Borda Jul 20, 2020
54862cc
Merge branch 'st' of https://github.com/PyTorchLightning/pytorch-ligh…
williamFalcon Jul 20, 2020
6333f21
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
f8591b4
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
ff088ca
Update tests/base/deterministic_model.py
williamFalcon Jul 20, 2020
595fd4b
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
fdf0e43
Merge branch 'st' of https://github.com/PyTorchLightning/pytorch-ligh…
williamFalcon Jul 20, 2020
74cd049
docstring typos
awaelchli Jul 20, 2020
fe91a2b
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
42b3724
Merge branch 'st' of https://github.com/PyTorchLightning/pytorch-ligh…
williamFalcon Jul 20, 2020
7dfda42
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
4f48912
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
de5cbb9
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
6ccf0cc
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
e671a79
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
f3ee6c2
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
be99f0a
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
d767547
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
de16f8a
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
072cb09
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
ea26761
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
f74e3b0
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
cab63d4
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
db26566
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
a1010dd
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
30e17aa
finished tests for structured results on train epoch
williamFalcon Jul 20, 2020
a7f0544
Update pytorch_lightning/core/step_result.py
williamFalcon Jul 20, 2020
704d201
Update pytorch_lightning/overrides/data_parallel.py
williamFalcon Jul 20, 2020
4 changes: 2 additions & 2 deletions .github/workflows/ci-testing.yml
@@ -82,9 +82,9 @@ jobs:
      uses: actions/cache@v1
      with:
        path: ${{ steps.pip-cache.outputs.dir }}
        key: ${{ runner.os }}-${{ matrix.python-version }}-${{ matrix.requires }}-pip-${{ hashFiles('requirements/base.txt') }}-${{ hashFiles('requirements/extra.txt') }}
        key: ${{ runner.os }}-pip-${{ matrix.python-version }}-${{ matrix.requires }}-pip-${{ hashFiles('requirements/base.txt') }}-${{ hashFiles('requirements/extra.txt') }}
        restore-keys: |
          ${{ runner.os }}-${{ matrix.python-version }}-${{ matrix.requires }}-pip-
          ${{ runner.os }}-pip-${{ matrix.python-version }}-${{ matrix.requires }}-pip-

    - name: Install dependencies
      run: |
5 changes: 4 additions & 1 deletion pytorch_lightning/__init__.py
@@ -55,14 +55,17 @@
    from pytorch_lightning.trainer import Trainer
    from pytorch_lightning.utilities.seed import seed_everything
    from pytorch_lightning import metrics
    from pytorch_lightning.core.step_result import TrainResult, EvalResult

    __all__ = [
        'Trainer',
        'LightningModule',
        'Callback',
        'data_loader',
        'seed_everything',
        'metrics'
        'metrics',
        'EvalResult',
        'TrainResult'
    ]

    # necessary for regular bolts imports. Skip exception since bolts is not always installed
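
The comma added after `'metrics'` in the `__all__` change of this hunk is load-bearing: Python joins adjacent string literals, so appending new names without it would silently fuse two entries. A minimal sketch of that behaviour, using plain lists rather than the real module (the list names `broken` and `fixed` are illustrative only):

```python
# Adjacent string literals are concatenated at compile time, so a missing
# comma silently merges two list entries into one.
broken = [
    'seed_everything',
    'metrics'          # no trailing comma here
    'EvalResult',
    'TrainResult'
]

fixed = [
    'seed_everything',
    'metrics',
    'EvalResult',
    'TrainResult'
]

print(broken)  # three entries, one of them the fused name
print(fixed)   # four separate entries
```

With the fused entry in `__all__`, a wildcard import would look for a name that does not exist in the package.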
24 changes: 24 additions & 0 deletions pytorch_lightning/callbacks/base.py
@@ -46,6 +46,30 @@ def on_sanity_check_end(self, trainer, pl_module):
        """Called when the validation sanity check ends."""
        pass

    def on_train_epoch_start(self, trainer, pl_module):
        """Called when the train epoch begins."""
        pass

    def on_train_epoch_end(self, trainer, pl_module):
        """Called when the train epoch ends."""
        pass

    def on_validation_epoch_start(self, trainer, pl_module):
        """Called when the val epoch begins."""
        pass

    def on_validation_epoch_end(self, trainer, pl_module):
        """Called when the val epoch ends."""
        pass

    def on_test_epoch_start(self, trainer, pl_module):
        """Called when the test epoch begins."""
        pass

    def on_test_epoch_end(self, trainer, pl_module):
        """Called when the test epoch ends."""
        pass

    def on_epoch_start(self, trainer, pl_module):
        """Called when the epoch begins."""
        pass
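
A rough sketch of how a user callback might use the new train-epoch hooks. To keep it runnable without the library installed, `Callback` below is a hypothetical stand-in that only mirrors the two hook signatures from this hunk, and `run_epochs` is a toy driver, not the real `Trainer`:

```python
import time


class Callback:
    """Hypothetical stand-in for the callback base class: every hook is a no-op."""

    def on_train_epoch_start(self, trainer, pl_module):
        pass

    def on_train_epoch_end(self, trainer, pl_module):
        pass


class EpochTimer(Callback):
    """Records how long each train epoch takes."""

    def __init__(self):
        self.durations = []
        self._start = None

    def on_train_epoch_start(self, trainer, pl_module):
        self._start = time.perf_counter()

    def on_train_epoch_end(self, trainer, pl_module):
        self.durations.append(time.perf_counter() - self._start)


def run_epochs(callbacks, num_epochs):
    """Toy driver (an assumption, not the real Trainer): fires both hooks per epoch."""
    trainer, pl_module = object(), object()
    for _ in range(num_epochs):
        for cb in callbacks:
            cb.on_train_epoch_start(trainer, pl_module)
        # ... the training batches would run here ...
        for cb in callbacks:
            cb.on_train_epoch_end(trainer, pl_module)


timer = EpochTimer()
run_epochs([timer, Callback()], num_epochs=3)
print(len(timer.durations))
```

Because the base-class hooks are no-ops, a subclass only overrides the ones it needs, which is why adding six new hooks does not break existing callbacks.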
22 changes: 22 additions & 0 deletions pytorch_lightning/callbacks/early_stopping.py
@@ -7,6 +7,7 @@
"""
from copy import deepcopy

import os
import numpy as np
import torch
import torch.distributed as dist
@@ -140,12 +141,33 @@ def on_sanity_check_end(self, trainer, pl_module):
    def on_validation_end(self, trainer, pl_module):
        self._run_early_stopping_check(trainer, pl_module)

    def on_train_epoch_end(self, trainer, pl_module):
        # early stopping can also work in the train loop when there is no val loop and when using structured results
        should_check_early_stop = False
        train_es_key = 'early_stop_on'
        if trainer.callback_metrics.get(train_es_key, None) is not None:
            self.monitor = train_es_key
            should_check_early_stop = True

        val_es_key = 'val_early_stop_on'
        if trainer.callback_metrics.get(val_es_key, None) is not None:
            self.monitor = val_es_key
            should_check_early_stop = True

        if should_check_early_stop:
            self._run_early_stopping_check(trainer, pl_module)

    def _run_early_stopping_check(self, trainer, pl_module):
        logs = trainer.callback_metrics

        if not self._validate_condition_metric(logs):
            return  # short circuit if metric not present

        current = logs.get(self.monitor)

        # when in dev debugging
        trainer.dev_debugger.track_early_stopping_history(current)

        if not isinstance(current, torch.Tensor):
            current = torch.tensor(current, device=pl_module.device)
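
The key checks in `on_train_epoch_end` can be read in isolation as a small pure function. The sketch below mirrors them on a plain dict; `select_monitor` is a hypothetical helper written for illustration, and `'val_loss'` is only an example starting value for the monitor:

```python
def select_monitor(callback_metrics, current_monitor):
    """Mirror the key checks in on_train_epoch_end using a plain dict.

    Returns (monitor, should_check). Hypothetical helper, not library code.
    """
    monitor = current_monitor
    should_check = False

    if callback_metrics.get('early_stop_on') is not None:
        monitor = 'early_stop_on'
        should_check = True

    # checked second, so it takes precedence when both keys are present
    if callback_metrics.get('val_early_stop_on') is not None:
        monitor = 'val_early_stop_on'
        should_check = True

    return monitor, should_check


# No structured-result keys: the monitor is untouched and no check runs here.
print(select_monitor({}, 'val_loss'))
# Only the train key is present.
print(select_monitor({'early_stop_on': 0.3}, 'val_loss'))
# Both keys present: the validation key is applied last.
print(select_monitor({'early_stop_on': 0.3, 'val_early_stop_on': 0.2}, 'val_loss'))
```

Two consequences follow from the code as written in this hunk: when both keys are present the validation key is the one monitored, and because the callback assigns to `self.monitor`, the override persists for later checks.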

21 changes: 15 additions & 6 deletions pytorch_lightning/callbacks/model_checkpoint.py
@@ -159,7 +159,11 @@ def _del_model(self, filepath):
        if os.path.isfile(filepath):
            os.remove(filepath)

    def _save_model(self, filepath):
    def _save_model(self, filepath, trainer, pl_module):

        # in debugging, track when we save checkpoints
        trainer.dev_debugger.track_checkpointing_history(filepath)

        # make paths
        os.makedirs(os.path.dirname(filepath), exist_ok=True)

@@ -270,6 +274,11 @@ def on_validation_end(self, trainer, pl_module):

        metrics = trainer.callback_metrics
        epoch = trainer.current_epoch

        # support structured results
        if metrics.get('checkpoint_on') is not None:
            self.monitor = 'checkpoint_on'

        if self.save_top_k == 0:
            # no models are saved
            return
@@ -281,7 +290,7 @@ def on_validation_end(self, trainer, pl_module):

        if self.save_last:
            filepath = os.path.join(self.dirpath, self.prefix + 'last.ckpt')
            self._save_model(filepath)
            self._save_model(filepath, trainer, pl_module)

        filepath = self.format_checkpoint_name(epoch, metrics)
        version_cnt = 0
@@ -306,7 +315,7 @@ def on_validation_end(self, trainer, pl_module):
                    f'Can save best model only with {self.monitor} available, skipping.', RuntimeWarning
                )
            elif self.check_monitor_top_k(current):
                self._do_check_save(filepath, current, epoch)
                self._do_check_save(filepath, current, epoch, trainer, pl_module)
            elif self.verbose > 0:
                log.info(f'\nEpoch {epoch:05d}: {self.monitor} was not in top {self.save_top_k}')

@@ -315,9 +324,9 @@ def on_validation_end(self, trainer, pl_module):
                log.info(f'\nEpoch {epoch:05d}: saving model to {filepath}')

            assert trainer.global_rank == 0, 'tried to make a checkpoint from non global_rank=0'
            self._save_model(filepath)
            self._save_model(filepath, trainer, pl_module)

    def _do_check_save(self, filepath, current, epoch):
    def _do_check_save(self, filepath, current, epoch, trainer, pl_module):
        # remove kth

        del_list = []
@@ -343,7 +352,7 @@ def _do_check_save(self, filepath, current, epoch):
                f'\nEpoch {epoch:05d}: {self.monitor} reached'
                f' {current:0.5f} (best {self.best_model_score:0.5f}), saving model to'
                f' {filepath} as top {self.save_top_k}')
        self._save_model(filepath)
        self._save_model(filepath, trainer, pl_module)

        for cur_path in del_list:
            if cur_path != filepath:
36 changes: 36 additions & 0 deletions pytorch_lightning/core/hooks.py
@@ -115,6 +115,42 @@ def on_epoch_end(self) -> None:
        """
        # do something when the epoch ends

    def on_train_epoch_start(self) -> None:
        """
        Called in the training loop at the very beginning of the epoch.
        """
        # do something when the epoch starts

    def on_train_epoch_end(self) -> None:
        """
        Called in the training loop at the very end of the epoch.
        """
        # do something when the epoch ends

    def on_validation_epoch_start(self) -> None:
        """
        Called in the validation loop at the very beginning of the epoch.
        """
        # do something when the epoch starts

    def on_validation_epoch_end(self) -> None:
        """
        Called in the validation loop at the very end of the epoch.
        """
        # do something when the epoch ends

    def on_test_epoch_start(self) -> None:
        """
        Called in the test loop at the very beginning of the epoch.
        """
        # do something when the epoch starts

    def on_test_epoch_end(self) -> None:
        """
        Called in the test loop at the very end of the epoch.
        """
        # do something when the epoch ends

    def on_pre_performance_check(self) -> None:
        """
        Called at the very beginning of the validation loop.