Skip to content

Commit

Permalink
Merge branch 'master' into bugfix/scale-batch-size-datamodule
Browse files Browse the repository at this point in the history
  • Loading branch information
awaelchli committed Mar 6, 2021
2 parents 01d5e1e + 34b733b commit 7185815
Show file tree
Hide file tree
Showing 73 changed files with 878 additions and 849 deletions.
10 changes: 3 additions & 7 deletions .github/workflows/ci_test-conda.yml
Original file line number Diff line number Diff line change
Expand Up @@ -27,24 +27,20 @@ jobs:
run: |
conda info
conda list
# adjust versions according installed Torch version
python ./requirements/adjust_versions.py requirements/extra.txt
python ./requirements/adjust_versions.py requirements/examples.txt
pip install --requirement requirements/devel.txt --upgrade-strategy only-if-needed
pip list
- name: Pull checkpoints from S3
# todo: consider adding coma caching, but ATM all models have less then 100KB
run: |
# enter legacy and update checkpoints from S3
cd legacy
curl https://pl-public-data.s3.amazonaws.com/legacy/checkpoints.zip --output checkpoints.zip
unzip -o checkpoints.zip
ls -l checkpoints/
# todo: require proper fix in docker image
- name: Hotfix dependency
run: |
pip install torchtext==0.6.0 -U
shell: bash

- name: Tests
run: |
# NOTE: run coverage on tests does not propagare faler status for Win, https://github.com/nedbat/coveragepy/issues/1003
Expand Down
22 changes: 9 additions & 13 deletions .github/workflows/ci_test-full.yml
Original file line number Diff line number Diff line change
Expand Up @@ -104,20 +104,17 @@ jobs:
HOROVOD_WITHOUT_MXNET: 1
HOROVOD_WITHOUT_TENSORFLOW: 1
run: |
# python -m pip install --upgrade --user pip
pip install --requirement requirements.txt --find-links https://download.pytorch.org/whl/cpu/torch_stable.html --quiet --upgrade
pip install --requirement ./requirements/devel.txt --find-links https://download.pytorch.org/whl/cpu/torch_stable.html --quiet --upgrade
python --version
pip --version
# python -m pip install --upgrade --user pip
pip install --requirement requirements.txt --find-links https://download.pytorch.org/whl/cpu/torch_stable.html --upgrade
# adjust versions according installed Torch version
python ./requirements/adjust_versions.py requirements/extra.txt
python ./requirements/adjust_versions.py requirements/examples.txt
pip install --requirement ./requirements/devel.txt --find-links https://download.pytorch.org/whl/cpu/torch_stable.html --upgrade
pip list
shell: bash

# todo: require proper fix in docker image
- name: Hotfix dependency
run: |
pip install torchtext==0.6.0 -U
shell: bash

- name: Reinstall Horovod if necessary
if: runner.os != 'windows'
env:
Expand All @@ -143,10 +140,9 @@ jobs:
# NOTE: do not include coverage report here, see: https://github.com/nedbat/coveragepy/issues/1003
coverage run --source pytorch_lightning -m pytest pytorch_lightning tests -v --durations=50 --junitxml=junit/test-results-${{ runner.os }}-${{ matrix.python-version }}-${{ matrix.requires }}.xml
# todo: put this back just when TorchVision can download datasets
#- name: Examples
# run: |
# python -m pytest pl_examples -v --durations=10
- name: Examples
run: |
python -m pytest pl_examples -v --durations=10
- name: Upload pytest test results
uses: actions/upload-artifact@v2
Expand Down
10 changes: 5 additions & 5 deletions .github/workflows/docs-checks.yml
Original file line number Diff line number Diff line change
Expand Up @@ -41,15 +41,15 @@ jobs:
- name: Install dependencies
run: |
python --version
pip --version
# remove Horovod from requirements
python -c "fname = 'requirements/extra.txt' ; lines = [line for line in open(fname).readlines() if not line.startswith('horovod')] ; open(fname, 'w').writelines(lines)"
# python -m pip install --upgrade --user pip
pip install --requirement requirements.txt --upgrade-strategy only-if-needed --find-links https://download.pytorch.org/whl/cpu/torch_stable.html --quiet
pip install --requirement requirements/extra.txt
pip install --requirement requirements/loggers.txt
pip install --requirement requirements/docs.txt
python --version
pip --version
pip list
shell: bash

Expand Down Expand Up @@ -84,12 +84,12 @@ jobs:
- name: Install dependencies
run: |
pip install --requirement requirements.txt --upgrade-strategy only-if-needed --find-links https://download.pytorch.org/whl/cpu/torch_stable.html --quiet
python --version
pip --version
# pip install --requirement requirements.txt --upgrade-strategy only-if-needed --find-links https://download.pytorch.org/whl/cpu/torch_stable.html --quiet
pip install --requirement requirements/docs.txt
# install Texlive, see https://linuxconfig.org/how-to-install-latex-on-ubuntu-20-04-focal-fossa-linux
sudo apt-get update && sudo apt-get install -y texlive-latex-extra dvipng texlive-pictures
python --version
pip --version
pip list
shell: bash

Expand Down
2 changes: 0 additions & 2 deletions .github/workflows/events-nightly.yml
Original file line number Diff line number Diff line change
Expand Up @@ -102,8 +102,6 @@ jobs:
id: extend
- name: Publish CUDA to Docker Hub
# ToDo: extend also building for Nightly from pip
if: matrix.pytorch_version < 1.8
# publish master/release
uses: docker/build-push-action@v2
with:
Expand Down
50 changes: 37 additions & 13 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,31 +15,46 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Added `checkpoint` parameter to callback's `on_save_checkpoint` hook ([#6072](https://github.com/PyTorchLightning/pytorch-lightning/pull/6072))


- Added `RunningStage.SANITY_CHECKING` ([#4945](https://github.com/PyTorchLightning/pytorch-lightning/pull/4945))


- Added `TrainerState.{FITTING,VALIDATING,TESTING,PREDICTING,TUNING}` ([#4945](https://github.com/PyTorchLightning/pytorch-lightning/pull/4945))


- Added `LightningEnvironment` for Lightning-specific DDP ([#5915](https://github.com/PyTorchLightning/pytorch-lightning/pull/5915))


- Added arg to `self.log` that enables users to give custom names when dealing with multiple dataloaders ([#6274](https://github.com/PyTorchLightning/pytorch-lightning/pull/6274))


- Added no return warning to predict ([#6139](https://github.com/PyTorchLightning/pytorch-lightning/pull/6139))


### Changed

- Changed the order of `backward`, `step`, `zero_grad` to `zero_grad`, `backward`, `step` ([#6147](https://github.com/PyTorchLightning/pytorch-lightning/pull/6147))
- Renamed `pytorch_lightning.callbacks.swa` to `pytorch_lightning.callbacks.stochastic_weight_avg` ([#6259](https://github.com/PyTorchLightning/pytorch-lightning/pull/6259))


- Changed default for DeepSpeed CPU Offload to False, due to prohibitively slow speeds at smaller scale ([#6262](https://github.com/PyTorchLightning/pytorch-lightning/pull/6262))
- Refactor `RunningStage` and `TrainerState` usage ([#4945](https://github.com/PyTorchLightning/pytorch-lightning/pull/4945))


- Renamed `pytorch_lightning.callbacks.swa` to `pytorch_lightning.callbacks.stochastic_weight_avg` ([#6259](https://github.com/PyTorchLightning/pytorch-lightning/pull/6259))
- Changed `trainer.evaluating` to return `True` if validating or testing ([#4945](https://github.com/PyTorchLightning/pytorch-lightning/pull/4945))


### Deprecated


- Deprecated `trainer.running_sanity_check` in favor of `trainer.sanity_checking` ([#4945](https://github.com/PyTorchLightning/pytorch-lightning/pull/4945))


### Removed

- Removed support for passing a bool value to `profiler` argument of Trainer ([#6164](https://github.com/PyTorchLightning/pytorch-lightning/pull/6164))


- Removed no return warning from val/test step ([#6139](https://github.com/PyTorchLightning/pytorch-lightning/pull/6139))


- Removed passing a `ModelCheckpoint` instance to `Trainer(checkpoint_callback)` ([#6166](https://github.com/PyTorchLightning/pytorch-lightning/pull/6166))


Expand Down Expand Up @@ -74,37 +89,46 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Expose DeepSpeed loss parameters to allow users to fix loss instability ([#6115](https://github.com/PyTorchLightning/pytorch-lightning/pull/6115))


- Fixed epoch level schedulers not being called when `val_check_interval < 1.0` ([#6075](https://github.com/PyTorchLightning/pytorch-lightning/pull/6075))
- Fixed `ModelPruning(make_pruning_permanent=True)` pruning buffers getting removed when saved during training ([#6073](https://github.com/PyTorchLightning/pytorch-lightning/pull/6073))


- Fixed multiple early stopping callbacks ([#6197](https://github.com/PyTorchLightning/pytorch-lightning/pull/6197))
- Fixed `trainer.test` from `best_path` hangs after calling `trainer.fit` ([#6272](https://github.com/PyTorchLightning/pytorch-lightning/pull/6272))


- Fixed `ModelPruning(make_pruning_permanent=True)` pruning buffers getting removed when saved during training ([#6073](https://github.com/PyTorchLightning/pytorch-lightning/pull/6073))
- Fixed duplicate logs appearing in console when using the python logging module ([#5509](https://github.com/PyTorchLightning/pytorch-lightning/pull/5509), [#6275](https://github.com/PyTorchLightning/pytorch-lightning/pull/6275))


- Fixed incorrect usage of `detach()`, `cpu()`, `to()` ([#6216](https://github.com/PyTorchLightning/pytorch-lightning/pull/6216))
- Fixed `SingleTPU` calling `all_gather` ([#6296](https://github.com/PyTorchLightning/pytorch-lightning/pull/6296))


- Fixed LBFGS optimizer support which didn't converge in automatic optimization ([#6147](https://github.com/PyTorchLightning/pytorch-lightning/pull/6147))
- Fixed DP reduction with collection ([#6324](https://github.com/PyTorchLightning/pytorch-lightning/pull/6324))


- Prevent `WandbLogger` from dropping values ([#5931](https://github.com/PyTorchLightning/pytorch-lightning/pull/5931))
- Fixed PyTorch Profiler with `emit_nvtx` ([#6260](https://github.com/PyTorchLightning/pytorch-lightning/pull/6260))


- Fixed `trainer.test` from `best_path` hangs after calling `trainer.fit` ([#6272](https://github.com/PyTorchLightning/pytorch-lightning/pull/6272))


- Fixed duplicate logs appearing in console when using the python logging module ([#5509](https://github.com/PyTorchLightning/pytorch-lightning/pull/5509), [#6275](https://github.com/PyTorchLightning/pytorch-lightning/pull/6275))
## [1.2.2] - 2021-03-02

### Added

- Fixed `SingleTPU` calling `all_gather` ([#6296](https://github.com/PyTorchLightning/pytorch-lightning/pull/6296))
- Added `checkpoint` parameter to callback's `on_save_checkpoint` hook ([#6072](https://github.com/PyTorchLightning/pytorch-lightning/pull/6072))

### Changed

- Fixed error thrown when using valid distributed mode in multi node ([#6297](https://github.com/PyTorchLightning/pytorch-lightning/pull/6297)
- Changed the order of `backward`, `step`, `zero_grad` to `zero_grad`, `backward`, `step` ([#6147](https://github.com/PyTorchLightning/pytorch-lightning/pull/6147))
- Changed default for DeepSpeed CPU Offload to False, due to prohibitively slow speeds at smaller scale ([#6262](https://github.com/PyTorchLightning/pytorch-lightning/pull/6262))

### Fixed

- Fixed DP reduction with collection ([#6324](https://github.com/PyTorchLightning/pytorch-lightning/pull/6324))
- Fixed epoch level schedulers not being called when `val_check_interval < 1.0` ([#6075](https://github.com/PyTorchLightning/pytorch-lightning/pull/6075))
- Fixed multiple early stopping callbacks ([#6197](https://github.com/PyTorchLightning/pytorch-lightning/pull/6197))
- Fixed incorrect usage of `detach()`, `cpu()`, `to()` ([#6216](https://github.com/PyTorchLightning/pytorch-lightning/pull/6216))
- Fixed LBFGS optimizer support which didn't converge in automatic optimization ([#6147](https://github.com/PyTorchLightning/pytorch-lightning/pull/6147))
- Prevent `WandbLogger` from dropping values ([#5931](https://github.com/PyTorchLightning/pytorch-lightning/pull/5931))
- Fixed error thrown when using valid distributed mode in multi node ([#6297](https://github.com/PyTorchLightning/pytorch-lightning/pull/6297)


- Fixed an issue with `Tuner.scale_batch_size` not finding the batch size attribute in the datamodule ([#5968](https://github.com/PyTorchLightning/pytorch-lightning/pull/5968))
Expand Down
2 changes: 1 addition & 1 deletion MANIFEST.in
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@ recursive-include docs/source/_static/images/general/ pl_overview* tf_* tutorial

# Include the Requirements
recursive-include requirements *.txt
recursive-exclude requirements *.sh
recursive-exclude requirements *.sh *.py
include requirements.txt
include pyproject.toml

Expand Down
21 changes: 9 additions & 12 deletions azure-pipelines.yml
Original file line number Diff line number Diff line change
Expand Up @@ -71,11 +71,6 @@ jobs:
python -c "import torch ; mgpu = torch.cuda.device_count() ; assert mgpu >= 2, f'GPU: {mgpu}'"
displayName: 'Env details'
# todo: require proper fix in docker image
- bash: |
pip install torchtext==0.7 -U
displayName: 'HotFix'
- bash: |
wget https://pl-public-data.s3.amazonaws.com/legacy/checkpoints.zip -P legacy/
unzip -o legacy/checkpoints.zip -d legacy/
Expand All @@ -100,10 +95,12 @@ jobs:
python -m pytest benchmarks -v --maxfail=2 --durations=0
displayName: 'Testing: benchmarks'
# todo: put this back just when TorchVision can download datasets
#- bash: |
# python -m pytest pl_examples -v --maxfail=2 --durations=0
# python setup.py install --user --quiet
# bash pl_examples/run_ddp-example.sh
# pip uninstall -y pytorch-lightning
# displayName: 'Examples'
- bash: |
python -m pytest pl_examples -v --maxfail=2 --durations=0
python setup.py install --user --quiet
bash pl_examples/run_ddp-example.sh
cd pl_examples/basic_examples
bash submit_ddp_job.sh
bash submit_ddp2_job.sh
pip uninstall -y pytorch-lightning
displayName: 'Examples'
2 changes: 2 additions & 0 deletions dockers/base-conda/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -98,10 +98,12 @@ ENV \

COPY ./requirements/extra.txt requirements-extra.txt
COPY ./requirements/test.txt requirements-test.txt
COPY ./requirements/adjust_versions.py requirements_adjust_versions.py

RUN \
pip list | grep torch && \
python -c "import torch; print(torch.__version__)" && \
python requirements_adjust_versions.py requirements-extra.txt && \
# Install remaining requirements
pip install -r requirements-extra.txt --no-cache-dir && \
pip install -r requirements-test.txt --no-cache-dir && \
Expand Down
8 changes: 5 additions & 3 deletions dockers/base-cuda/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -94,12 +94,14 @@ RUN \
# eventualy use pre-release
#pip install "torch==${PYTORCH_VERSION}.*" --pre && \
# set particular PyTorch version
python -c "import re ; fname = 'requirements.txt' ; req = re.sub(r'torch[>=]+[\d\.]+', 'torch==${PYTORCH_VERSION}.*', open(fname).read()) ; open(fname, 'w').write(req)" && \
python ./requirements/adjust_versions.py requirements.txt ${PYTORCH_VERSION} && \
python ./requirements/adjust_versions.py requirements/extra.txt ${PYTORCH_VERSION} && \
python ./requirements/adjust_versions.py requirements/examples.txt ${PYTORCH_VERSION} && \
# Install all requirements
# todo: find a way how to install nightly PT version
# --pre --extra-index-url https://download.pytorch.org/whl/nightly/cu${cuda_ver[0]}${cuda_ver[1]}/torch_nightly.html
pip install -r requirements/devel.txt --no-cache-dir && \
rm -rf requirements*
rm -rf requirements.* requirements/

RUN \
# install DALI, needed for examples
Expand All @@ -113,7 +115,7 @@ RUN \

RUN \
# install DeepSpeed from source.
# todo: swap to pypi release once DeepSpeed releases a new version.
# todo: swap to pypi release once DeepSpeed releases a new version >= 0.3.10
pip install deepspeed@git+https://github.com/microsoft/DeepSpeed@ec8b1cb

RUN \
Expand Down
1 change: 1 addition & 0 deletions dockers/base-xla/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -104,6 +104,7 @@ RUN \
python -c "fname = 'requirements/extra.txt' ; lines = [line for line in open(fname).readlines() if 'fairscale' not in line] ; open(fname, 'w').writelines(lines)" && \
# drop TorchVision as it was installed with XLA
python -c "fname = 'requirements/examples.txt' ; lines = [line for line in open(fname).readlines() if not line.startswith('torchvision')] ; open(fname, 'w').writelines(lines)" && \
python ./requirements/adjust_versions.py ./requirements/extra.txt && \
pip install --requirement ./requirements/devel.txt --no-cache-dir && \
cd .. && \
rm -rf pytorch-lightning && \
Expand Down
15 changes: 8 additions & 7 deletions dockers/release/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -27,14 +27,15 @@ COPY ./ ./pytorch-lightning/
RUN \
# Disable cache
#conda install "pip>20.1" && \
#pip config set global.cache-dir false && \
if [ -z $LIGHTNING_VERSION ] ; then \
pip install ./pytorch-lightning --no-cache-dir ; \
if [ ! -z "$LIGHTNING_VERSION" ] ; then \
rm -rf pytorch-lightning ; \
else \
rm -rf pytorch-lightning ; \
pip install https://github.com/PyTorchLightning/pytorch-lightning/archive/${LIGHTNING_VERSION}.zip --no-cache-dir ; \
fi
wget https://github.com/PyTorchLightning/pytorch-lightning/archive/${LIGHTNING_VERSION}.zip --progress=bar:force:noscroll ; \
unzip ${LIGHTNING_VERSION}.zip ; \
mv pytorch-lightning-*/ pytorch-lightning ; \
rm *.zip ; \
fi && \
pip install ./pytorch-lightning["extra"] --no-cache-dir && \
rm -rf pytorch-lightning

RUN python --version && \
pip --version && \
Expand Down
Loading

0 comments on commit 7185815

Please sign in to comment.