This repository has been archived by the owner on Sep 28, 2022. It is now read-only.
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Update local pytorch-lightning master (#1)
* Add hint in docs for how to use shared memory (#6036) * Prevent flickering progress bar (#6009) * add padding * fix * fix * Update pytorch_lightning/callbacks/progress.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * updated based on suggestion * changelog * add test * fix pep8 * resolve test * fix code format Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: tchaton <thomas@grid.ai> * Fix Wrapping optimizers upon assignment (#6006) * Update properties.py * pep8 * [Bugfix] Apply untoggle_optimizer when result is None (#5983) * update changelog * apply untoggle_optimizer when result is None * update tests * still return loss sometimes * Update CHANGELOG.md Co-authored-by: deng-cy <dcy1996@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * remove outdated info (#6032) * DeepSpeed Integration (#5954) * Add initial deepspeed changes * Address code review * Move static method outside of function * Fixes * Add missing annotation * Remove seed setting * Doc changes * Doc changes, add address reviews * Fix docs * Try fixing issue by moving to torch adam * Clean up check * Changes, better APIs! * Add wrapper, swap to git install revision * Add special test * Add warning * Address review * Add better disclaimer * Turn off ZeRO for testing due to compilation * Add description on modifying parameters via the plugin * Doc strings clear * Small doc fixes * Fix hash, reduce test * Added CI change * Move to azure pipeline * Fix test name * Add missing flag * Remove sudo... 
* Try conda instead * Swap to conda base * Try suggested install * Apply suggestions from code review * Apply suggestions from code review * Revert "Apply suggestions from code review" This reverts commit 41cca05a * Revert "Apply suggestions from code review" This reverts commit e06ec29e * Remove setter * Address most review * Move out function, remove DeepSpeed from requirements * Install deepspeed/mpi4py within container * Use special tests, move to master commit for deepspeed * Export path * Force compile to happen first * Remove! * Debugging ninja * Fix error in optimizer step logic * Attempt to fix symbolic link * Reverse to aid debugging * Export path again * Clean up mess * var * Revert "var" This reverts commit 3450eaca * Address review, add todo * Add note about unsupported functionality Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: tchaton <thomas@grid.ai> Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz> * Trainer only references accelerator (#6039) * Trainer only references accelerator where it can * Move teardown to the trainer, as it is reponsible for the accelerator * Address code review for deepspeed (#6042) * [feat] Add Trainer(stochastic_weight_avg=True/False) (#6038) Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * [CI] Move DeepSpeed into CUDA image, remove DeepSpeed install from azure (#6043) * Move to CUDA image * Remove deepspeed install as deepspeed now in the cuda image * Remove path setting, as ninja should be in the container now * drop deprecated result object 1/n (#5005) * ro1 * ro2 * Add option for weight tying on TPU's (#5441) * added on_post_move_to_device * added tests * docs and refactors * Update tests/backends/test_tpu_backend.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update docs/source/tpu.rst Co-authored-by: Jirka 
Borovec <Borda@users.noreply.github.com> * Update docs/source/tpu.rst Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update pytorch_lightning/core/decorators.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update pytorch_lightning/core/decorators.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update docs/source/tpu.rst Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * Update pytorch_lightning/core/decorators.py Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * Update pytorch_lightning/core/decorators.py Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * Update pytorch_lightning/core/decorators.py Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * Update pytorch_lightning/core/decorators.py Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * Update pytorch_lightning/core/hooks.py Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * moved weight sharing module back to test updated tpu available * add count to warning * fix doctest * import trainer in doctest * import trainer in doctest * do not test code as no TPU device * param count to layer count * formatting * update docs * update import * update * resolve tests * remove legacy accelerator Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: tchaton <thomas@grid.ai> Co-authored-by: Your Name <you@example.com> * Delete tests.helpers.TrialMNISTDataModule (#5999) * Remove TrialMNISTDataModule * Allow using TrialMNIST in the MNISTDataModule * Update tests/helpers/datasets.py * Fix: Allow hashing of metrics with lists in their state (#5939) * Fix: Allow hashing of metrics with lists in their state * Add test case and modify semantics of Metric __hash__ in order to be compatible with structural equality checks * Fix pep8 style issue Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: mergify[bot] 
<37929162+mergify[bot]@users.noreply.github.com> * et al. (#6050) * et al. * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: chaton <thomas@grid.ai> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: chaton <thomas@grid.ai> * [ModelPruning] Add missing attribute with use_global_unstructured=False and verbose (#6045) * fix/test quant (#6040) * fix/test quant * ... * --- * Add descriptions to accelerator broadcast function/clean up all_gather (#6044) * Add descriptions to accelerator broadcast function/clean up all_gather * Remove todo * Add before_batch_transfer and after_batch_transfer hooks (#3671) * add hooks * comment * docs * add tests * make it private * fix tests * docs * chlog * testcode * codefactor * fix doctest * fix doctest * suggestions * is always overriden * pep and BoringModel * BoringModel * docs * docs * docs * fix * rebase * rebase * suggestions * docs * suggestions * try fix docs * docs * update name * yapf * docs * rebase * yapf * Make parallel devices optional across all plugins (#6051) * Make parallel devices optional across all plugins so that they can be instantiated * Add any to types to capture vars passed in * clarify gpu / process (#6049) * Fix docs typo (#6055) Put .test() in code blocks * Docs for Pruning, Quantization, and SWA (#6041) Co-authored-by: chaton <thomas@grid.ai> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Sean Naren <sean.narenthiran@gmail.com> * Replace .get_model() with explicit .lightning_module (#6035) * rename get_model -> lightning_module * update references to get_model * pep8 * add proper deprecation * remove outdated _get_reference_model * fix cyclic import * rename accelerator_backend -> accelerator (#6034) * rename accelerator backend * rename new additions from master * add proper deprecation * pep8 * warning match * add missing warning type * fix 
flake8 for new plugins (#5951) * flake8 * fix cyclic import * isort * fix docs links (#6057) * Add warnings to on_before/after_batch_transfer hooks (#6059) * Add warnings to hooks * Add default idx to prevent signature change in the future * Nothing to see here * Add default val to transfer_batch_to_device hook * Apply suggestions from code review Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Revert "Add default val to transfer_batch_to_device hook" This reverts commit 5c6a68f2 Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * v1.2.0rc2 (#6063) * v1.2.0rc2 * chlogs * chlogs * format * Apply suggestions from code review Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * Update auto-opt docs (#6037) * fix docs * update on comments * Apply suggestions from code review Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> * Apply suggestions from code review Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> * Apply suggestions from code review Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * rm comment * Update docs/source/common/lightning_module.rst Co-authored-by: chaton <thomas@grid.ai> Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: chaton <thomas@grid.ai> * Raise AttributeError in lightning_getattr and lightning_setattr when attribute not found (#6024) * Empty commit * Raise AttributeError instead of ValueError * Make functions private * Update tests * Add match string * Apply suggestions from code review Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * lightning to Lightning Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * default sched (#6062) * v1.2.0 (#6065) * v1.2.0 * docs * add Azure tags trigger (#6066) * add Azure tags trigger * fix * mnodes * pypi azure badges - tags (#6068) * pypi azure badges - 
tags * pep8 * id * continue towards 1.3 (#6069) * Fix amp autocast (#6080) * precision fixes * add amp test model * fix test * revert * move assert to training step * fix test * fix test * remove unrelated changes * add changelog * remove unused import * add sanity check on nb available GPUs (#6092) * consistent behavior for reduce method across all Plugins (#6011) * reduction docs * docs for abstract base method * make mean the default * add preliminary chlog Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * [Hot Fix] Give priority to plugins to set distributed mode, and then accelerator (#6089) * Give priority to plugins to set distributed mode, and then accelerator * Add CHANGELOG.md * Update CHANGELOG.md * Remove very scary line * Ensure we set cluster environment after slurm configured if necessary * Simplify the fix with a reset Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Enable ZeRO tests for CI, fix to/half function calls (#6070) * Enable ZeRO optimization, and make sure that the lightning module hook is called when we move to half precision * Added test, update to function * Expose DeepSpeed FP16 parameters due to loss instability (#6115) * Expose deepspeed config parameters to init function due to instability in parameters * See if tests can run on normal CI, without special tests * Add changelog * Update pytorch_lightning/plugins/training_type/deepspeed.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Collapse 2 DeepSpeed tests (#6108) * fix amp/apex misconfiguration error for cpu (#6107) * fix weird test * fix apex plugin test * fix raise * cpu test * fix type * add changelog * Update Contributing Guide (#6118) * Update Contributing Guide * update docs * Minor fixes/improvements in Metric docs (#6114) * Fix wrong render * Improve classification metrics docs * Improve other domain metrics docs * Change the structure level in the docs * Avoid printing 
ModelCheckpoint log with monitor=None and verbose=True (#6109) * Feature/5275 clean progress bar print (#5470) * Trainer.test should return only test metrics (#5214) * resolve bug * merge tests * Fix metric state reset (#5273) * Fix metric state reset * Fix test * Improve formatting Co-authored-by: Ananya Harsh Jha <ananya@pytorchlightning.ai> * print() method added to ProgressBar * printing alongside progress bar added to LightningModule.print() * LightningModule.print() method documentation updated * ProgressBarBase.print() stub added * stub * add progress bar tests * fix isort * Progress Callback fixes * test_metric.py duplicate DummyList removed * PEP and isort fixes * CHANGELOG updated * test_progress_bar_print win linesep fix * test_progress_bar.py remove whitespaces * Update CHANGELOG.md Co-authored-by: chaton <thomas@grid.ai> Co-authored-by: Tadej Svetina <tadej.svetina@gmail.com> Co-authored-by: Ananya Harsh Jha <ananya@pytorchlightning.ai> Co-authored-by: Alexander Snorkin <Alexander.Snorkin@acronis.com> Co-authored-by: rohitgr7 <rohitgr1998@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * mini refactor for _running_stage access (#5724) * running stage * circular import * running stage cleanup * fix unused import * fix running stage access * add return type * Revert "add return type" This reverts commit 65b0fe269c6547213e34b6a88b97bee31cdfe8c7. * try fix typing * Add specifics around DeepSpeed docs (#6142) * Be more specific with DeepSpeed compatibility * Better wording * Ensure accelerator is valid if running interactively (#5970) Co-authored-by: chaton <thomas@grid.ai> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> * fixing miss-leading tested acc values (#5876) * fixing tested values * . 
* tests * yapf * softmax * hvd * rename * lr * duplicate * drop * classif * rm EvalModel * Revert "rm EvalModel" This reverts commit 6c3fb39ebe0c4bfb52357bccfd050438f2c0f31c. * update tests * fix * azure * azure * self * cpu * Apply suggestions from code review Co-authored-by: rohitgr7 <rohitgr1998@gmail.com> * Update CHANGELOG (#6156) * prune deprecated profiler as bool (#6164) * prune profiler * chlog * prune deprecated Trainer arg `enable_pl_optimizer` (#6163) * prune enable_pl_optimizer * prune automatic_optimization * Prune deprecated metrics for 1.3 (#6161) * prune deprecated metrics for 1.3 * isort / yapf * [Bugfix] Fixed epoch level schedulers not being called when val_check_interval < 1.0 (#6075) * fix bug * fix tests * changelog * fix pep8 * fix tests * fix and add some tests * add test for rlop * chlog * Update CHANGELOG.md Co-authored-by: rohitgr7 <rohitgr1998@gmail.com> * Prune deprecated checkpoint arguments (#6162) * prune prefix * prune mode=auto * chlog * Prune deprecated EarlyStopping(mode='auto') (#6167) Co-authored-by: Roger Shieh <sh.rog@protonmail.ch> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * Fix typo (#6178) * Update issue template to use discussions for questions (#6155) * add issue config * remove question template * update URL * Update README.md * Update README.md Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * Update .github/ISSUE_TEMPLATE/config.yml Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * Update with GitHub Discussions (#6186) * Update gpu warning (#6181) Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Kaushik Bokka <kaushikbokka@gmail.com> * type accelerators (#6148) * Fix for multiple callbacks (#6197) * Fix for multiple callbacks * Add CHANGELOG.md * Remove old params * Skip tests on windows using ddp * Change name of the variable to not clash with should stop, which is separate * Apply suggestions from code review * Fix 
params Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Add checkpoint parameter to on_save_checkpoint (#6072) Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> * Document exceptions in loggers (#6171) * Document exceptions in loggers * minor formatting * docstring changed in comet.py * Apply suggestions from code review Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * Prune deprecated Trainer(checkpoint_callback=ModelCheckpoint()) (#6166) * fix parallel devices return type & add copyright (#6215) * Add mypy typing to precision plugins. (#6149) Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz> Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> * apply_func.py: from torchtext.legacy.data import Batch (#6211) * Update apply_func.py The name Batch is no longer located under torchtext.data --Error message-- File "/home/daniel/py38/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 25, in <module> from torchtext.data import Batch ImportError: cannot import name 'Batch' from 'torchtext.data' (/home/daniel/py38/lib/python3.8/site-packages/torchtext/data/__init__.py) You can fix this by changing line 28 to: from torchtext.legacy.data import Batch * Update apply_func.py * Update apply_func.py * Update apply_func.py * Update apply_func.py * Update apply_func.py * fix(wandb): prevent WandbLogger from dropping values (#5931) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: chaton <thomas@grid.ai> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Prune deprecated hparams setter (#6207) * document exceptions for metrics/regression (#6202) Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> Co-authored-by: Prajakta Phadke <pphadke@iu.edu> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * simplify skip-if tests >> 0/n (#5920) * skipif + yapf + isort * tests * docs * pp * update (#6237) * 
Document Exceptions in profilers (#6229) * docstring changes in profilers * minor changes in profilers.py * Call `optimizer.zero_grad()` before backward inside closure in AutoOpt (#6147) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> * Fix for incorrect usage of detach(), cpu(), to() (#6216) * Fix for incorrect detach/cpu calls (#6214) * Fix incorrect use of detach(), to(), and cpu(), #6214 * Fix incorrect use of detach() and cpu(), #6214 * update pr * add typing * chlog * more... * revert on module * update on comments * revert changes on model Co-authored-by: tchaton <thomas@grid.ai> Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz> * add skipif warpper (#6258) * cleaning SWA (#6259) * rename * if * test * chlog * Remove opt from manual_backward in docs (#6267) * switch agents pool (#6270) * docstring changes in tuner (#6264) * docstring changes in tuner * added full stop * Disable CPU Offload as default for DeepSpeed (#6262) * Change default for CPU offload to false for best throughput/memory efficiency * Add changelog * default Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * split profilers (#6261) * Refactor: skipif for multi - gpus 1/n (#6266) * ngpus * gpu * isort * pt * flake8 * Improved EarlyStopping.patience documentation (#6278) * Improved early stopping documentation * Changed to 120 column format * doc * doc * doc Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz> * Refactor: skipif for Windows 2/n (#6268) * win * isort * flake8 * fix duplicate console logging bug v2 (#6275) Co-authored-by: chaton <thomas@grid.ai> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Refactor: skipif for AMPs 3/n (#6293) * args * native * apex * isort * [fix] Ensure we check deepspeed/sharded in multinode DDP (#6297) * Ensure we check deepspeed/sharded in multinode * Add CHANGELOG.md * Add CHANGELOG.md * Drop mock, use actual multi-gpu node * unfreeze torchtext version (#6302) * Add possibility for custom naming when 
using multiple dataloaders (#6274) * try to fix imports for parsing (#6256) * try to fix imports * legacy 1.2.1 * Refactor: Runif for TPU and Horovod 5/n (#6301) * TPU * horovod * extra * fix * Apply suggestions from code review Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> * doc Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> * Refactor: runif for spec 6/6 (#6307) * special * rpc * Add fairscale & deepspeed to skipif 4/n (#6281) * add fairscale & windows to skipif * add deepspeed to runif * fairscale * deepspeed * flake8 Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz> * [bugfix] TPU test hangs to barrier on 1 process (#6272) * update * resolve flake8 * update * update * update changelog * update * resolve flake8 Co-authored-by: Your Name <you@example.com> * prune duplicite test in optim (#6312) * Simplify test for AMP plugins (#6311) * AMP * fuse * yapf * Fix ModelPruning(make_pruning_permanent=True) buffers getting removed when saved during training (#6073) Co-authored-by: chaton <thomas@grid.ai> * [bugfix] TPU + all_gather + SingleTPU shouldn't call xm.all_gather (#6296) * resolve an issue with TPU * update * add changelog * drop unused variable in API (#6308) * drop unused pl model in ckpt * irelevant * on_evaluation_batch_start * evaluation_epoch_end * attach_datamodule * hotfix for PT1.6 and torchtext (#6323) * ci: azure reinstall torchtext * move * todos * 0.6.0 * skip examples * formatter * skip * todo * Apply suggestions from code review * [fix] Use training type plugin hook when saving (FSDP 1/n) (#6321) * Rely on training type plugin when saving * Add better typing to training type plugin * leaving lezwon (#6347) * Add `tests/utilities/test_parsing.py` (#4460) * Create branch tests/4400_parsing * Rename test file for parsing.py * Fix lightning_hasattr * Fix lightning_hasattr * Fix lightning_setattr * Add empty lines and remove rubbish spaces * Raise AttributeError not ValueError * Use getattr in hasattr * Remove rubbish spaces * Fix 
getattr * Fix by flake8 * Add tests for str_to_bool_or_str * Fix by flake8 * Add tests for str_to_bool * Add tests for is_picklable * Add tests for clean_namespace * Fix typo * Fix lightning_getattr * Add tests for AttributeDict * Add tests for flatten_dict * Fix by flake8 * Apply suggestions from code review Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Apply isort * Revert "Apply suggestions from code review" * Define unpicklable_function outside * Add comment to test_clean_namespace * Add tests for parse_class_init_keys * Add tests for get_init_args and collect_init_args * Share objects across the tests Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk> * Add ignore param to save_hyperparameters (#6056) * add ignore param to save_hyperparameters * add docstring for ignore * add type for frame object * Update pytorch_lightning/core/lightning.py Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> * Update pytorch_lightning/core/lightning.py Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> * fix whitespace * Update pytorch_lightning/core/lightning.py Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> * Parametrize tests * Update pytorch_lightning/core/lightning.py Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * Update pytorch_lightning/core/lightning.py Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * seq * fix docs * Update lightning.py * Update lightning.py * fix docs errors * add example keyword * update docstring Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * Fix when _stable_1d_sort to work when n >= N (#6177) * Fix when _stable_1d_sort to work when n >= N * Apply suggestions Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> * Update docs on arg train_dataloader in fit (#6076) * add to docs * update docs * Apply suggestions from code 
review * Update pytorch_lightning/core/hooks.py Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * nested loaders * Apply suggestions from code review Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * shorten text length * Update pytorch_lightning/core/hooks.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * missing tests default_root_dir=tmpdir (#6314) * default_root_dir=tmpdir * miss * Document exception for metrics/classification (#6190) * document exception for metrics/classification * minor formatting fixes * fix trailing whitespaces * document exception for metrics * Apply suggestions from code review Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> * Apply suggestions from code review Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> * Apply suggestions from code review Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> * [Fix] Call clip gradients if clip val greater than 0 (#6330) * Call clip gradients if clip val greater than 0 * format * Format * Move to top of file * [bugfix] Check LightningOptimizer doesn't delete optimizer hooks (#6305) * update * resolve bug * docstring changes in accelerators (#6327) * docstring changes in accelerators * docstrings moved * whitespaces removed * PEP8 correction[1] * [bugfix] Perform reduction for dict in training_step and DP (#6324) * fix * update * update * add changelog * Update CHANGELOG.md Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Update tests/accelerators/test_dp.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * update changelog Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * introduce default cluster 
environment for lightning-specific ddp (#5915) * handle distributed_sampler_kwargs * move emptying cache to accelertor * fix a few tests * restoring the result from subprocess * fix queue.get() order for results * add missing "block_backward_sync" context manager * add missing "block_backward_sync" context manager * fix sync_batchnorm * fix supported gpu-ids for tuple * fix clip gradients and inf recursion * accelerator selection: added cluster_environment plugin * fix torchelastic test * fix reduce early stopping decision for DDP * fix tests: callbacks, conversion to lightning optimizer * fix lightning optimizer does not pickle * fix setting benchmark and deterministic option * fix slurm amp test * fix prepare_data test and determine node_rank * fix retrieving last path when testing * remove obsolete plugin argument * fix test: test_trainer_config * fix torchscript tests * fix trainer.model access * move properties * fix test_transfer_batch_hook * fix auto_select_gpus * fix omegaconf test * fix test that needs to simulate slurm ddp * add horovod plugin * fix test with named arguments * clean up whitespace * fix datamodules test * remove old accelerators * fix naming * move old plugins * move to plugins * create precision subpackage * create training_type subpackage * fix all new import errors * fix wrong arguments order passed to test * fix LR finder * Added sharded training type and amp plugin * Move clip grad to precision plugin * Added sharded spawn, select accelerators based on distributed_backend + enable custom fp16 plugin automatically * Fix import issue, attempting to fix tests * Fix initial test * Reflect hook logic from master, should wrap model after move to device * Optional state consolidation, since master has optimizers not wrapped * change attribute for instance test * reset optimizers optimizers are not used in main process, so state would be wrong. 
* legacy * imports in accel * legacy2 * trainer imports * fix import errors after rebase * move hook to new setup location * provide unwrapping logic * fix trainer callback system * added ddp2 implementation * fix imports .legacy * move plugins * restore legacy * drop test.py from root * add tpu accelerator and plugins * fixes * fix lightning optimizer merge * reset bugreportmodel * unwrapping * step routing forward * model access * unwrap * opt * integrate distrib_type * sync changes * sync * fixes * add forgotten generators * add missing logic * update * import * missed imports * import fixes * isort * mv f * changelog * format * move helper to parallel plugin * d * add world size * clean up * duplicate * activate ddp_sharded and tpu * set nvidia flags * remove unused colab var * use_tpu <-> on_tpu attrs * make some ddp_cpu and clusterplugin tests pass * Ref/accelerator connector (#5742) * final cleanup Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * connector cleanup Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * trainer cleanup Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * accelerator cleanup + missing logic in accelerator connector Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * add missing changes to callbacks Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * reflect accelerator changes to lightning module Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * clean cluster envs Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * cleanup plugins Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * add broadcasting Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * yapf * remove plugin connector Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * plugins * manual optimization * update optimizer routing * add rank to torchelastic * fix memory mixed precision * setstate on trainer for pickling in ddp spawn * add predict method * add back commented accelerator code * adapt 
test for sync_batch_norm to new plugin * fix deprecated tests * fix ddp cpu choice when no num_processes are given * yapf format * skip a memory test that cannot pass anymore * fix pickle error in spawn plugin * x * avoid * x * fix cyclic import in docs build * add support for sharded * update typing * add sharded and sharded_spawn to distributed types * make unwrap model default * refactor LightningShardedDataParallel similar to LightningDistributedDataParallel * update sharded spawn to reflect changes * update sharded to reflect changes * Merge 1.1.5 changes * fix merge * fix merge * yapf isort * fix merge * yapf isort * fix indentation in test * copy over reinit scheduler implementation from dev1.2 * fix apex tracking calls with dev_debugger * reduce diff to dev1.2, clean up * fix trainer config test when gpus>0 and num_processes >0 and ddp_cpu * sort plugin tests legacy/new * fix error handling for amp on cpu * fix merge fix merge fix merge * [Feat] Resolve manual_backward (#5837) * resolve manual_backward * resolve flake8 * update * resolve for ddp_spawn * resolve flake8 * resolve flake8 * resolve flake8 Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> * fix tests/accelerator tests on cpu * [BugFix] Resolve manual optimization (#5852) * resolve manual_optimization * update * update Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> * Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856) * resovle a bug * Accelerator refactor sharded rpc (#5854) * rpc branch * merge * update handling of rpc * make devices etc. Optional in RPC * set devices etc. 
later if necessary * remove devices from sequential * make devices optional in rpc * fix import * uncomment everything * fix cluster selection Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> * resolve bug * fix assert in rpc test * resolve a test * fix docs compilation * accelerator refactor - fix for sharded parity test (#5866) * fix memory issue with ddp_spawn * x x x x x x x x x * x * Remove DDP2 as this does not apply * Add missing pre optimizer hook to ensure lambda closure is called * fix apex docstring * [accelerator][BugFix] Resolve some test for 1 gpu (#5863) * update * revert init * resolve a bug * update * resolve flake8 * update * update * update * revert init * resolve a bug * update * resolve flake8 * update * update * update * update * update * revert init * resolve a bug * update * resolve flake8 * update * update * update * revert init * update * resolve flake8 * update * update * update * update * update * all_gather * update * make plugins work, add misconfig for RPC * update * update * remove breaking test * resolve some tests * resolve flake8 * revert to ddp_spawn Co-authored-by: root <root@ip-172-31-88-60.ec2.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de> * yapf isort * resolve flake8 * fix apex doctests * fix apex doctests 2 * resolve docs * update drone * clean env * update * update * update * update * merge * Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881) * Fix RPC related tests, clean out old API, update for new accelerator API * Move tests out of legacy folder, update paths and names * Update test_remove_1-4.py * Expose properties for tpu cores/gpus/num_gpus * Add root GPU property * Move properties to properties.py * move tests that were previously in drone * Fix root GPU property (#5908) * Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly 
to set GPU accelerator * Add missing tests back * fix best model path transfer when no checkpoint callback available * Fix setup hook order [wip] (#5858) * Call trainer setup hook before accelerator setup * Add test case * add new test * typo * fix callback order in test Co-authored-by: tchaton <thomas@grid.ai> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * rename ddp sequential -> rpc sequential for special test * revert * fix stupid merge problem * abstract the cluster plugins * default plugin * integrate default environment * fix property * adapt tests * adjust test * fix world size access * base cluster env * revert rebase errors * revert rebase errors * missing import * revert unrelated change * remove unused cluster local rank * remove unrelated changes * fix unrelated changes * fix pep8 * remove unused var * reset permissions * ypaf * test default environment * test torchelastic environment * world size as int * tests for slurm environment * changelog * test comments * remove unintended change * keep master port fixed after it is generated * test random master port * yapf * add missing default environment * move helper function * rename default environment * rename * rename * yapf * Update pytorch_lightning/plugins/environments/lightning_environment.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Update CHANGELOG.md Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> * spawn -> create Co-authored-by: justusschock <justus.schock@posteo.de> Co-authored-by: SeanNaren <sean@grid.ai> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz> Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de> Co-authored-by: chaton <thomas@grid.ai> Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> Co-authored-by: Sean Naren <sean.narenthiran@gmail.com> Co-authored-by: root <root@ip-172-31-88-60.ec2.internal> Co-authored-by: Carlos 
Mocholí <carlossmocholi@gmail.com> * [bugfix] Resolve memory leak for evaluation (#6326) * resolve bug * resolve flake8 * revert name * Update changelog for v1.2.2 (#6325) * update changelog for v1.2.2 * ckpr 1.2.2 Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz> * CI: fix examples - patch download MNIST (#6357) * patch download * CI * isort * extra * [bug] Fix Pytorch profiler with emit_nvtx (#6260) * resolve bug * update changelog * Update tests/trainer/test_trainer.py * Update pytorch_lightning/profiler/profilers.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * resolve comments * resolve flake8 Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * fix importing torchtext batch (#6365) * copy torchtext batch * update * rev * rev * give a more complete GAN example (#6294) * Refactor RunningStage usage in advance of implementing Trainer.validate() (#4945) * Update code Co-authored-by: EliaCereda * More property updates * Move properties. Introduce trainer._fitting * Use trainer.fitting * Fix reset dataloaders * Unused code * RunningStage.SANITY_CHECKING * Use setters * Fix bugs * Fix bugs * TrainerState.{FITTING,VALIDATING,TESTING,PREDICTING,TUNING} * Fix bugs * Fix bugs * Fix tests * Update CHANGELOG. Add deprecation warning. Fix tests * Unused imports * Optional trainer * More deprecation. 
More refactoring * Correct version * Use properties * Address comments * flake8 * Missed renamings * Typo * is -> ==: using `is` is recommended for Enums since they are singletons; however, since LightningEnum subclasses str, it's not a good idea in case a user sets the state/stage with a str * Also for tests * Typo * Address @tchaton's comments * PEP8 * Correct property * Update CHANGELOG * Apply suggestions from code review Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Update pytorch_lightning/trainer/trainer.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Remove called sanity check Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * require: adjust versions (#6363) * adjust versions * release * manifest * pep8 * CI * fix * build * Use f-"""-string in a Trainer comment (#6377) * Use f-"""-string * Add r * Use Trainer. * r -> noqa: W605 * Remove no return warning from val/test step (#6139) * remove warning * auto_opt * chlog * auto_opt * no_warning_call * rm old code * add warning for predict * Apply suggestions from code review Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Fix manual optimization in pl_example (#6373) * Fix automatic_optimization * Fix automatic_optimization * Uncomment fairscale * Update Sharded test with RunIf (#6384) * Remove optimizer_idx arg in manual optimization (#6093) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: chaton <thomas@grid.ai> * [doc] Improve Multiple Val/Test Dataloaders with simultaneous batches option (#6320) * improve doc to describe how to combine batches of multiple test and val dataloaders simultaneously * fix typo Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * use paramref Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * [doc] Fix closure in manual
optimization (#6374) * Fix manual optimization docs * Fix typo. Thanks @import-antigravity * Fix ModelCheckpoint(monitor=None, save_last=True) not saving checkpoints (#6136) Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> * Update TBLogger docs (#6315) * Update tensorboard.py * Update logging.rst * pep8 * Update logging.rst * Update logging.rst * Apply suggestions from code review * add code sample * Update logging.rst Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Fix trainer not resetting lightning_optimizers (#6372) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * update python version (#6399) * Fix AttributeError: 'NoneType' object has no attribute 'finalize' on TPU (#6221) * Fix bug Fix AttributeError: 'NoneType' object has no attribute 'finalize' * Update CHANGELOG.md * deleted a period * Update CHANGELOG.md Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> * Update CHANGELOG.md * Update pytorch_lightning/plugins/training_type/tpu_spawn.py Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * Run CI (#6402) * Pass {fit,validate,test,predict} to setup() and teardown() (#6386) * fix dp reduction test (#6404) * fix * update * fix * move the class outside * Add check for verbose attribute of ModelCheckpoint (#6419) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * fixed bug where tuner would not tune lr if also tuning batch_size (#4688) * fixed bug where tuner would not tune lr if also tuning batch_size * added a '+1' to computing the smoothed loss. 
This maintains the behavior for the smoothed loss as before the bug fix * pep8 fix * add changelog Co-authored-by: chaton <thomas@grid.ai> Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * update (#6403) * fix logger creating directory structure too early in DDP (#6380) * fix * add simple test * fix imports * add changelog * tighter test with on_fit_start hook closer to the dispatch call * move class inside test f unction * add a comment * Typing for tests 1/n (#6313) * typing * yapf * typing * [changelog] Update Changelog on release v1.2.3 (#6444) * update changelog * legacy 1.2.3 Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz> * Improve DummyLogger (#6398) * fix dummy logger * docs * update docs * add changelog * add none return annotation * return empty string for name, version * Raise an exception if check_val_every_n_epoch is not an integer (#6411) * raise an exception if check_val_every_n_epoch is not an integer * remove unused object * add type hints * add return type * update exception message * update exception message * Set find unused parameters to True by default to fix breaking compatibility (#6438) * Set find unused parameters to True by default to fix breaking models, add suggestion to re-enable * Add changelog * [bug] All_gather support tensor on cpu (#6416) * add test * update changelog * update * rename function * [Fix] Ensure we set the default device before initializing deepspeed (#6460) * Ensure we set the default device before initializing deepspeed * Add CHANGELOG.md * Update pytorch_lightning/plugins/training_type/deepspeed.py Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> * Remove redundant test (#6466) * Add Trainer.validate(…) method to run one validation epoch (#4948) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> Co-authored-by: chaton <thomas@grid.ai> 
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Allow user to disable the automatic formatting of checkpoint file names. (#6277) * cleaning SWA (#6259) * rename * if * test * chlog * Remove opt from manual_backward in docs (#6267) * switch agents pool (#6270) * Allow user to disable the automatic formatting of checkpoint file names. * Added changelog entry. * Made flake8 happy. * Applied review suggestion: quotes for special characters in docstring Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Fixed example in docstring. * Fixed syntax error in docstring. Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> Co-authored-by: thomas chaton <thomas@grid.ai> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Hotfix for torchvision (#6476) * cover subproc coverage (#6477) * argparse: Add use_argument_group=True (#6088) * argparse: Add inplace option Replicate in GAN model * datamodule: Deduplicate logic w/ argparser utilities * Update pl_examples/domain_templates/generative_adversarial_net.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> * Keep docstrings * Correct name * Whitespace * Consistency * fix weird type stuff * try alt - use_argument_group * fix syntax + lint * fix ci errs * fix ci * change examples... 
still failing w/ "unrecognized arguments: --batch_size" * address review * mnist_datamodule: add some docstrings * argparse: check cls or cls.__init__ for param didn't capture issue, but meh * fix lint * fix no-doc edge case * address review Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> * Disable batch transfer in DP mode (#6098) * add exceptions and test * hook * fix * clean up * clean up * regex * regex * docs * rev * comment and docs * chlog * Apply suggestions from code review Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Apply suggestions from code review Co-authored-by: chaton <thomas@grid.ai> * Monkey-patch device count * docs * pep * api_change Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: chaton <thomas@grid.ai> * remove obsolete todo in pl_examples (#6475) * [feat] Support iteration-based checkpointing in model checkpoint callback (#6146) * Update model_checkpoint.py * add tests * Update model_checkpoint.py * Update test_model_checkpoint.py * fix tests * every_n_batches * Update test_model_checkpoint.py * defaults * rm tests * Update model_checkpoint.py * Update test_model_checkpoint.py * Prune deprecated metrics for 1.3 (#6161) * prune deprecated metrics for 1.3 * isort / yapf * Update model_checkpoint.py * add tests * defaults * Update CHANGELOG.md * pre-commit * Update model_checkpoint.py * update defaults * Update test_remove_1-5.py * Update model_checkpoint.py * Update model_checkpoint.py * Update model_checkpoint.py * Update model_checkpoint.py * Update model_checkpoint.py * Update model_checkpoint.py * fix tests * Update test_model_checkpoint.py * Update model_checkpoint.py * Update model_checkpoint.py * Update model_checkpoint.py * Update test_model_checkpoint.py * ckpt-callback * Update test_model_checkpoint.py * Update model_checkpoint.py * Update model_checkpoint.py * 
validation-end * Update model_checkpoint.py * Update test_model_checkpoint.py * Update test_model_checkpoint.py * Update test_model_checkpoint.py * Update test_model_checkpoint.py * clarify-names - Make names explicit as to which hooks they apply to - Use step instead of batch for consistency with global step * Update model_checkpoint.py * Update model_checkpoint.py * Update model_checkpoint.py * Update model_checkpoint.py * Update model_checkpoint.py * mutual-exclusive Make every_n_train_steps and every_n_val_epochs mutually exclusive * fix-default-0 * Update CHANGELOG.md * formatting * make-private make attributes private to the class * rebase Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * update xla version (#6464) * Remove unused mixin attributes (#6487) * Remove unused mixing attributes * Missing import * [doc] Update the order of zero_grad and backward (#6478) * Fix zero_grad in docs * Fix zero_grad in docs * Fix tuner.scale_batch_size not finding batch size attribute when using datamodule (#5968) * Update docs for limit_predict_batches (#6507) * add docs and minor updates * docs * fraction * [bug] Update broadcast + reduce decision ModelCheckpoint] (#6410) * resolve bug * update * update changelog * update PR * Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * add todo * resolve issues * resolve flake8 * update * add coverage for reduce * wip * restore back to brodbact * remove test.py * resolve flake8 * update * check world size * resolve test * update * use pytorch version when defined * update on comments * update on comments * flake8 * resolve bugs * Update CHANGELOG.md Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * update * update * update * update * remove test * update * resolve flake8 * update * update * update * proxy * update * update * resolve typo * prune * update parallel * update Co-authored-by: Carlos Mocholí 
<carlossmocholi@gmail.com> * Handle torch.jit scripted modules in layer summary (#6511) * CI: resume testing with py3.8 (#6516) * testing on python 3.8 * req * document exceptions for metrics/functional (#6273) * document exceptions for metrics/functional * Apply suggestions from code review Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * Apply suggestions from code review Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Akihiro Nitta <nitta@akihironitta.com> * Mean Average Precision metric for Information Retrieval (1/5) (#5032) * init information retrieval metrics * changed retrieval metrics names, expanded arguments and fixed typo * added 'Retrieval' prefix to metrics and fixed conflict with already-present 'average_precision' file * improved code formatting * pep8 code compatibility * features/implemented new Mean Average Precision metrics for Information Retrieval + doc * fixed pep8 compatibility * removed threshold parameter and fixed typo on types in RetrievalMAP and improved doc * improved doc, put first class-specific args in RetrievalMetric and transformed RetrievalMetric in abstract class * implemented tests for functional and class metric. 
fixed typo when input tensors are empty or when all targets are False * fixed typos in doc and changed torch.true_divide to torch.div * fixed typos pep8 compatibility * fixed types in long division in ir_average_precision and example in mean_average_precision * RetrievalMetric states are not lists and _metric method accepts predictions and targets for easier extension * updated CHANGELOG file * added '# noqa: F401' flag to not used imports * added double space before '# noqa: F401' flag * Update CHANGELOG.md Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * change get_mini_groups in get_group_indexes * added checks on target inputs * minor refactoring for code cleanness * split tests over exception raising in separate function && refactored test code into multiple functions * fixed pep8 compatibility * implemented suggestions of @SkafteNicki * fixed imports for isort and added types annontations to functions in test_map.py * isort on test_map and fixed typing * isort on retrieval and on __init__.py and utils.py in metrics package * fixed typo in pytorch_lightning/metrics/__init__.py regarding code style * fixed yapf compatibility * fixed yapf compatibility * fixed typo in doc Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * CI: Azure publish results (#6514) * deprecate metrics pkg (#6505) * deprecate metrics * examples * req * docs * Apply suggestions from code review Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> * pep8 Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> * [test] lr_find with bs_scale (#6422) * init test: test_lr_find_with_bs_scale * Update test_lr_finder.py * remove gpu req * try boring model * custom boring model * pep8 * fix typo * Update test_lr_finder.py * typo * 
typo * Update DeepSpeed docs (#6528) * Clean up docs and add some explicitness around stages * Apply suggestions from code review Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * fix attribute access in LightningModule.toggle_optimizer (#6513) * Update hook lifecycle (#6538) * Update hook lifecycle * Update docs/source/common/lightning_module.rst * Prune metrics base classes 2/n (#6530) * base class * extensions * chlog * _stable_1d_sort * _check_same_shape * _input_format_classification_one_hot * utils * to_onehot * select_topk * to_categorical * get_num_classes * reduce * class_reduce * tests * Custom Plugin is_distributed (#6537) * return from plugin * dont return for tpu * refactor reading env defaults (#6510) * change tests * fix * test * _defaults_from_env_vars Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Prune metric: helpers and inputs 3/n (#6547) * _basic_input_validation * _check_shape_and_type_consistency * _check_num_classes_binary * _check_num_classes_mc * _check_num_classes_ml * _check_top_k * _check_classification_inputs * _input_format_classification * _reduce_stat_scores * DataType * rest * flake8 * chlog * prune warning & deprecation wrapper (#6540) * docs * wrapper * test * count * flake8 * Add outputs param for `on_val/test_epoch_end` hooks (#6120) * add outputs param for on_val/test_epoch_end hooks * update changelog * fix warning message * add custom call hook * cache logged metrics * add args to docstrings * use warning cache * add utility method for param in sig check * Update CHANGELOG.md Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * update docstring * add test for eval epoch end hook * add types and replace model ref * add deprecation test * fix test fx name * add model hooks warning * add old signature model to tests * add clear warning cache * sopport args param * update tests * add tests for model hooks * code suggestions * add signature utils * 
fix pep8 issues * fix pep8 issues * fix outputs issue * fix tests * code fixes * fix validate test * test Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * [doc] Add Zero Grad `set_to_none=True` trick (#6548) * add trick to doc * update * update path * Update docs/source/benchmarking/performance.rst Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * fix deprecation wrapper & tests (#6553) * fix deprecation wrapper & tests * flake8 * prune metric: accuracy 4/n (#6515) * prune accuracy * chlog * flake8 * Apply suggestions from code review Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> * wrap * test * test * fix Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> * Prune metrics: AUC & AUROC (#6572) * class: AUC AUROC * func: auc auroc * format * tests * [doc] Update Dict Train Loader doc. (#6579) * update doc * update example * Prune metrics: precision & recall 6/n (#6573) * avg precision * precision * recall * curve * tests * chlog * isort * fix * Update Changelog for v1.2.4 (#6581) * Update changelog for v1.2.4 * lagacy v1.2.4 * prune duplicates from changelog Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz> * [Fix] Move init dist connection into the setup function (#6506) * Move connection setup into the setup function. 
Call setup hook after we set up the accelerator * Added CHANGELOG.md * fix setup order in callback test * fix input arguments in test * Mock distributed function, remove protection to turn into training type hook * Remove import * Add missing mock, ensure custom plugin does not create children process * Skip test on windows * Update deepspeed to init connection in setup * Do not initialize distributed module * Move DeepSpeed tests to special tests since dist communication is being set up * Special the test to see if this fixes CI * Delete accelerator connector test to see if its causing build to fail * Delete deepspeed test * Revert "Delete accelerator connector test to see if its causing build to fail" This reverts commit edde60b8 * Revert "Delete deepspeed test" This reverts commit 9d317429 * Reverse hook * Reverse setup hooks to debug again * Add todo so i know where i left off * For single device move in pre_dispatch after setup function * Add additional model to device hook if any additional parameters have been set * See if we can enable deepspeed tests * Revert "See if we can enable deepspeed tests" This reverts commit b5450def * See if this hook approach works * Introduce new granular hooks * Remove import, fix tpu spawn by moving the function to setup * Added missing special test Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Fix all_gather for tpu_cores=8 (#6587) * Update Gradient Clipping for TPU Accelerator (#6576) * NGC container PoC (#6187) * add NVIDIA flows * push * pull * ... * extras * ci prune * fix * tag * . 
* list * Automatically set sync_batchnorm for training_type_plugin (#6536) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Roger Shieh <sh.rog@protonmail.ch> Co-authored-by: Kaushik Bokka <kaushikbokka@gmail.com> * Prune metrics: other classification 7/n (#6584) * confusion_matrix * iou * f_beta * hamming_distance * stat_scores * tests * flake8 * chlog * fixing examples (#6600) * try Azure * -e * path * Add AMP for validation, prediction and testing (#6565) * Add Tests for val and test-steps * Add native AMP * pep8 tests * pep8 plugin * changelog * Add trainer.predict config validation (#6543) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Add DDP Spawn being default for Multi GPUs (#6292) * Move profiler tests (#6619) * drop mypy from .pre-commit-config.yaml (#6542) * Clean utilities/argparse and add missing tests (#6607) * Allow training type plugin to delay optimizer creation (FSDP 2/n) (#6331) * Allow training_type_plugin to delay optimizer configure * Add missing references to trainer, add a CPU accelerator based test * Add teardown method to BaseProfiler. (#6370) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> * refactoring setup (#6590) * refactoring setup * . * docs * flake8 * hotfix: mock examples (#6632) * mock examples * drop from GA * [refactor] Add setup to profilers + _run_stage_setup to trainer 2/5 (#6633) * add setup * update * updates on comment * Minor changes * Extra import * Docs Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> * fix comparing versions (#6434) * fix comparing versions * chlog * . * ... 
* datasets * Prune metrics: regression 8/n (#6636) * explained_variance * tests * mean_absolute_error * mean_squared_error * mean_relative_error * mean_squared_log_error * chlog * Prune metyrics: regression 9/n (#6637) * psnr * r2score * ssim * chlog * Refactor base profilers 3/5 (#6621) Co-authored-by: tchaton <thomas@grid.ai> * prune metrics: info retrieval (#6649) * Flash predict step (#6577) * add predict_step * Update predict_loop.py * Update trainer.py * Update trainer.py * resolve bugs * update * update * update * resolve bug * resolve some failing tests * udpate tests * update * resolve tests * add a test * remove typo * add a test for attachement * update * changed to on_train_dataloader * remove __flash_special_attr__ * resolve tests * update * update * update * update on comments * Update pytorch_lightning/trainer/data_loading.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * fix back-compatibility for Accel (#6655) * Refactor PyTorch profiler 4/5 (#6349) Co-authored-by: thomas chaton <thomas@grid.ai> * Add PyTorch 1.8 Profiler 5/5 (#6618) * Refactor profilers * Update PassThrough * WIP - This is broken and will change * Update pytorch_lightning/profiler/pytorch.py Co-authored-by: thomas chaton <thomas@grid.ai> * resolve tests * resolve tests * find output * try something * update * add support for test and predict * update * update * use getattr * test * test * update * tests * update * update * update * update * update * remove file * update * update * update * update * update * test * update# * update * update tests * update * add suport for 1.8 * rename records * add support for 1.8 * update * resolve flake8 * resolve test * Refactor basic profilers * Fixes * Unused import * Introduce setup * Profile on all ranks. Print to stdout on 0 * Introduce dirpath + filename * CHANGELOG * Add tests. 
Address comments * add `on_run_stage_setup` * add on_run_stage_setup function * update * add test for RegisterRecordFunction * update lightnng flow direction * move variable to private * remove trace * Undo code that should be in 3/4 * Multi-stage multi-rank * 2/5 changes * Pass stage in __del__ * Remove TODOs * Describe on_evaluation_end. Add tests * Typo * Address comments * deepcopy tests * Advanced teardown * Fix teardown test * Fix tests * Minor change * Update CHANGELOG.md * Fix test * Quick fixes * Fix 6522 * resolve ddp tests * resolve tests * resolve some tests …