
enable prepare_data from correct processes - clarify local vs global rank #2166

Merged
23 commits merged into master from ananthsub-patch-1 on Jun 13, 2020

Conversation

@williamFalcon (Contributor) commented on Jun 13, 2020

  • Enables prepare_data to run only from global rank 0 when requested, or from each node's local rank 0 (a sketch of this gating follows below)
  • Clarifies local_rank vs. global_rank
  • Adds self.is_global_zero for convenience when an op should run only where node=0 and local_rank=0

Fixes #1878
Fixes #2163
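
For illustration, a minimal sketch of the gating described above. The helper name `should_prepare_data` and the environment-variable rank lookup are assumptions made for this example, not the actual Trainer internals:

```python
import os


def should_prepare_data(prepare_data_per_node: bool) -> bool:
    """Decide whether this process should call prepare_data().

    Illustration only: assumes LOCAL_RANK/GLOBAL_RANK environment
    variables as set by common launchers; the real Trainer tracks
    these ranks on its own state instead.
    """
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    global_rank = int(os.environ.get("GLOBAL_RANK", "0"))

    if prepare_data_per_node:
        # Prepare once per node: only each node's local rank 0 runs it.
        return local_rank == 0
    # Prepare once for the whole job: only global rank 0 runs it.
    return global_rank == 0
```

With `prepare_data_per_node=True` (the default visible in the diff below), each node's rank-0 process downloads and prepares data, which suits node-local storage; with `False`, only global rank 0 does, which suits a shared filesystem. The new `is_global_zero` flag is a convenience for exactly the `global_rank == 0` case.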

@pep8speaks commented on Jun 13, 2020

Hello @williamFalcon! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-06-13 02:57:12 UTC

@williamFalcon changed the title from "Ananthsub patch 1" to "[WIP] Ananthsub patch 1" on Jun 13, 2020
@mergify (bot) requested a review from a team on Jun 13, 2020, 00:33
@williamFalcon changed the title from "[WIP] Ananthsub patch 1" to "Finish Ananthsub patch 1 (enable prepare_data from correct processes). clarify local vs global rank" on Jun 13, 2020
@williamFalcon added this to the 0.8.0 milestone on Jun 13, 2020
@williamFalcon added the "priority: 0" (High priority task) label on Jun 13, 2020
Resolved review threads (now outdated) on:
  • pytorch_lightning/trainer/training_loop.py
  • pytorch_lightning/trainer/distrib_data_parallel.py
  • pytorch_lightning/trainer/trainer.py
@mergify (bot) requested a review from a team on Jun 13, 2020, 02:30
@@ -532,6 +542,7 @@ def get_init_arguments_and_types(cls) -> List[Tuple[str, Tuple, Any]]:
('max_epochs', (<class 'int'>,), 1000),
...
('precision', (<class 'int'>,), 32),
('prepare_data_per_node', (<class 'bool'>,), True),
@awaelchli (Member) commented on the diff:

It is in the wrong position: it should go after auto_scale_batch, which is not listed here, so the solution is to remove it. Also, you removed `# doctest: +ELLIPSIS +NORMALIZE_WHITESPACE`, which I am fairly confident is needed.

@williamFalcon (Contributor, Author) replied:

What is it here? If I remove it, it still fails.

@awaelchli (Member) replied on Jun 13, 2020:

Well, maybe I was wrong and it is just the missing +ELLIPSIS +NORMALIZE_WHITESPACE. I cloned the branch and the doctest runs locally, so I think it's OK now.
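
For context, a self-contained sketch of why those directives matter; the function below is a stand-in, not the real `Trainer.get_init_arguments_and_types`. `+ELLIPSIS` lets the literal `...` in the expected output stand for the omitted entries, and `+NORMALIZE_WHITESPACE` lets the expected output wrap across lines:

```python
def get_init_arguments_and_types():
    """Return (name, types, default) tuples for init arguments.

    Stand-in example for the real Trainer classmethod:

    >>> get_init_arguments_and_types()  # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
    [('max_epochs', (<class 'int'>,), 1000),
     ...,
     ('prepare_data_per_node', (<class 'bool'>,), True)]
    """
    return [
        ('max_epochs', (int,), 1000),
        ('precision', (int,), 32),
        ('prepare_data_per_node', (bool,), True),
    ]


if __name__ == "__main__":
    import doctest
    doctest.testmod()
```

Running this file exercises the doctest; removing either directive makes it fail, matching the behavior discussed in this thread.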

@mergify (bot) requested a review from a team on Jun 13, 2020, 13:00
@williamFalcon merged commit 5fd01b0 into master on Jun 13, 2020
@Borda deleted the ananthsub-patch-1 branch on Jun 13, 2020, 16:32
@Borda changed the title from "Finish Ananthsub patch 1 (enable prepare_data from correct processes). clarify local vs global rank" to "enable prepare_data from correct processes - clarify local vs global rank" on Jun 13, 2020
Labels: priority: 0 (High priority task)
Development
Successfully merging this pull request may close these issues:
  • prepare_data called multiple times per node for slurm and elastic training