Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Call set_epoch on Dataloader.batch_sampler if defined #3124

Merged
merged 31 commits into from
Apr 2, 2024

Conversation

Ghelfi
Copy link
Contributor

@Ghelfi Ghelfi commented Mar 18, 2024

What does this PR do?

Proposes a technical implementation to set_epoch on either DataLoader.sampler or DataLoader.batch_sampler if it is defined.

What issue(s) does this change relate to?

@mvpatel2000
Copy link
Contributor

mvpatel2000 commented Mar 19, 2024

Thanks for the PR!

Would you mind adding a unit test too please? You can extend test_checkpoint.py with class TestCheckpointResumption:. I would add an arg to get_trainer to add a batch dist sampler and add a new fn that tests that path instead.

If you need help, I'm happy to provide assistance for this!

When you are ready, please request review from mvpatel2000!

Ghelfi and others added 11 commits March 21, 2024 17:43
Bumps [coverage[toml]](https://github.com/nedbat/coveragepy) from 7.4.3 to 7.4.4.
- [Release notes](https://github.com/nedbat/coveragepy/releases)
- [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst)
- [Commits](nedbat/coveragepy@7.4.3...7.4.4)

---
updated-dependencies:
- dependency-name: coverage[toml]
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: practicinginhell <oneforall1412@gmail.com>
…osaicml#3122)

Updates the requirements on [packaging](https://github.com/pypa/packaging) to permit the latest version.
- [Release notes](https://github.com/pypa/packaging/releases)
- [Changelog](https://github.com/pypa/packaging/blob/main/CHANGELOG.rst)
- [Commits](pypa/packaging@21.3...24.0)

---
updated-dependencies:
- dependency-name: packaging
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
@Ghelfi Ghelfi force-pushed the set-epoch-on-batch-sampler branch from 660d7fd to 0c42c51 Compare March 21, 2024 16:43
@Ghelfi Ghelfi requested review from a team as code owners March 21, 2024 16:43
mbway and others added 5 commits March 25, 2024 11:21
* fix documentation about which file extensions perform compression

* refactor tar handling when writing checkpoint files

* add CLI compressor options to checkpoint saving

* update documentation

* restructure CLI compressor handling. Added more tests

* fix skipping test when compressor not installed

* add documentation and fix some code style issues

* use correct file extension for zstd

* add missing full stop

* fix failing tests

* remove type ignore

Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>

* combine imports into one

* fix capitalisation

Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>

---------

Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
* filter

* remove metadata skip
* bump version to 021

* fix lint

* fix docstring

* remove old ones
@Ghelfi Ghelfi force-pushed the set-epoch-on-batch-sampler branch from e496bd0 to c731f67 Compare March 25, 2024 10:21
@Ghelfi Ghelfi requested a review from antoinebrl March 25, 2024 12:06
@Ghelfi
Copy link
Contributor Author

Ghelfi commented Mar 26, 2024

@mvpatel2000 cannot add you as reviewer to ask for a review.
I added a test and ready for a review.
I put the test where you suggested. Will it be better to put it in test_trainer.py?

@Ghelfi Ghelfi force-pushed the set-epoch-on-batch-sampler branch from c731f67 to 68e6d9d Compare March 27, 2024 03:36
@Ghelfi Ghelfi marked this pull request as draft March 27, 2024 14:33
@Ghelfi Ghelfi marked this pull request as ready for review March 27, 2024 15:16
Copy link
Contributor

@mvpatel2000 mvpatel2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM assuming tests pass! I have some minor style nits

tests/trainer/test_checkpoint.py Outdated Show resolved Hide resolved
tests/trainer/test_checkpoint.py Outdated Show resolved Hide resolved
tests/trainer/test_checkpoint.py Show resolved Hide resolved
@Ghelfi Ghelfi requested a review from mvpatel2000 March 29, 2024 08:25
@mvpatel2000
Copy link
Contributor

@Ghelfi do you mind if I push directly to this branch? there's a few lint things and refactor in test I would prefer. I'm happy to leave as comments if you'd like, but I figure it's faster for me to directly apply. I don't want to delay you further :)

@Ghelfi
Copy link
Contributor Author

Ghelfi commented Apr 1, 2024

@mvpatel2000 fine by me. Thanks for handling it.

@mvpatel2000 mvpatel2000 merged commit a9b9791 into mosaicml:dev Apr 2, 2024
14 checks passed
j316chuck pushed a commit that referenced this pull request May 16, 2024
* adding _get_dist_sampler function

* apply _get_dist_sampler across all trainer.py

* Switch condition to check if batch_sampler if filled first since sampler is always defined

* adding docstring

* removing antipattenr from _get_sampler and linting

* removed default attrs from exception in the attrs dict (#3126)

* Bump coverage[toml] from 7.4.3 to 7.4.4 (#3121)

Bumps [coverage[toml]](https://github.com/nedbat/coveragepy) from 7.4.3 to 7.4.4.
- [Release notes](https://github.com/nedbat/coveragepy/releases)
- [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst)
- [Commits](nedbat/coveragepy@7.4.3...7.4.4)

---
updated-dependencies:
- dependency-name: coverage[toml]
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Refactor initialization (#3127)

Co-authored-by: practicinginhell <oneforall1412@gmail.com>

* Bump databricks sdk version (#3128)

* Update packaging requirement from <23.3,>=21.3.0 to >=21.3.0,<24.1 (#3122)

Updates the requirements on [packaging](https://github.com/pypa/packaging) to permit the latest version.
- [Release notes](https://github.com/pypa/packaging/releases)
- [Changelog](https://github.com/pypa/packaging/blob/main/CHANGELOG.rst)
- [Commits](pypa/packaging@21.3...24.0)

---
updated-dependencies:
- dependency-name: packaging
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>

* remove rng from save weights only ckpt (#3129)

* More compression options (#3118)

* fix documentation about which file extensions perform compression

* refactor tar handling when writing checkpoint files

* add CLI compressor options to checkpoint saving

* update documentation

* restructure CLI compressor handling. Added more tests

* fix skipping test when compressor not installed

* add documentation and fix some code style issues

* use correct file extension for zstd

* add missing full stop

* fix failing tests

* remove type ignore

Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>

* combine imports into one

* fix capitalisation

Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>

---------

Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>

* Only broadcast distcp files (#3130)

* filter

* remove metadata skip

* Bump version to 0.21 (#3132)

* bump version to 021

* fix lint

* fix docstring

* remove old ones

* Adding a test for checking the update of epoch on batch_sampler

* proper formating

* adding _get_dist_sampler function

* apply _get_dist_sampler across all trainer.py

* Switch condition to check if batch_sampler if filled first since sampler is always defined

* adding docstring

* removing antipattenr from _get_sampler and linting

* Adding a test for checking the update of epoch on batch_sampler

* proper formating

* Chang _get_sampler to _get_distributed_sampler to bypass torch default batch_sampler

* Move batch_sampler instanciation for test

* fix batch_sampler setting in eval_loop

* trainer

* fix

* lint

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Jane Zhang <jane.zhang@databricks.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Practicinginhell <147235329+Practicinginhell@users.noreply.github.com>
Co-authored-by: practicinginhell <oneforall1412@gmail.com>
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
Co-authored-by: Evan Racah <evan@mosaicml.com>
Co-authored-by: Matt Broadway <mattdbway@gmail.com>
Co-authored-by: Alexandre Ghelfi <alexandre.ghelfi@helsing.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants