Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(python): separate algorithm declaration and implementation #246

Merged
merged 51 commits into from
Oct 19, 2021

Conversation

Tengxu-Sun
Copy link
Contributor

refactor(python): support distributed algorithm instance reuse for multi models

@Tengxu-Sun Tengxu-Sun marked this pull request as draft October 6, 2021 14:10
bagua/torch_api/algorithms/async_model_average.py Outdated Show resolved Hide resolved
bagua/torch_api/algorithms/async_model_average.py Outdated Show resolved Hide resolved
bagua/torch_api/algorithms/bytegrad.py Outdated Show resolved Hide resolved
bagua/torch_api/algorithms/bytegrad.py Outdated Show resolved Hide resolved
bagua/torch_api/algorithms/decentralized.py Outdated Show resolved Hide resolved
tests/torch_api/test_multi_models.py Outdated Show resolved Hide resolved
tests/torch_api/test_multi_models.py Show resolved Hide resolved
tests/torch_api/test_multi_models.py Outdated Show resolved Hide resolved
tests/torch_api/test_multi_models.py Outdated Show resolved Hide resolved
tests/torch_api/test_multi_models.py Show resolved Hide resolved
Tengxu-Sun and others added 3 commits October 6, 2021 22:12
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
bagua/torch_api/algorithms/bytegrad.py Outdated Show resolved Hide resolved
bagua/torch_api/algorithms/bytegrad.py Outdated Show resolved Hide resolved
bagua/torch_api/algorithms/decentralized.py Outdated Show resolved Hide resolved
bagua/torch_api/algorithms/decentralized.py Outdated Show resolved Hide resolved
bagua/torch_api/algorithms/decentralized.py Outdated Show resolved Hide resolved
bagua/torch_api/algorithms/decentralized.py Outdated Show resolved Hide resolved
Tengxu-Sun and others added 6 commits October 6, 2021 22:12
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
tests/torch_api/test_multi_models.py Outdated Show resolved Hide resolved
tests/torch_api/test_multi_models.py Outdated Show resolved Hide resolved
tests/torch_api/test_multi_models.py Outdated Show resolved Hide resolved
Tengxu-Sun and others added 6 commits October 6, 2021 22:13
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…n.py

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
bagua/torch_api/algorithms/decentralized.py Outdated Show resolved Hide resolved
bagua/torch_api/algorithms/decentralized.py Outdated Show resolved Hide resolved
bagua/torch_api/algorithms/decentralized.py Outdated Show resolved Hide resolved
bagua/torch_api/algorithms/gradient_allreduce.py Outdated Show resolved Hide resolved
bagua/torch_api/algorithms/q_adam.py Outdated Show resolved Hide resolved
tests/torch_api/test_multi_models.py Outdated Show resolved Hide resolved
tests/torch_api/test_multi_models.py Show resolved Hide resolved
tests/torch_api/test_multi_models.py Outdated Show resolved Hide resolved
tests/torch_api/test_multi_models.py Outdated Show resolved Hide resolved
tests/torch_api/test_multi_models.py Show resolved Hide resolved
Tengxu-Sun and others added 2 commits October 6, 2021 22:16
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
bagua/torch_api/algorithms/decentralized.py Outdated Show resolved Hide resolved
bagua/torch_api/algorithms/decentralized.py Outdated Show resolved Hide resolved
bagua/torch_api/algorithms/decentralized.py Outdated Show resolved Hide resolved
bagua/torch_api/algorithms/gradient_allreduce.py Outdated Show resolved Hide resolved
bagua/torch_api/algorithms/q_adam.py Outdated Show resolved Hide resolved
tests/torch_api/test_multi_models.py Outdated Show resolved Hide resolved
tests/torch_api/test_multi_models.py Show resolved Hide resolved
tests/torch_api/test_multi_models.py Outdated Show resolved Hide resolved
tests/torch_api/test_multi_models.py Outdated Show resolved Hide resolved
tests/torch_api/test_multi_models.py Show resolved Hide resolved
Tengxu-Sun and others added 2 commits October 6, 2021 22:17
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
bagua/torch_api/algorithms/decentralized.py Outdated Show resolved Hide resolved
bagua/torch_api/algorithms/decentralized.py Outdated Show resolved Hide resolved
bagua/torch_api/algorithms/decentralized.py Outdated Show resolved Hide resolved
bagua/torch_api/algorithms/gradient_allreduce.py Outdated Show resolved Hide resolved
bagua/torch_api/algorithms/q_adam.py Outdated Show resolved Hide resolved
tests/torch_api/test_multi_models.py Outdated Show resolved Hide resolved
tests/torch_api/test_multi_models.py Outdated Show resolved Hide resolved
tests/torch_api/test_multi_models.py Outdated Show resolved Hide resolved
tests/torch_api/test_multi_models.py Outdated Show resolved Hide resolved
tests/torch_api/test_multi_models.py Show resolved Hide resolved
@@ -29,17 +29,10 @@ def __init__(
warmup_steps: int = 0,
):
"""
Create an instance of the
Instance implementation of the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation of

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

suntengxu and others added 3 commits October 17, 2021 11:27
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@@ -8,10 +8,10 @@
from typing import List


class ByteGradAlgorithm(Algorithm):
class ByteGradAlgorithm_Implementation(Algorithm):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't add underscore _. Similar for other places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@@ -21,15 +22,15 @@ class _AsyncInternalState(IntEnum):
ABORT = 1


class AsyncModelAverageAlgorithm(Algorithm):
class AsyncModelAverageAlgorithmImplementation(Algorithm):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to keep the inheritance relationship between **Algorithm and Algorithm. The Algorithm class has a method reify, and different **Algorithm overwrites it.

Add another base class AlgorithmImpl for all implementations of algorithms, and its derived classes **AlgorithmImpl overwrites init_tensors, init_forward_pre_hook ..., etc.

This seems more natural for our current with_bagua method, which accept an Algorithm as its second argument.

Copy link
Member

@wangraying wangraying Oct 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And AsyncModelAverageAlgorithmImplementation seems too long, better named as AsyncModelAverageAlgorithmImpl, it is in accord with Java naming notation. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok


class AlgorithmImpl:
"""
This is the base class that all Bagua algorithms implementation inherit.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess all Bagua algorithm implementations ? @NOBLES5E

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

and return a corresponding algorithm instance.
"""

def reify(self):
Copy link
Member

@wangraying wangraying Oct 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposal:
remove It provides a method that accept the specified distributed algorithm parameters and return a corresponding algorithm instance.

and just add doc for reify: "Create an algorithm instance" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@NOBLES5E NOBLES5E changed the title refactor(python): support distributed algorithm instance reuse for multi models feat(python): separate algorithm declaration and implementation Oct 19, 2021
@NOBLES5E NOBLES5E merged commit 6be1399 into master Oct 19, 2021
@NOBLES5E NOBLES5E deleted the algorithm_check branch October 19, 2021 08:32
@pr-triage pr-triage bot added the PR: merged label Oct 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Check if an algorithm instance has been used for a module, when passed to another module.
3 participants