
feat(python, core): support mutable bucket tensors #271

Merged: 38 commits merged into master from bucket-tensor on Oct 28, 2021

Conversation

@wangraying (Member) commented on Oct 9, 2021:

BREAKING CHANGE:

  • BaguaTensor::bagua_ensure_grad returns the tensor itself now
  • BaguaTensor::bagua_set_storage is renamed to BaguaTensor::bagua_set_registered_storage
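
For downstream code, the two breaking changes look roughly as follows. This is a minimal migration sketch, not taken from the PR diff; it assumes that importing `bagua.torch_api` applies the patched `bagua_*` methods to `torch.Tensor` and that `bagua_set_registered_storage` keeps the old argument order.

```python
# Hedged migration sketch; `storage` and `offset` are illustrative placeholders.
import torch
import bagua.torch_api  # assumed to patch torch.Tensor with the bagua_* methods

param = torch.nn.Parameter(torch.zeros(4))

# 1. bagua_ensure_grad now returns the tensor it is called on, so calls can be
#    chained; code that relied on the previous return value needs updating.
result = param.bagua_ensure_grad()
assert result is param and param.grad is not None

# 2. bagua_set_storage was renamed; calls such as
#        t.bagua_set_storage(storage, offset)
#    become
#        t.bagua_set_registered_storage(storage, offset)
```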

@wangraying changed the title from "fix: refactor bagua tensor" to "fix(python, core): refactor bagua tensor" on Oct 20, 2021.
Create a zero gradient tensor for the current parameter if it does not exist.

Returns:
    The original tensor.
Contributor commented:

This is a breaking change.

@NOBLES5E (Contributor) left a comment.

@wangraying changed the title from "fix(python, core): refactor bagua tensor" to "fix(python, core): support mutable bucket tensors" on Oct 21, 2021.
@wangraying (Member, PR author) commented:

close #287

@@ -10,29 +10,54 @@
 @gorilla.patches(torch.Tensor, filter=lambda name, obj: "bagua" in name)
 class BaguaTensor:
     """
-    This class patch `torch.Tensor <https://pytorch.org/docs/stable/tensors.html?highlight=tensor#torch.Tensor>`_ with additional methods.
+    This class patch `torch.Tensor <https://pytorch.org/docs/stable/tensors.html?highlight=tensor#torch.Tensor>`_
+    with additional methods.
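
For readers unfamiliar with `gorilla`, the decorator above only declares patches; they are applied separately, so that every `bagua`-named method of `BaguaTensor` ends up attached to `torch.Tensor`. A minimal, hedged sketch of the pattern (the `DemoTensor` class, the `demo` filter, and the apply loop are illustrative, not Bagua's actual patch-application code):

```python
# Hedged illustration of the gorilla patching pattern; DemoTensor and
# demo_say_shape are made-up names, not Bagua code.
import sys

import gorilla
import torch


@gorilla.patches(torch.Tensor, filter=lambda name, obj: "demo" in name)
class DemoTensor:
    def demo_say_shape(self) -> str:
        # Once the patch is applied, `self` is the plain torch.Tensor
        # on which the method is called.
        return f"shape={tuple(self.shape)}"


# The decorator only declares the patches; they still have to be applied.
for patch in gorilla.find_patches([sys.modules[__name__]]):
    gorilla.apply(patch)

t = torch.zeros(2, 3)
print(t.demo_say_shape())  # -> shape=(2, 3)
```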
@NOBLES5E (Contributor) commented on Oct 27, 2021:
Bagua Tensor features a proxy structure, where the actual tensor used by the backend is accessed via a "Proxy Tensor".
The proxy tensor is registered in Bagua. Whenever the Bagua backend needs a tensor (for example, to use it for
communication), it calls the `getter_closure` on the proxy tensor to get the tensor that is actually worked on.
We call this tensor the "Effective Tensor". Their relation is shown in the following diagram:


             ┌───────────────┐
             │ Bagua Backend │
             └──────▲────────┘
                    │
                  access
                    │
   ┌────────────────┼────────────────┐
   │Bagua Tensor    │                │
   │        ┌───────┴────────┐       │
   │        │  Proxy Tensor  │       │
   │        └───┬──────▲─────┘       │
   │            │      │             │
   │ setter_closure  getter_closure  │
   │            │      │             │
   │     ┌──────▼──────┴───────┐     │
   │     │  Effective Tensor   │     │
   │     └─────────────────────┘     │
   │                                 │
   └─────────────────────────────────┘

For example, in the gradient allreduce algorithm, the effective tensor that
needs to be exchanged between machines is the gradient. In this case, we
register the model parameters as proxy tensors and register `getter_closure`
to be `lambda proxy_tensor: proxy_tensor.grad`. In this way, even if the
gradient tensor is recreated or changed during runtime, Bagua can still use
the correct tensor for communication, since the `proxy_tensor` serves as the
root for access and is never replaced.
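
A hedged sketch of what such a registration might look like from user code; the `ensure_bagua_tensor` call and its keyword names follow my reading of this PR and may not match the final API, and the `name`/`module_name` values are made up:

```python
# Hedged sketch: register a parameter as the proxy tensor whose effective
# tensor is its (possibly recreated) gradient.
import torch
import bagua.torch_api  # assumed to patch torch.Tensor with bagua_* methods

param = torch.nn.Parameter(torch.zeros(16))
param.bagua_ensure_grad()  # create a zero .grad if it does not exist yet

param.ensure_bagua_tensor(
    name="layer0.weight",       # illustrative tensor name
    module_name="demo_module",  # illustrative owning-module name
    getter_closure=lambda proxy_tensor: proxy_tensor.grad,
)

# The backend now always reaches the current gradient through the proxy,
# even if param.grad is later replaced by a different tensor.
```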

The `setter_closure` is used to replace the effective tensor at runtime, so that a
customized workflow can swap in a new effective tensor without re-registering the
proxy tensor.

`getter_closure` takes the registered (proxy) tensor as input and returns a PyTorch tensor, the effective tensor.

`setter_closure` takes the registered (proxy) tensor and a new PyTorch tensor as input, installs the latter as the new effective tensor, and returns nothing.
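
A hedged sketch of the `setter_closure` side, continuing the registration example above (same caveat: the registration API and keyword names shown are my assumption based on this PR's description):

```python
import torch
import bagua.torch_api  # assumed to patch torch.Tensor with bagua_* methods

param = torch.nn.Parameter(torch.zeros(16))
param.bagua_ensure_grad()

def set_grad(proxy_tensor: torch.Tensor, new_effective: torch.Tensor) -> None:
    # Install a new effective tensor (here, a new gradient) on the proxy.
    proxy_tensor.grad = new_effective

param.ensure_bagua_tensor(
    name="layer0.weight",       # illustrative tensor name
    module_name="demo_module",  # illustrative owning-module name
    getter_closure=lambda proxy_tensor: proxy_tensor.grad,
    setter_closure=set_grad,
)
```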

"""
Sets the underlying storage using an existing `torch.Storage <https://pytorch.org/docs/stable/storage.html?highlight=storage>`_.
Sets the underlying storage for the tensor returned by :meth:`bagua_getter_closure` with an existing
Contributor commented:

Use the terminology we defined in the class documentation.

Contributor commented:

Similarly for other methods' docs.
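
To make the storage-setting semantics concrete, here is a hedged sketch, in plain PyTorch, of what pointing tensors at slices of a shared flat buffer looks like; the flat buffer and offsets are illustrative, and `Tensor.set_` below merely stands in for whatever `bagua_set_registered_storage` does to the effective tensor.

```python
# Illustrative only: several tensors backed by one contiguous "bucket" buffer,
# each viewing the buffer at a different storage offset.
import torch

a = torch.zeros(4)
b = torch.zeros(6)

flat = torch.empty(a.numel() + b.numel())  # the bucket's flat buffer

a.set_(flat.storage(), storage_offset=0, size=a.shape)
b.set_(flat.storage(), storage_offset=a.numel(), size=b.shape)

flat.zero_()
assert a.data_ptr() == flat.data_ptr()  # a (and b) now alias the flat buffer
```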

@NOBLES5E (Contributor) left a comment:

see comments

@NOBLES5E merged commit de7df6e into master on Oct 28, 2021.
@NOBLES5E deleted the bucket-tensor branch on Oct 28, 2021 at 06:18.
NOBLES5E pushed a commit that referenced this pull request on Oct 28, 2021:

BREAKING CHANGE: `BaguaTensor::bagua_ensure_grad` returns the tensor itself now