feat(python, core): support mutable bucket tensors #271
Create a zero gradient tensor for the current parameter if not exist.

Returns:
    The original tensor.
This is a breaking change.
close #287
@@ -10,29 +10,54 @@
 @gorilla.patches(torch.Tensor, filter=lambda name, obj: "bagua" in name)
 class BaguaTensor:
     """
-    This class patch `torch.Tensor <https://pytorch.org/docs/stable/tensors.html?highlight=tensor#torch.Tensor>`_ with additional methods.
+    This class patch `torch.Tensor <https://pytorch.org/docs/stable/tensors.html?highlight=tensor#torch.Tensor>`_
+    with additional methods.
Bagua Tensor features a proxy structure, where the actual tensor used by the backend is accessed via a "Proxy Tensor".
The proxy tensor is registered in Bagua. Whenever the Bagua backend needs a tensor (for example, to use it for communication), it calls
the `getter_closure` on the proxy tensor to get the tensor that is actually worked on. We call this tensor the "Effective Tensor".
Their relation is shown in the following diagram:
          ┌───────────────┐
          │ Bagua Backend │
          └──────▲────────┘
                 │
               access
                 │
┌────────────────┼────────────────┐
│Bagua Tensor    │                │
│        ┌───────┴────────┐       │
│        │  Proxy Tensor  │       │
│        └───┬──────▲─────┘       │
│            │      │             │
│ setter_closure  getter_closure  │
│            │      │             │
│     ┌──────▼──────┴───────┐     │
│     │  Effective Tensor   │     │
│     └─────────────────────┘     │
│                                 │
└─────────────────────────────────┘
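To make the diagram concrete, here is a minimal pure-Python sketch of the proxy structure. It does not use Bagua or PyTorch: `ProxyTensor`, `Backend`, `snapshot`, and `set_item` are hypothetical names, a dict stands in for the registered tensor, and plain lists stand in for tensors. The setter is assumed to take the registered object and the new tensor, since its signature is still open in this draft. The point is only that the backend stores proxies and resolves the effective tensor through `getter_closure` on every access.

```python
from typing import Any, Callable, Dict, List


class ProxyTensor:
    """Hypothetical stand-in for a registered proxy tensor (not Bagua's API)."""

    def __init__(
        self,
        root: Any,
        getter_closure: Callable[[Any], Any],
        setter_closure: Callable[[Any, Any], None],
    ) -> None:
        self.root = root  # registered once and never replaced
        self.getter_closure = getter_closure
        self.setter_closure = setter_closure

    def effective(self) -> Any:
        # Resolve the effective tensor at the moment of access.
        return self.getter_closure(self.root)

    def set_effective(self, new_tensor: Any) -> None:
        # Replace the effective tensor through the (assumed) setter signature.
        self.setter_closure(self.root, new_tensor)


class Backend:
    """Hypothetical backend that only ever holds proxies."""

    def __init__(self) -> None:
        self.registered: List[ProxyTensor] = []

    def register(self, proxy: ProxyTensor) -> None:
        self.registered.append(proxy)

    def snapshot(self) -> List[Any]:
        # Every access goes through the getter; no effective tensor is cached.
        return [p.effective() for p in self.registered]


def set_item(d: Dict[str, List[float]], t: List[float]) -> None:
    d["t"] = t


store: Dict[str, List[float]] = {"t": [1.0, 2.0]}
proxy = ProxyTensor(store, lambda d: d["t"], set_item)
backend = Backend()
backend.register(proxy)

store["t"] = [3.0, 4.0]  # the effective tensor is replaced outside the backend
print(backend.snapshot())  # [[3.0, 4.0]]
```

Because the backend never keeps a direct reference to the effective tensor, replacing it (either externally or via `set_effective`) needs no re-registration.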
For example, in the gradient allreduce algorithm, the effective tensor that
needs to be exchanged between machines is the gradient. In this case, we
register the model parameters as proxy tensors and set `getter_closure` to
`lambda proxy_tensor: proxy_tensor.grad`. This way, even if the gradient
tensor is recreated or changed at runtime, Bagua still uses the correct
tensor for communication, since the `proxy_tensor` serves as the root of
access and is never replaced.
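The gradient allreduce case can be simulated in the same spirit. This is an assumption-laden stand-in, not Bagua's implementation: `Param`, `grad_getter`, and `simulated_allreduce` are hypothetical, lists replace tensors, and the "allreduce" is a local in-place average over two simulated workers. It shows that because only the parameter is held, the getter picks up gradients that were recreated between steps.

```python
from typing import Callable, List, Optional


class Param:
    """Hypothetical stand-in for a model parameter with a .grad attribute."""

    def __init__(self, data: List[float]) -> None:
        self.data = data
        self.grad: Optional[List[float]] = None


def grad_getter(proxy_tensor: Param) -> List[float]:
    # Mirrors the getter from the text: lambda proxy_tensor: proxy_tensor.grad
    assert proxy_tensor.grad is not None, "gradient has not been created yet"
    return proxy_tensor.grad


def simulated_allreduce(
    proxies: List[Param], getter: Callable[[Param], List[float]]
) -> None:
    """Average the effective tensors of all workers in place (local stand-in for allreduce)."""
    tensors = [getter(p) for p in proxies]
    averaged = [sum(values) / len(tensors) for values in zip(*tensors)]
    for t in tensors:
        t[:] = averaged  # in-place write, so each worker's gradient is updated


# One registered parameter per simulated worker; these objects are never replaced.
workers = [Param([0.0, 0.0]) for _ in range(2)]

# Step 1: "backward" creates fresh gradient lists.
workers[0].grad = [1.0, 2.0]
workers[1].grad = [3.0, 4.0]
simulated_allreduce(workers, grad_getter)
print(workers[0].grad)  # [2.0, 3.0]

# Step 2: gradients are recreated, e.g. after zero_grad(set_to_none=True).
workers[0].grad = [10.0, 0.0]
workers[1].grad = [20.0, 0.0]
simulated_allreduce(workers, grad_getter)
print(workers[1].grad)  # [15.0, 0.0]
```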
The `setter_closure` replaces the effective tensor at runtime. It is
intended for customized workflows that need to swap in a different
effective tensor.
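One customized workflow a setter could enable, in line with this PR's bucket support, is rebinding each effective tensor to a view of a shared flat bucket, so that a single operation on the bucket updates every gradient. The sketch is hypothetical throughout: it assumes the setter takes the registered tensor and the new effective tensor and returns nothing (the draft leaves this signature open), and it uses the standard library's `array` and `memoryview` in place of PyTorch tensors and storage.

```python
from array import array
from typing import Callable, List, Union

Grad = Union[List[float], memoryview]


class Param:
    """Hypothetical stand-in for a parameter whose .grad can be swapped."""

    def __init__(self, grad: List[float]) -> None:
        self.grad: Grad = grad


def grad_getter(proxy: Param) -> Grad:
    return proxy.grad


def grad_setter(proxy: Param, new_tensor: Grad) -> None:
    # Assumed setter signature: (registered tensor, new effective tensor) -> None.
    proxy.grad = new_tensor


def flatten_into_bucket(
    proxies: List[Param],
    getter: Callable[[Param], Grad],
    setter: Callable[[Param, Grad], None],
) -> memoryview:
    """Copy every effective tensor into one contiguous buffer, then rebind each
    proxy's effective tensor to a view of its own slice of that buffer."""
    storage = array("d")
    for p in proxies:
        storage.extend(getter(p))
    view = memoryview(storage)
    offset = 0
    for p in proxies:
        n = len(getter(p))  # still the original list at this point
        setter(p, view[offset:offset + n])
        offset += n
    return view


params = [Param([1.0, 2.0]), Param([3.0])]
bucket = flatten_into_bucket(params, grad_getter, grad_setter)
for i in range(len(bucket)):
    bucket[i] *= 0.5  # a single pass over the whole bucket, e.g. scaling
print([grad_getter(p).tolist() for p in params])  # [[0.5, 1.0], [1.5]]
```

After the setter runs, each `grad` shares memory with the bucket, so the backend can operate on one contiguous buffer while the getter still returns per-parameter tensors.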
`getter_closure` takes the registered tensor as input and returns a PyTorch tensor.
`setter_closure` takes XXXX and returns XXX.
bagua/torch_api/tensor.py
         """
-        Sets the underlying storage using an existing `torch.Storage <https://pytorch.org/docs/stable/storage.html?highlight=storage>`_.
+        Sets the underlying storage for the tensor returned by :meth:`bagua_getter_closure` with an existing
Use the terminology we defined in the class documentation.
Same for the other methods' docs.
see comments
BREAKING CHANGE:
- `BaguaTensor::bagua_ensure_grad` now returns the tensor itself
- `BaguaTensor::bagua_set_storage` is renamed to `BaguaTensor::bagua_set_registered_storage`
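For readers updating call sites, the new contract of `bagua_ensure_grad` (create a zero gradient only if none exists, then return the tensor itself) can be sketched in plain Python. `Param` and `ensure_grad` are hypothetical stand-ins, not Bagua code; with PyTorch the zero gradient would typically be created with `torch.zeros_like`. Returning `self` is what makes chained calls such as `p.ensure_grad().grad` possible.

```python
from typing import List, Optional


class Param:
    """Hypothetical stand-in for a parameter tensor (not Bagua's BaguaTensor)."""

    def __init__(self, data: List[float]) -> None:
        self.data = data
        self.grad: Optional[List[float]] = None

    def ensure_grad(self) -> "Param":
        # Create a zero gradient with the parameter's shape only if none exists.
        if self.grad is None:
            self.grad = [0.0] * len(self.data)
        # Return the object itself so calls can be chained.
        return self


p = Param([1.0, 2.0, 3.0])
print(p.ensure_grad().grad)  # [0.0, 0.0, 0.0]
```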