
refactor: new module wrapping design and algorithm implementation design #24

Merged
merged 238 commits into from
Jul 1, 2021


NOBLES5E (Contributor)

Goal: make it easier to implement new algorithms such as OneBitAdam, and decouple the bucketing policy from the module wrapper.
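A minimal sketch of what decoupling the bucketing policy from the wrapper can look like: the algorithm object owns `init_buckets`, and the wrapper only delegates to it (the PR's own snippets call `self.algorithm.init_buckets(self.module, self.optimizer)`). Everything else here is an assumption for illustration: the `Algorithm`, `SizeCappedAlgorithm` and `Wrapper` classes and the `bucket_bytes` parameter are hypothetical, and tensors are modeled as `(name, nbytes)` pairs instead of `(module, optimizer)` so the example runs without PyTorch.

```python
from typing import List, Tuple

# Hypothetical model: a "tensor" is just (name, size_in_bytes).
Tensor = Tuple[str, int]
Bucket = List[Tensor]


class Algorithm:
    """Hypothetical base class: each algorithm decides how tensors are bucketed."""

    def init_buckets(self, tensors: List[Tensor]) -> List[Bucket]:
        # Default policy: one bucket per tensor.
        return [[t] for t in tensors]


class SizeCappedAlgorithm(Algorithm):
    """Groups consecutive tensors until adding one more would exceed `bucket_bytes`."""

    def __init__(self, bucket_bytes: int):
        self.bucket_bytes = bucket_bytes

    def init_buckets(self, tensors: List[Tensor]) -> List[Bucket]:
        buckets: List[Bucket] = []
        current: Bucket = []
        current_bytes = 0
        for name, nbytes in tensors:
            # Close the current bucket if this tensor would overflow it.
            if current and current_bytes + nbytes > self.bucket_bytes:
                buckets.append(current)
                current, current_bytes = [], 0
            current.append((name, nbytes))
            current_bytes += nbytes
        if current:
            buckets.append(current)
        return buckets


class Wrapper:
    """Holds no bucketing logic of its own; it delegates to the algorithm."""

    def __init__(self, tensors: List[Tensor], algorithm: Algorithm):
        self.tensors = tensors
        self.algorithm = algorithm
        self.init_algorithm()

    def init_algorithm(self) -> None:
        self.buckets = self.algorithm.init_buckets(self.tensors)
```

With this split, `Wrapper([("a", 4), ("b", 4), ("c", 8)], SizeCappedAlgorithm(bucket_bytes=8)).buckets` groups `a` and `b` together and puts `c` alone, and swapping in a different algorithm changes the bucketing without touching the wrapper.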

NOBLES5E marked this pull request as draft on June 16, 2021 12:11
bagua/torch_api/distributed_dev.py: six review threads, all outdated and resolved
NOBLES5E and others added 8 commits June 26, 2021 19:05
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
todo bot commented Jun 26, 2021

previous buckets and hooks need to be cleared before reinit

        # TODO: previous buckets and hooks need to be cleared before reinit
        pass

    def init_buckets(self, module, optimizer) -> List:
        pass


This comment was generated by todo based on a TODO comment in 71a759f in #24. cc @BaguaSys.
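The TODO above asks for previous buckets and hooks to be cleared before reinitialization. One common way to make that possible is to keep every handle that hook registration returns and call `remove()` on each before installing new hooks; in PyTorch, `Tensor.register_hook` returns such a handle. The sketch below is an in-process stand-in that does not import torch: `Handle`, `HookRegistry`, `Wrapper` and `reinit` are hypothetical names, not Bagua's API.

```python
from typing import Callable, Dict, List


class Handle:
    """Stand-in for torch.utils.hooks.RemovableHandle: remove() detaches one hook."""

    def __init__(self, hooks: Dict[int, Callable], key: int):
        self._hooks = hooks
        self._key = key

    def remove(self) -> None:
        self._hooks.pop(self._key, None)


class HookRegistry:
    """Hypothetical object (think: one parameter) that can carry gradient hooks."""

    def __init__(self):
        self._hooks: Dict[int, Callable] = {}
        self._next_key = 0

    def register_hook(self, fn: Callable) -> Handle:
        key = self._next_key
        self._next_key += 1
        self._hooks[key] = fn
        return Handle(self._hooks, key)

    def fire(self, grad: float) -> List[float]:
        # Call every live hook in registration order (dicts keep insertion order).
        return [fn(grad) for fn in self._hooks.values()]


class Wrapper:
    def __init__(self, param: HookRegistry):
        self.param = param
        self.buckets: List[List[str]] = []
        self._hook_handles: List[Handle] = []

    def reinit(self, scale: float) -> None:
        # Drop hooks from the previous init first; otherwise every reinit
        # would stack one more hook on the same parameter.
        for handle in self._hook_handles:
            handle.remove()
        self._hook_handles.clear()
        self.buckets = [["param"]]
        self._hook_handles.append(self.param.register_hook(lambda g: g * scale))
```

After `reinit(2.0)` followed by `reinit(3.0)`, only the second hook is live, which is the behavior the TODO is after.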

todo bot commented Jun 26, 2021

document this

# TODO: document this
if hasattr(module, "_ddp_params_and_buffers_to_ignore"):
    parameters_to_ignore = module._ddp_params_and_buffers_to_ignore
else:
    parameters_to_ignore = []
module_states = []


This comment was generated by todo based on a TODO comment in 71a759f in #24. cc @BaguaSys.
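One possible way to document the `_ddp_params_and_buffers_to_ignore` branch, assuming it follows PyTorch DDP's convention: DDP's private helper `DistributedDataParallel._set_params_and_buffers_to_ignore_for_model(module, names)` stores a list of fully qualified parameter and buffer names on the module, and those entries are excluded from synchronization. Being private, that helper may change between releases. In the sketch, `ToyModule` is a hypothetical stand-in for `torch.nn.Module` and `collect_module_states` is a hypothetical name.

```python
import itertools
from typing import Dict, List, Tuple


class ToyModule:
    """Hypothetical stand-in for torch.nn.Module with named parameters and buffers."""

    def __init__(self, params: Dict[str, float], buffers: Dict[str, float]):
        self._params = params
        self._buffers = buffers

    def named_parameters(self):
        return iter(self._params.items())

    def named_buffers(self):
        return iter(self._buffers.items())


def collect_module_states(module) -> List[Tuple[str, float]]:
    """Return the (name, value) pairs that should be synchronized across ranks.

    If the module carries `_ddp_params_and_buffers_to_ignore` (assumed to be a
    list of fully qualified names, as in PyTorch DDP), those entries are skipped.
    """
    parameters_to_ignore = set(getattr(module, "_ddp_params_and_buffers_to_ignore", []))
    module_states = []
    # chain() works whether the accessors return lists or generators.
    for name, value in itertools.chain(module.named_parameters(), module.named_buffers()):
        if name not in parameters_to_ignore:
            module_states.append((name, value))
    return module_states
```

For example, a module with parameters `fc.weight`, `fc.bias` and buffer `bn.running_mean`, tagged with `_ddp_params_and_buffers_to_ignore = ["fc.bias"]`, yields only `fc.weight` and `bn.running_mean`.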

todo bot commented Jun 26, 2021

broadcast optimizer parameters

        # TODO: broadcast optimizer parameters
        self.init_algorithm()

    def init_algorithm(self):
        self.buckets = self.algorithm.init_buckets(self.module, self.optimizer)


This comment was generated by todo based on a TODO comment in 71a759f in #24. cc @BaguaSys.
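The TODO about broadcasting optimizer parameters is about making every rank start from the source rank's optimizer state. Below is an in-process simulation under stated assumptions: `states[r]` stands for rank r's optimizer state (`param_id -> {key: value}`), and `broadcast_optimizer_state` is a hypothetical name. A real version would call `torch.distributed.broadcast(tensor, src)` once per tensor instead of copying. Every rank would then need an already-allocated tensor of the same shape, so lazily created Adam state (such as `exp_avg`, which only appears after the first `step()`) would have to be materialized first.

```python
import copy
from typing import Any, Dict, List

# param_id -> {"exp_avg": ..., "step": ...}
State = Dict[int, Dict[str, Any]]


def broadcast_optimizer_state(states: List[State], src: int = 0) -> None:
    """Simulated broadcast: every rank receives a copy of rank `src`'s state."""
    src_state = states[src]
    # Walk entries in a deterministic order: with real collectives, all ranks
    # must issue their broadcasts in the same sequence or the calls mismatch.
    for param_id in sorted(src_state):
        for key in sorted(src_state[param_id]):
            value = src_state[param_id][key]
            for rank, state in enumerate(states):
                if rank != src:
                    # deepcopy so ranks do not share mutable buffers.
                    state.setdefault(param_id, {})[key] = copy.deepcopy(value)
```

The deterministic iteration order matters more than the copying itself; it is the property a real collective-based implementation has to preserve.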

NOBLES5E linked an issue on Jun 26, 2021 that may be closed by this pull request (17 tasks)

import torch

class BaguaTensor(torch.Tensor):
Contributor


[blackfmt] reported by reviewdog 🐶

Suggested change (add a second blank line before the class, as black requires around top-level definitions):
class BaguaTensor(torch.Tensor):

    def is_registered(self) -> bool:
        return not (self.bagua_backend is None)


Contributor


[blackfmt] reported by reviewdog 🐶

Suggested change

todo bot commented Jul 1, 2021

@shjwudp merge with service module

# TODO: @shjwudp merge with service module
import copy
import collections
import logging


This comment was generated by todo based on a TODO comment in 1cfccfe in #24. cc @BaguaSys.

todo bot commented Jul 1, 2021

@ganshaoduo check if this should be used

# weight_decay = group["weight_decay"]  # TODO: @ganshaoduo check if this should be used
beta1, beta2 = group["betas"]
eps = group["eps"]
for param_id, param in enumerate(group["params"]):
    state = self.state[param]


This comment was generated by todo based on a TODO comment in 1cfccfe in #24. cc @BaguaSys.
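For reference when deciding whether `weight_decay` should be used, here is a plain-Python sketch of one standard Adam update for a single scalar parameter. It reads the same `betas` and `eps` group keys. Following `torch.optim.Adam`, weight decay is added to the gradient as an L2 term, whereas AdamW decays the weight directly. `adam_step` and its arguments are hypothetical, and this is textbook Adam, not the PR's OneBitAdam update.

```python
import math
from typing import Any, Dict


def adam_step(param: float, grad: float, state: Dict[str, float], group: Dict[str, Any]) -> float:
    """One Adam update for a scalar parameter; returns the new parameter value."""
    beta1, beta2 = group["betas"]
    eps = group["eps"]
    lr = group["lr"]
    weight_decay = group.get("weight_decay", 0.0)
    if weight_decay != 0.0:
        # L2-style decay as in torch.optim.Adam. AdamW would instead apply
        # param -= lr * weight_decay * param and leave grad unchanged.
        grad = grad + weight_decay * param

    state["step"] = state.get("step", 0) + 1
    state["exp_avg"] = beta1 * state.get("exp_avg", 0.0) + (1 - beta1) * grad
    state["exp_avg_sq"] = beta2 * state.get("exp_avg_sq", 0.0) + (1 - beta2) * grad * grad

    # Bias correction compensates for initializing both moments at zero.
    m_hat = state["exp_avg"] / (1 - beta1 ** state["step"])
    v_hat = state["exp_avg_sq"] / (1 - beta2 ** state["step"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `lr` regardless of gradient magnitude; weight decay only shows up through the moment estimates.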

todo bot commented Jul 1, 2021

remove parameter group logic

    ]  # TODO: remove parameter group logic
)
self._bagua_autotune_client.register_models(  # TODO: @shjwudp rename to register tensors
    autotune_tensor_list, bagua_tensor_group_info
).json()  # TODO: @shjwudp error check


This comment was generated by todo based on a TODO comment in 1cfccfe in #24. cc @BaguaSys.
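The `# TODO: @shjwudp error check` on `register_models(...).json()` marks a reply that is parsed without checking whether the request succeeded. If the client returns an HTTP response object (the `.json()` call suggests a `requests.Response`, whose `raise_for_status()` raises on 4xx/5xx codes), a check could look like this stdlib-only sketch. `AutotuneError` and `parse_autotune_reply` are hypothetical names. The function takes the status code and body text directly so it runs without a server.

```python
import json
from typing import Any, Dict


class AutotuneError(RuntimeError):
    """Hypothetical error raised when the autotune service reply is unusable."""


def parse_autotune_reply(status_code: int, body: str) -> Dict[str, Any]:
    # Reject non-2xx replies before trying to interpret the body.
    if not 200 <= status_code < 300:
        raise AutotuneError(f"autotune service returned HTTP {status_code}: {body[:200]}")
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        raise AutotuneError(f"autotune reply is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise AutotuneError(f"expected a JSON object, got {type(payload).__name__}")
    return payload
```

Raising one dedicated exception type lets the caller decide whether a failed registration should abort training or simply disable autotuning.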

todo bot commented Jul 1, 2021

@shjwudp check whether these are still needed

# score = np.mean(score_list) # TODO: @shjwudp check whether these are still needed
# std = np.std(score_list)
return np.mean(score_list), np.std(score_list), score_list.tolist()


This comment was generated by todo based on a TODO comment in 1cfccfe in #24. cc @BaguaSys.
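A note on the return line: `score_list.tolist()` only works if `score_list` is already a NumPy array, because a plain Python list has no `tolist` method. Also, `np.std` defaults to the population standard deviation (`ddof=0`). If NumPy is not wanted here, a stdlib equivalent can use `statistics.fmean` and `statistics.pstdev`, where `pstdev` matches `np.std`'s default. `summarize_scores` is a hypothetical name.

```python
import statistics
from typing import List, Sequence, Tuple


def summarize_scores(score_list: Sequence[float]) -> Tuple[float, float, List[float]]:
    """Return (mean, population std, scores as a list), like the NumPy return line."""
    scores = [float(s) for s in score_list]
    # pstdev divides by N, matching np.std's default ddof=0;
    # statistics.stdev would divide by N - 1 instead.
    return statistics.fmean(scores), statistics.pstdev(scores), scores
```

Accepting any sequence sidesteps the list-versus-array question; note that both `fmean` and `pstdev` raise `statistics.StatisticsError` on an empty input.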

todo bot posted the same four comments again on Jul 1, 2021 (merge with service module; check whether weight_decay should be used; remove parameter group logic; check whether the mean/std lines are still needed), this time generated from TODO comments in 96cb6fe in #24. cc @BaguaSys.

Projects: None yet
Development: successfully merging this pull request may close the issue "python refactor tracking issue".
3 participants