refactor: new module wrapping design and algorithm implementation design #24

NOBLES5E · 2021-06-16T12:11:28Z

Goal: make it easier to implement new algorithms such as OneBitAdam, and decouple the bucketing policy from the module wrapper.

bagua/torch_api/distributed_dev.py

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

todo · 2021-06-26T11:05:25Z

previous buckets and hooks need to be cleared before reinit

bagua/bagua/torch_api/distributed_dev.py

Lines 13 to 18 in 71a759f

    
            # TODO: previous buckets and hooks need to be cleared before reinit
            pass

        def init_buckets(self, module, optimizer) -> List:
            pass

This comment was generated by todo based on a `TODO` comment in `71a759f` in #24. cc @BaguaSys.
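One way to address this TODO is to keep the handle of every registered hook so that reinitialization can remove them before building new buckets. The sketch below is not code from this PR: `HookHandle`, `Wrapper`, `_register_hook`, `_clear`, and the `num_buckets` argument are hypothetical stand-ins written in plain Python. With real tensors, `torch.Tensor.register_hook` already returns a handle with a `remove()` method that could play the role of `HookHandle`.

```python
from typing import Callable, List


class HookHandle:
    """Hypothetical handle returned when a hook is registered."""

    def __init__(self, registry: List[Callable], hook: Callable) -> None:
        self._registry = registry
        self._hook = hook

    def remove(self) -> None:
        # Calling remove() twice is harmless.
        if self._hook in self._registry:
            self._registry.remove(self._hook)


class Wrapper:
    """Stand-in for the module wrapper; not the implementation in this PR."""

    def __init__(self) -> None:
        self.registered_hooks: List[Callable] = []  # stands in for hooks on parameters
        self.buckets: List[list] = []
        self._handles: List[HookHandle] = []

    def _register_hook(self, hook: Callable) -> HookHandle:
        self.registered_hooks.append(hook)
        handle = HookHandle(self.registered_hooks, hook)
        self._handles.append(handle)
        return handle

    def _clear(self) -> None:
        # Drop every hook and bucket left over from a previous initialization.
        for handle in self._handles:
            handle.remove()
        self._handles.clear()
        self.buckets.clear()

    def init_algorithm(self, num_buckets: int) -> None:
        self._clear()
        self.buckets = [[] for _ in range(num_buckets)]
        for _ in range(num_buckets):
            self._register_hook(lambda grad: grad)
```

Calling `init_algorithm` repeatedly then leaves only the hooks and buckets of the latest call, instead of accumulating stale ones.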

todo · 2021-06-26T11:05:28Z

document this

bagua/bagua/torch_api/distributed_dev.py

Lines 34 to 39 in 71a759f

    
        # TODO: document this
        if hasattr(module, "_ddp_params_and_buffers_to_ignore"):
            parameters_to_ignore = module._ddp_params_and_buffers_to_ignore
        else:
            parameters_to_ignore = []
        module_states = []

This comment was generated by todo based on a `TODO` comment in `71a759f` in #24. cc @BaguaSys.
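As a starting point for the documentation, the lookup above can be read as "an optional list of state names to skip". The sketch below assumes, without confirmation from this PR, that the names are then used to filter which module states are collected. `states_to_sync` and the plain-dict `named_states` are hypothetical, and `SimpleNamespace` stands in for a real module.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def states_to_sync(module: Any, named_states: Dict[str, Any]) -> List[Any]:
    """Return the states whose names are not listed on the module as ignored."""
    # Same lookup as the quoted snippet, written with getattr and a default.
    parameters_to_ignore = getattr(module, "_ddp_params_and_buffers_to_ignore", [])
    ignored = set(parameters_to_ignore)
    return [state for name, state in named_states.items() if name not in ignored]


# Hypothetical stand-ins for a module and its named parameters and buffers.
module = SimpleNamespace(_ddp_params_and_buffers_to_ignore=["fc.bias"])
named_states = {"fc.weight": "W", "fc.bias": "b"}
kept = states_to_sync(module, named_states)
```

A module without the attribute keeps every state, which matches the `else` branch in the snippet.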

todo · 2021-06-26T11:05:31Z

broadcast optimizer parameters

bagua/bagua/torch_api/distributed_dev.py

Lines 61 to 66 in 71a759f

    
            # TODO: broadcast optimizer parameters
            self.init_algorithm()

        def init_algorithm(self):
            self.buckets = self.algorithm.init_buckets(self.module, self.optimizer)

This comment was generated by todo based on a `TODO` comment in `71a759f` in #24. cc @BaguaSys.
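The delegation in `init_algorithm` is what keeps the bucketing policy out of the wrapper. A minimal sketch of that split follows. Only the `init_buckets(module, optimizer)` call shape comes from the quoted code. The class names, `FixedSizeBucketing`, and the use of a list of parameter names in place of a real module are assumptions made for illustration.

```python
from typing import List


class Algorithm:
    """Hypothetical base class: the algorithm owns the bucketing policy."""

    def init_buckets(self, module, optimizer) -> List[List[str]]:
        raise NotImplementedError


class FixedSizeBucketing(Algorithm):
    """Illustrative policy: group parameter names into fixed-size buckets."""

    def __init__(self, bucket_size: int) -> None:
        self.bucket_size = bucket_size

    def init_buckets(self, module, optimizer) -> List[List[str]]:
        names = list(module)  # here a "module" is just a sequence of names
        step = self.bucket_size
        return [names[i:i + step] for i in range(0, len(names), step)]


class Wrapper:
    """Stand-in wrapper: it knows nothing about how buckets are formed."""

    def __init__(self, module, optimizer, algorithm: Algorithm) -> None:
        self.module = module
        self.optimizer = optimizer
        self.algorithm = algorithm
        self.init_algorithm()

    def init_algorithm(self) -> None:
        # Same delegation as in the quoted snippet.
        self.buckets = self.algorithm.init_buckets(self.module, self.optimizer)


wrapper = Wrapper(["w1", "b1", "w2"], optimizer=None, algorithm=FixedSizeBucketing(2))
```

Under this split, a new algorithm only needs its own `init_buckets`, and the wrapper stays unchanged.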

github-actions · 2021-06-26T11:52:57Z

bagua/torch_api/dev/tensor.py

+
+import torch
+
+class BaguaTensor(torch.Tensor):


[blackfmt] reported by reviewdog 🐶

Suggested change

class BaguaTensor(torch.Tensor):

class BaguaTensor(torch.Tensor):

github-actions · 2021-06-26T11:59:46Z

bagua/torch_api/dev/tensor.py

+    def is_registered(self) -> bool:
+        return not (self.bagua_backend is None)
+
+


[blackfmt] reported by reviewdog 🐶

Suggested change

todo · 2021-07-01T04:59:11Z

@shjwudp merge with service module

bagua/bagua/autotune/__init__.py

Lines 1 to 4 in 1cfccfe

    
    # TODO: @shjwudp merge with service module
    import copy
    import collections
    import logging

This comment was generated by todo based on a `TODO` comment in `1cfccfe` in #24. cc @BaguaSys.

todo · 2021-07-01T04:59:14Z

@ganshaoduo check if this should be used

bagua/bagua/torch_api/algorithms/onebit_adam.py

Lines 192 to 197 in 1cfccfe

    
    # weight_decay = group["weight_decay"] # TODO: @ganshaoduo check if this should be used
    beta1, beta2 = group["betas"]
    eps = group["eps"]
    for param_id, param in enumerate(group["params"]):
        state = self.state[param]

This comment was generated by todo based on a `TODO` comment in `1cfccfe` in #24. cc @BaguaSys.

todo · 2021-07-01T04:59:17Z

remove parameter group logic

bagua/bagua/torch_api/distributed.py

Lines 276 to 281 in 1cfccfe

    
        ]  # TODO: remove parameter group logic
    )
    self._bagua_autotune_client.register_models(  # TODO: @shjwudp rename to register tensors
        autotune_tensor_list, bagua_tensor_group_info
    ).json()  # TODO: @shjwudp error check

This comment was generated by todo based on a `TODO` comment in `1cfccfe` in #24. cc @BaguaSys.

todo · 2021-07-01T04:59:20Z

@shjwudp check whether these are still needed

bagua/bagua/torch_api/utils.py

Lines 261 to 264 in 1cfccfe

    
    # score = np.mean(score_list) # TODO: @shjwudp check whether these are still needed
    # std = np.std(score_list)
    return np.mean(score_list), np.std(score_list), score_list.tolist()

This comment was generated by todo based on a `TODO` comment in `1cfccfe` in #24. cc @BaguaSys.

todo · 2021-07-01T05:06:41Z

@shjwudp merge with service module

bagua/bagua/autotune/__init__.py

Lines 1 to 4 in 96cb6fe

    
    # TODO: @shjwudp merge with service module
    import copy
    import collections
    import logging

This comment was generated by todo based on a `TODO` comment in `96cb6fe` in #24. cc @BaguaSys.

todo · 2021-07-01T05:06:44Z

@ganshaoduo check if this should be used

bagua/bagua/torch_api/algorithms/onebit_adam.py

Lines 192 to 197 in 96cb6fe

    
    # weight_decay = group["weight_decay"] # TODO: @ganshaoduo check if this should be used
    beta1, beta2 = group["betas"]
    eps = group["eps"]
    for param_id, param in enumerate(group["params"]):
        state = self.state[param]

This comment was generated by todo based on a `TODO` comment in `96cb6fe` in #24. cc @BaguaSys.

todo · 2021-07-01T05:06:47Z

remove parameter group logic

bagua/bagua/torch_api/distributed.py

Lines 276 to 281 in 96cb6fe

    
        ]  # TODO: remove parameter group logic
    )
    self._bagua_autotune_client.register_models(  # TODO: @shjwudp rename to register tensors
        autotune_tensor_list, bagua_tensor_group_info
    ).json()  # TODO: @shjwudp error check

This comment was generated by todo based on a `TODO` comment in `96cb6fe` in #24. cc @BaguaSys.

todo · 2021-07-01T05:06:50Z

@shjwudp check whether these are still needed

bagua/bagua/torch_api/utils.py

Lines 261 to 264 in 96cb6fe

    
    # score = np.mean(score_list) # TODO: @shjwudp check whether these are still needed
    # std = np.std(score_list)
    return np.mean(score_list), np.std(score_list), score_list.tolist()

This comment was generated by todo based on a `TODO` comment in `96cb6fe` in #24. cc @BaguaSys.

NOBLES5E marked this pull request as draft June 16, 2021 12:11

github-actions bot reviewed Jun 16, 2021
