
feat(python): support moe #208

Merged (31 commits) on Sep 28, 2021

Conversation

liuhatry (Member)

No description provided.

Resolved review threads (outdated):
bagua/torch_api/moe/moe_layer.py (5)
bagua/torch_api/moe/top2gate.py (2)
examples/moe/main.py (3)
liuhatry and others added 6 commits September 18, 2021 09:09
liuhatry and others added 2 commits September 18, 2021 09:25
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Resolved review threads (outdated): examples/mnist/main.py (2)
liuhatry and others added 2 commits September 18, 2021 10:02
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Resolved review thread (outdated): examples/mnist/main.py (1)
liuhatry marked this pull request as draft on September 18, 2021 03:33
todo bot commented Sep 23, 2021:

revisit allreduce for moe.gate...

# TODO: revisit allreduce for moe.gate...
for expert in self.deepspeed_experts:
    # TODO: Create param groups to handle expert + data case (e.g. param.group = moe_group)
    for name, param in expert.named_parameters():
        param.allreduce = False


This comment was generated by todo based on a TODO comment in 28bc3e2 in #208. cc @BaguaSys.
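For context, param.allreduce = False marks expert parameters so that data-parallel gradient synchronization skips them: each rank hosts its own experts, so averaging expert gradients across ranks would be incorrect. Below is a minimal sketch of how such a flag could be consumed; allreduce_gradients is a hypothetical helper standing in for Bagua's actual communication backend:

import torch
import torch.distributed as dist

def allreduce_gradients(model: torch.nn.Module):
    # Average gradients across ranks, skipping parameters flagged as
    # expert-local (allreduce == False), which keep rank-local gradients.
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is None:
            continue
        if not getattr(param, "allreduce", True):
            continue  # expert parameter: each rank keeps its own copy
        dist.all_reduce(param.grad)  # sums across ranks by default
        param.grad /= world_size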

liuhatry marked this pull request as ready for review on September 24, 2021 01:30
@@ -13,14 +13,20 @@
 class Net(nn.Module):
-    def __init__(self):
+    def __init__(self, num_local_experts):
NOBLES5E (Contributor) commented Sep 27, 2021:

move moe example into separate dir, for example examples/mnist-moe

liuhatry (Member, Author) replied:

done
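For reference, the hunk above threads num_local_experts into the example model so the MoE layer knows how many experts live on each rank. A hypothetical sketch of such an MoE-enabled MNIST Net, assuming a DeepSpeed-style MoE(hidden_size, expert, num_local_experts) constructor and an (output, aux_loss, expert_counts) return tuple; the layer sizes are illustrative, not taken from the PR:

import torch.nn as nn
import torch.nn.functional as F
from bagua.torch_api.model_parallel.moe import MoE  # final path, per the review below

class Net(nn.Module):
    def __init__(self, num_local_experts):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        # Each rank hosts num_local_experts copies of the expert network.
        self.moe = MoE(
            hidden_size=128,
            expert=nn.Linear(128, 128),
            num_local_experts=num_local_experts,
        )
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x.flatten(1)))
        x, _aux_loss, _ = self.moe(x)  # assumed DeepSpeed-style return tuple
        return F.log_softmax(self.fc2(x), dim=1)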

# Git commit hash: bff6126f0ddbd1a03da66867571ac87b11c21ac1
# We retain the following license from the original files:

# Copyright 2020 The Microsoft DeepSpeed Team
NOBLES5E (Contributor) commented:

Also add our license line

liuhatry (Member, Author) replied:

done

(The same license header appears in two more files; NOBLES5E left the same "add our license" comment on each, and liuhatry replied "done".)

@@ -0,0 +1 @@
from .layer import MoE # noqa: F401
NOBLES5E (Contributor) commented:

move the whole moe directory to bagua/torch_api/model_parallel/moe

liuhatry (Member, Author) replied:

done
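After the move, downstream code imports the layer from the model_parallel namespace. A quick sanity check, assuming the one-line __init__.py shown above is kept at the new location:

# The __init__.py re-exports MoE from .layer, so the class itself
# lives in bagua.torch_api.model_parallel.moe.layer.
from bagua.torch_api.model_parallel.moe import MoE

assert MoE.__module__ == "bagua.torch_api.model_parallel.moe.layer"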

NOBLES5E (Contributor) left a review:

see comments

todo bot commented Sep 27, 2021:

Create param groups to handle expert + data case (e.g. param.group = moe_group)

        # TODO: Create param groups to handle expert + data case (e.g. param.group = moe_group)
        for name, param in expert.named_parameters():
            param.allreduce = False

    def forward(self, inputs):
        chunks = inputs.chunk(self.num_local_experts, dim=1)


This comment was generated by todo based on a TODO comment in 04658ce in #208. cc @BaguaSys.

(The todo bot reposted the same comment for commits 0fc5ad1, e50fdfc, and 013a3c3 as the branch was updated.)
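The excerpt above comes from the wrapper that holds each rank's local experts. A hedged reconstruction of that wrapper, filling in the pieces the excerpt omits (the ModuleList construction and the concatenation of per-expert outputs); the names follow the DeepSpeed code this port retains, but the details are assumptions:

import copy
import torch
import torch.nn as nn

class Experts(nn.Module):
    def __init__(self, expert: nn.Module, num_local_experts: int):
        super().__init__()
        # Each local expert is an independent copy of the expert network.
        self.deepspeed_experts = nn.ModuleList(
            [copy.deepcopy(expert) for _ in range(num_local_experts)]
        )
        self.num_local_experts = num_local_experts
        for expert in self.deepspeed_experts:
            # TODO: Create param groups to handle expert + data case (e.g. param.group = moe_group)
            for name, param in expert.named_parameters():
                param.allreduce = False  # expert params stay rank-local

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        # Split the expert dimension across local experts, run each chunk
        # through its own expert, and stitch the results back together.
        chunks = inputs.chunk(self.num_local_experts, dim=1)
        outputs = [e(chunk) for e, chunk in zip(self.deepspeed_experts, chunks)]
        return torch.cat(outputs, dim=1)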
