
feat(python): support moe #208

Merged (31 commits) on Sep 28, 2021

Conversation

liuhatry (Member)

No description provided.

Resolved review threads (outdated):
bagua/torch_api/moe/moe_layer.py (5)
bagua/torch_api/moe/top2gate.py (2)
examples/moe/main.py (3)
liuhatry and others added 6 commits September 18, 2021 09:09
liuhatry and others added 2 commits September 18, 2021 09:25
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Resolved review threads (outdated): examples/mnist/main.py (2)
liuhatry and others added 2 commits September 18, 2021 10:02
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Resolved review thread (outdated): examples/mnist/main.py (1)
liuhatry marked this pull request as draft on September 18, 2021 03:33
todo bot commented Sep 23, 2021:

revisit allreduce for moe.gate...

# TODO: revisit allreduce for moe.gate...
for expert in self.deepspeed_experts:
    # TODO: Create param groups to handle expert + data case (e.g. param.group = moe_group)
    for name, param in expert.named_parameters():
        param.allreduce = False


This comment was generated by todo based on a TODO comment in 28bc3e2 in #208. cc @BaguaSys.
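For context, param.allreduce = False marks expert parameters so that data-parallel gradient synchronization skips them: each rank hosts its own experts, so averaging expert gradients across ranks would be incorrect. Below is a minimal sketch of how such a flag could be consumed; allreduce_gradients is a hypothetical helper standing in for Bagua's actual communication backend:

import torch
import torch.distributed as dist

def allreduce_gradients(model: torch.nn.Module):
    # Average gradients across ranks, skipping parameters flagged as
    # expert-local (allreduce == False), which keep rank-local gradients.
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is None:
            continue
        if not getattr(param, "allreduce", True):
            continue  # expert parameter: each rank keeps its own copy
        dist.all_reduce(param.grad)  # sums across ranks by default
        param.grad /= world_size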

liuhatry marked this pull request as ready for review on September 24, 2021 01:30
@@ -13,14 +13,20 @@
 class Net(nn.Module):
-    def __init__(self):
+    def __init__(self, num_local_experts):
NOBLES5E (Contributor) commented Sep 27, 2021:

move moe example into separate dir, for example examples/mnist-moe

liuhatry (Member, Author) replied:

done
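For reference, the hunk above threads num_local_experts into the example model so the MoE layer knows how many experts live on each rank. A hypothetical sketch of such an MoE-enabled MNIST Net, assuming a DeepSpeed-style MoE(hidden_size, expert, num_local_experts) constructor and an (output, aux_loss, expert_counts) return tuple; the layer sizes are illustrative, not taken from the PR:

import torch.nn as nn
import torch.nn.functional as F
from bagua.torch_api.model_parallel.moe import MoE  # final path, per the review below

class Net(nn.Module):
    def __init__(self, num_local_experts):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        # Each rank hosts num_local_experts copies of the expert network.
        self.moe = MoE(
            hidden_size=128,
            expert=nn.Linear(128, 128),
            num_local_experts=num_local_experts,
        )
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x.flatten(1)))
        x, _aux_loss, _ = self.moe(x)  # assumed DeepSpeed-style return tuple
        return F.log_softmax(self.fc2(x), dim=1)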

# Git commit hash: bff6126f0ddbd1a03da66867571ac87b11c21ac1
# We retain the following license from the original files:

# Copyright 2020 The Microsoft DeepSpeed Team
NOBLES5E (Contributor) commented:

Also add our license line

liuhatry (Member, Author) replied:

done

(The same license header appears in two more files; NOBLES5E left the same "add our license" comment on each, and liuhatry replied "done".)

@@ -0,0 +1 @@
from .layer import MoE # noqa: F401
NOBLES5E (Contributor) commented:

move the whole moe directory to bagua/torch_api/model_parallel/moe

liuhatry (Member, Author) replied:

done
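After the move, downstream code imports the layer from the model_parallel namespace. A quick sanity check, assuming the one-line __init__.py shown above is kept at the new location:

# The __init__.py re-exports MoE from .layer, so the class itself
# lives in bagua.torch_api.model_parallel.moe.layer.
from bagua.torch_api.model_parallel.moe import MoE

assert MoE.__module__ == "bagua.torch_api.model_parallel.moe.layer"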

NOBLES5E (Contributor) left a review:

see comments

todo bot commented Sep 27, 2021:

Create param groups to handle expert + data case (e.g. param.group = moe_group)

        # TODO: Create param groups to handle expert + data case (e.g. param.group = moe_group)
        for name, param in expert.named_parameters():
            param.allreduce = False

    def forward(self, inputs):
        chunks = inputs.chunk(self.num_local_experts, dim=1)


This comment was generated by todo based on a TODO comment in 04658ce in #208. cc @BaguaSys.

(The todo bot reposted the same comment for commits 0fc5ad1, e50fdfc, and 013a3c3 as the branch was updated.)
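The excerpt above comes from the wrapper that holds each rank's local experts. A hedged reconstruction of that wrapper, filling in the pieces the excerpt omits (the ModuleList construction and the concatenation of per-expert outputs); the names follow the DeepSpeed code this port retains, but the details are assumptions:

import copy
import torch
import torch.nn as nn

class Experts(nn.Module):
    def __init__(self, expert: nn.Module, num_local_experts: int):
        super().__init__()
        # Each local expert is an independent copy of the expert network.
        self.deepspeed_experts = nn.ModuleList(
            [copy.deepcopy(expert) for _ in range(num_local_experts)]
        )
        self.num_local_experts = num_local_experts
        for expert in self.deepspeed_experts:
            # TODO: Create param groups to handle expert + data case (e.g. param.group = moe_group)
            for name, param in expert.named_parameters():
                param.allreduce = False  # expert params stay rank-local

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        # Split the expert dimension across local experts, run each chunk
        # through its own expert, and stitch the results back together.
        chunks = inputs.chunk(self.num_local_experts, dim=1)
        outputs = [e(chunk) for e, chunk in zip(self.deepspeed_experts, chunks)]
        return torch.cat(outputs, dim=1)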
