feat(python): associate PyTorch Process Group with Bagua Process Group using cache #402

Merged
merged 9 commits into from
Dec 1, 2021

shjwudp
Member

@shjwudp shjwudp commented Nov 29, 2021

No description provided.

@@ -49,6 +49,8 @@
# Process group count for default naming
_group_count = 0

# Torch process group to bagua process group
_torch_to_bagua_pg_map = {}
Contributor

@NOBLES5E NOBLES5E Nov 29, 2021

What if the process group is destroyed? It seems the cached entries will never be released in the current implementation.

Member Author

Destruction of a TorchProcessGroup is not handled yet; I don't have a good way to deal with it.

I tried attaching the Bagua process group to the TorchProcessGroup as an attribute, but that does not work when the class is NCCLProcessGroup, because a C object (NCCLProcessGroup) does not support adding attributes. So the final plan is what the PR does now. Any suggestions?

Contributor

@NOBLES5E NOBLES5E left a comment

See comments.

tests/torch_api/test_process_group.py (two review threads, outdated and resolved)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@NOBLES5E changed the title from "feat(python): add PyTorch Process Group to Bagua Process Group cache" to "feat(python): associate PyTorch Process Group with Bagua Process Group using cache" on Dec 1, 2021