Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: fix pytest errors on higher versions of PyTorch #697

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

woqidaideshi
Copy link
Contributor

No description provided.

tests/internal/torch/common_utils.py Show resolved Hide resolved
tests/internal/torch/common_utils.py Outdated Show resolved Hide resolved
tests/internal/torch/common_utils.py Outdated Show resolved Hide resolved
tests/torch_api/data_parallel/test_c10d_common.py Outdated Show resolved Hide resolved
tests/torch_api/data_parallel/test_c10d_common.py Outdated Show resolved Hide resolved
tests/torch_api/data_parallel/test_c10d_common.py Outdated Show resolved Hide resolved
tests/torch_api/data_parallel/test_c10d_common.py Outdated Show resolved Hide resolved
@woqidaideshi woqidaideshi changed the title [Draft] ci: fix pytest errors on higher versions of PyTorch ci: fix pytest errors on higher versions of PyTorch Apr 14, 2023
@@ -132,7 +132,11 @@

import torch
import torch.cuda
from torch._six import string_classes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder does torch remove this torch._six in higher versions? do you know which version?
It's better for us to remove these dependencies on private modules.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wangraying torch._six has been removed from PyTorch 2.0.0.

@@ -44,6 +48,7 @@
)

import bagua.torch_api.data_parallel.functional as bagua_dist
from bagua.torch_api.contrib.sync_batchnorm import _SYNC_BN_V5, _SYNC_BN_V6, _SYNC_BN_V7

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should couple this test with sync bn together.

Copy link
Contributor Author

@woqidaideshi woqidaideshi Apr 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wangraying This approach aims to make the test cases compatible with multiple versions of PyTorch.

Do you have any other suggestions for addressing the compatibility issues of the test cases with multiple versions of PyTorch?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree using LooseVersion** as the condition, but it's kind of confusing to import the condition from sync bn.

@@ -29,3 +29,98 @@ jobs:
run: |
rm -rf bagua bagua_core
pytest -s --timeout=300 --timeout_method=thread
build-ten:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why adding so many steps on different containers here, rather than changing above **pytorch-1.9.0** directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wangraying This is just to test if it can run properly on different versions of PyTorch. If this PR needs to be merged, the final testing environment will either PyTorch 1.9.0 or PyTorch 1.13.0.
If the test case will to be changed to PyTorch 1.13.0 version later, we can still keep the PyTorch 1.9.0 testing environment here.

Copy link
Member

@wangraying wangraying Apr 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked pytorch lightning's ci, their pytest cpu test supporting multiple torch versions, but gpu test mainly running on latest pytorch release. Maybe we could do similarly, or just maintaining all ci tests on pytorch 1.13?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants