Running apex with error: AttributeError: module 'torch.distributed' has no attribute '_reduce_scatter_base' #1773

Open
cs-wangfeng opened this issue Feb 1, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@cs-wangfeng

Describe the Bug
I'm running a program that uses apex in my anaconda3 environment, but it fails with the following error:

...
  File ".../anaconda3/envs/valor/lib/python3.9/site-packages/apex/transformer/pipeline_parallel/schedules/common.py", line 14, in <module>
    from apex.transformer.tensor_parallel.layers import (
  File ".../anaconda3/envs/valor/lib/python3.9/site-packages/apex/transformer/tensor_parallel/__init__.py", line 21, in <module>
    from apex.transformer.tensor_parallel.layers import (
  File ".../anaconda3/envs/valor/lib/python3.9/site-packages/apex/transformer/tensor_parallel/layers.py", line 32, in <module>
    from apex.transformer.tensor_parallel.mappings import (
  File ".../anaconda3/envs/valor/lib/python3.9/site-packages/apex/transformer/tensor_parallel/mappings.py", line 29, in <module>
    torch.distributed.reduce_scatter_tensor = torch.distributed._reduce_scatter_base
AttributeError: module 'torch.distributed' has no attribute '_reduce_scatter_base'
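
The last frame of the traceback shows apex assigning `torch.distributed._reduce_scatter_base` to `torch.distributed.reduce_scatter_tensor`, which fails because the private name does not exist in the installed PyTorch. A minimal sketch for checking which of the two names the local build exposes is below. The helper `probe_attributes` is a hypothetical name introduced here for illustration; it is not part of apex or PyTorch, and the list of names is taken only from the traceback above.

```python
# Names referenced on the failing line of apex's mappings.py (from the traceback).
NAMES = ("reduce_scatter_tensor", "_reduce_scatter_base")


def probe_attributes(module, names):
    """Return {name: bool} telling whether `module` exposes each name."""
    return {name: hasattr(module, name) for name in names}


def main():
    try:
        import torch.distributed as dist
    except ImportError:
        print("torch is not installed in this environment")
        return
    for name, present in probe_attributes(dist, NAMES).items():
        status = "present" if present else "missing"
        print(f"torch.distributed.{name}: {status}")


if __name__ == "__main__":
    main()
```

If both names are reported as missing, the installed PyTorch is probably too old for the checked-out apex revision. That reading is an assumption based on the traceback and has not been confirmed by the maintainers in this thread.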

Minimal Steps/Code to Reproduce the Bug
I installed apex with the following steps:

git clone https://github.com/NVIDIA/apex.git
cd apex
pip install -v --disable-pip-version-check --no-build-isolation --no-cache-dir ./

I also tried with the following steps:

git clone https://github.com/NVIDIA/apex.git
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./

or

git clone https://github.com/NVIDIA/apex.git
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./

But none of these methods work.

Environment

Here is my environment info:

Python-3.9.12
pip-23.3.1
pytorch-1.9.0
cuda-11.1
I set up the environment with: pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
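
For anyone else hitting this error, the environment details above can be gathered with a short script so that reports are comparable. This is only a sketch: `collect_env` is a hypothetical helper name, and it reports the CUDA version the PyTorch wheel was built against, which may differ from the system toolkit.

```python
import platform


def collect_env():
    """Return interpreter and PyTorch versions as a dict of strings."""
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        info["torch"] = "not installed"
        return info
    info["torch"] = torch.__version__
    # torch.version.cuda is None for CPU-only builds.
    info["cuda (torch build)"] = str(torch.version.cuda)
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```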

cs-wangfeng added the bug label Feb 1, 2024
@cs-wangfeng

cs-wangfeng commented Feb 6, 2024

The issue was resolved by rolling back the Python version to 3.7.
The pip version does not affect the installation of apex.

@AlaaAlmutawa

I am having the same issue. How did you fix it? Unfortunately, nothing is working for me, and my Python version is already 3.7.

@JeremySun1224

May I ask why this bug has not been fixed yet?
