fix: Dockerfile - pin deepspeed to 0.11.2 #821

Closed

fpreiss
Contributor

@fpreiss fpreiss commented Nov 4, 2023

Description

Dockerfile-base: pin deepspeed to 0.11.2

Motivation and Context

The DeepSpeed update to 0.12.x breaks the Docker image build.

As of today, building the Docker base image with:

git clone https://github.com/OpenAccess-AI-Collective/axolotl.git
cd axolotl/docker
docker buildx build -f Dockerfile-base --target base-builder -t base-builder --progress=plain .
docker buildx build -f Dockerfile-base --target deepspeed-builder -t deepspeed-builder --progress=plain .

leads to the following error:

 > [deepspeed-builder 2/2] RUN git clone https://github.com/microsoft/DeepSpeed.git &&     cd DeepSpeed &&     MAX_CONCURRENCY=8 DS_BUILD_SPARSE_ATTN=0 DS_BUILD_OPS=1 DS_BUILD_EVOFORMER_ATTN=0 python3 setup.py bdist_wheel:
32.02  [WARNING]  cpu_lion attempted to use `py-cpuinfo` but failed (exception type: <class 'UnboundLocalError'>, local variable 'get_cpu_info' referenced before assignment), falling back to `lscpu` to get this information.
32.02  [WARNING]  cpu_lion attempted to use `py-cpuinfo` but failed (exception type: <class 'UnboundLocalError'>, local variable 'get_cpu_info' referenced before assignment), falling back to `lscpu` to get this information.
32.02  [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
32.02  [WARNING]  Filtered compute capabilities ['7.0', '7.5']
32.02     ext_modules.append(builder.builder())
32.02   File "/workspace/DeepSpeed/op_builder/builder.py", line 633, in builder
32.02     extra_link_args=self.strip_empty_entries(self.extra_ldflags()))
32.02   File "/workspace/DeepSpeed/op_builder/inference_cutlass_builder.py", line 71, in extra_ldflags
32.02     import dskernels
32.02 ModuleNotFoundError: No module named 'dskernels'

How has this been tested?

The changes have been tested manually on my local machine with the two docker build commands above.

@casper-hansen
Collaborator

casper-hansen commented Nov 4, 2023

This PR should not be needed.

pip install deepspeed-kernels

https://github.com/microsoft/DeepSpeed-Kernels
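
A quick way to confirm whether the missing module from the traceback above is present before starting a long build is sketched below. This is only an illustration, not part of the PR: the helper name `module_available` is hypothetical, and it assumes `python3` is on the PATH of the build environment. The module name `dskernels` comes from the error log, and the package name `deepspeed-kernels` from the comment above.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): returns 0 if the named Python
# module can be located by the interpreter, non-zero otherwise.
module_available() {
    python3 -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1"
}

# dskernels is the module the DeepSpeed build failed to import.
if module_available dskernels; then
    echo "dskernels found"
else
    echo "dskernels missing: run 'pip install deepspeed-kernels' first"
fi
```

Running this inside the builder stage, just before the `python3 setup.py bdist_wheel` step, would report the missing dependency immediately instead of 30 seconds into the wheel build.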

@fpreiss
Contributor Author

fpreiss commented Nov 4, 2023

Thank you. I can verify that with the following patch I can build both Dockerfile-base and the Dockerfile (from the local base image) on the deepspeed v0.12.2 tag.

 RUN python3 -m pip install --upgrade pip && pip3 install packaging && \
-    python3 -m pip install --no-cache-dir -U torch==${PYTORCH_VERSION}+cu${CUDA} --extra-index-url https://download.pytorch.org/whl/cu$CUDA
+    python3 -m pip install --no-cache-dir -U torch==${PYTORCH_VERSION}+cu${CUDA} deepspeed-kernels --extra-index-url https://download.pytorch.org/whl/cu$CUDA

@winglian
Collaborator

winglian commented Nov 5, 2023

thanks, is it easier to open a new PR for this change or just modify this PR?

@fpreiss
Contributor Author

fpreiss commented Nov 5, 2023

I'll open a new PR
