Add a Dockerfile including DCNv2 GPU compilation #176

Open

wants to merge 2 commits into master
Conversation

@Keiku commented Dec 23, 2020

Everyone seems to be having trouble with the GPU compilation of DCNv2, so I added a Dockerfile that works correctly.

I have confirmed that it works in the following environment.

⋊> ~ cat /etc/os-release | grep PRETTY_NAME
PRETTY_NAME="Ubuntu 18.04.2 LTS"
⋊> ~ docker --version
Docker version 19.03.5, build 633a0ea838
⋊> ~ docker-compose -v
docker-compose version 1.25.3, build unknown
⋊> ~ docker info | grep -i runtime
WARNING: No swap limit support
 Runtimes: nvidia runc
 Default Runtime: nvidia
⋊> ~ 

If you want to use CUDA when building the Docker image, you need to set up /etc/docker/daemon.json as follows.

⋊> ~ cat /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

After preparing the above environment, build and start the container with the following command.

docker-compose up -d dev

You cannot build with CUDA unless you add the following to docker-compose.yaml (a fuller sketch of the compose file follows below).

    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
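
For reference, here is a minimal sketch of what the complete docker-compose.yaml might look like. The service name dev and the image name centertrack_dev are taken from the commands in this comment; the build context, volume paths, and everything else are assumptions and may differ from the files in this PR.

version: "2.3"   # the 2.x compose format is needed for the runtime key

services:
  dev:
    build:
      context: .              # assumed: Dockerfile at the repository root
      dockerfile: Dockerfile
    image: centertrack_dev
    runtime: nvidia           # use the nvidia runtime when the container runs
    ipc: host
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
    volumes:
      - /path/to/datasets:/CenterTrack/data             # adjust to your host paths
      - /path/to/pretrained_models:/CenterTrack/models
    tty: true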

For Docker 19.03 and later versions, use the --gpus all option. Execute the docker run command as follows.

docker run --gpus all --ipc=host --rm -it \
         -v /home/keiichi.kuroyanagi/datasets/:/CenterTrack/data/ \
         -v /home/keiichi.kuroyanagi/pretrained_models/:/CenterTrack/models/ \
         centertrack_dev

@ahyunlee

Thanks for your Dockerfile!

I had a problem with "qt.qpa.xcb: could not connect to display".

Can you help me?


root@a2118cd70918:/CenterTrack/src# python demo.py tracking,ddd --load_model ../models/nuScenes_3Dtracking.pth --dataset nuscenes --pre_hm --track_thresh 0.1 --demo ../videos/nuscenes_mini.mp4 --test_focal_length 633
/usr/local/lib/python3.6/dist-packages/sklearn/utils/linear_assignment_.py:22: FutureWarning: The linear_assignment_ module is deprecated in 0.21 and will be removed from 0.23. Use scipy.optimize.linear_sum_assignment instead.
FutureWarning)
Running tracking
Using tracking threshold for out threshold! 0.1
Fix size testing.
training chunk_sizes: [32]
input h w: 448 800
heads {'hm': 10, 'reg': 2, 'wh': 2, 'tracking': 2, 'dep': 1, 'rot': 8, 'dim': 3, 'amodel_offset': 2}
weights {'hm': 1, 'reg': 1, 'wh': 0.1, 'tracking': 1, 'dep': 1, 'rot': 1, 'dim': 1, 'amodel_offset': 1}
head conv {'hm': [256], 'reg': [256], 'wh': [256], 'tracking': [256], 'dep': [256], 'rot': [256], 'dim': [256], 'amodel_offset': [256]}
Creating model...
Using node type: (<class 'model.networks.dla.DeformConv'>, <class 'model.networks.dla.DeformConv'>)
Warning: No ImageNet pretrain!!
loaded ../models/nuScenes_3Dtracking.pth, epoch 70
out_name nuscenes_mini.mp4
qt.qpa.xcb: could not connect to display
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "/usr/local/lib/python3.6/dist-packages/cv2/qt/plugins" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: xcb.

@Keiku (Author) commented Feb 1, 2021

@ahyunlee Since the Docker environment does not have a display, please modify demo.py so that it does not use the display.
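
For anyone hitting the same qt.qpa.xcb error, here is a minimal sketch of one way to do that. It is not the actual change made to demo.py, just an illustration: the idea is to skip the cv2.imshow calls when no X display is available (the DISPLAY check is an assumption about your setup).

import os

import cv2

# Assumption: inside the headless Docker container the DISPLAY variable is unset.
HAS_DISPLAY = bool(os.environ.get("DISPLAY"))

def safe_imshow(window_name, image):
    # Only open a window when an X display is available; otherwise do nothing.
    # Results can still be written to disk with cv2.imwrite.
    if HAS_DISPLAY:
        cv2.imshow(window_name, image)
        cv2.waitKey(1)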

@ahyunlee commented Feb 5, 2021

> @ahyunlee Since the Docker environment does not have a display, please modify demo.py so that it does not use the display.

Thanks! It worked! I commented out 'cv2.imshow' in all the files.

@fabio-cancio-sena

> Everyone seems to be having trouble with the GPU compilation of DCNv2, so I added a Dockerfile that works correctly. [...]
Hey @Keiku, thank you for kindly sharing your work on containerizing CenterTrack. I'm facing the same trouble with the GPU compilation of DCNv2, so I tried to use your Dockerfile and docker-compose, but without success. Can you share an updated version of these files?

@Keiku (Author) commented Jul 12, 2021

@fabio-cancio-sena Please tell me your Docker version. By the way, in my understanding nvidia-docker2 is unnecessary; instead, the NVIDIA Container Toolkit is required. You can check the versions with commands like docker --version and nvidia-container-cli -V (an installation sketch for the toolkit follows after the output below).

⋊> ~ docker --version                                                         06:45:54
Docker version 19.03.5, build 633a0ea838
⋊> ~ nvidia-container-cli -V                                                  06:46:21
version: 1.3.0
build date: 2020-09-16T12:32+00:00
build revision: 16315ebdf4b9728e899f615e208b50c41d7a5d15
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
⋊> ~ 
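
If the NVIDIA Container Toolkit is not installed yet, the steps NVIDIA documented around that time were roughly the following. This is only a sketch; the repository setup has changed since then, so please check the current NVIDIA Container Toolkit documentation.

# Add NVIDIA's package repository (Ubuntu 18.04-era instructions)
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
    sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install the toolkit and restart the Docker daemon
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker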

@fabio-cancio-sena

I see. Do you have an updated version of your Dockerfile? With your old Dockerfile I'm having trouble building DCNv2 locally, and torch reports that the GPU is not available.

Here are the software versions:
docker --version

Docker version 20.10.2, build 20.10.2-0ubuntu1~18.04.2

nvidia-container-cli -V

version: 1.4.0
build date: 2021-04-24T14:25+00:00
build revision: 704a698b7a0ceec07a48e56c37365c741718c2df
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections

@Keiku (Author) commented Jul 13, 2021

@fabio-cancio-sena I also confirmed the error in the following environment. It has been a long time since I set this up, so I can't resolve the error right now. I will try to resolve it as soon as I have free time; please try it yourself for the time being.

⋊> ~ docker --version                                                         14:07:31
Docker version 20.10.7, build f0df350
⋊> ~ nvidia-container-cli -V                                                  14:14:25
version: 1.4.0
build date: 2021-04-24T14:26+00:00
build revision: 704a698b7a0ceec07a48e56c37365c741718c2df
build compiler: gcc-5 5.4.0 20160609
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections

RUN python -c 'import torch; assert torch.cuda.is_available(), "Cuda is not available."'
WORKDIR /CenterTrack/src/lib/model/networks
RUN git clone --recursive https://github.com/CharlesShang/DCNv2
RUN cd DCNv2 && bash ./make.sh

Use a symlink, otherwise make.sh will fail. Also, use WORKDIR to change directories instead of cd:

RUN ln -s /usr/bin/python3.6 /usr/bin/python
COPY . /centertrack

WORKDIR /centertrack/src/lib/model/networks/
RUN git clone --recursive https://github.com/CharlesShang/DCNv2
WORKDIR DCNv2
RUN /bin/bash make.sh

@yktangac

Has anyone encountered this error?

/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py:352: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
error: command 'g++' failed with exit status 1

I am not sure whether it is caused by the PyTorch version.

pip3 install torch==1.7.1+cu101 torchvision==0.8.2+cu101 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

I checked docker info; it is the same as yours.

Any thoughts on this? Thanks in advance.

@elkoz commented Aug 18, 2022

Note that you need to run systemctl reload docker after setting the default runtime.
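
For example, a sketch of the steps, assuming the daemon.json shown earlier in this thread:

# After adding "default-runtime": "nvidia" to /etc/docker/daemon.json,
# apply the change (a full restart also works if reload is not enough):
sudo systemctl reload docker

# Verify that the nvidia runtime is now the default
docker info | grep -i runtime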

RUN apt-get install -y --no-install-recommends software-properties-common
RUN add-apt-repository ppa:deadsnakes/ppa
RUN apt update
RUN apt install -y --no-install-recommends python3.6
@elkoz commented on these lines, Aug 18, 2022:

> RUN apt update
> RUN apt install -y --no-install-recommends python3.6

These lines don't seem to work anymore. Using RUN apt-get install -y python3.6 together with nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04 as the base image fixed that for me.
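
A sketch of what that fix could look like in the Dockerfile. The assumption is that Ubuntu 18.04 already ships Python 3.6 in its default repositories, so the deadsnakes PPA is no longer needed:

# Base image with CUDA 10.0 + cuDNN 7 on Ubuntu 18.04, as suggested above
FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04

# Python 3.6 comes straight from the Ubuntu 18.04 repositories,
# so the deadsnakes PPA is not required here.
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3.6 python3-pip && \
    rm -rf /var/lib/apt/lists/*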
