Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Dockerfile to include checked out version #338

Merged
merged 1 commit into from
Feb 11, 2020

Conversation

ryanleary
Copy link
Contributor

The recent change included in the PR to fix unit tests introduces a regression/bug in the Dockerfile. With the Dockerfile you should be able to build the currently checked out version (and any local changes).

The current Dockerfile doesn't support this -- basically mandates that a Docker image may only be made from a branch of the main repository.

Signed-off-by: Ryan Leary rleary@nvidia.com

Dockerfile Show resolved Hide resolved
@DEKHTIARJonathan
Copy link
Collaborator

DEKHTIARJonathan commented Feb 7, 2020

You do not want to do this for multiple reasons:

  1. You do not want to copy your local directory in the project. It makes the container build process non reproducible. Installing from PyPI or Github should be the only way acceptable for a release container.

  2. You do not want to install a package in development mode: pip -e ".[all]". It's a risky and non stable way to provide a package to a user.

  3. If you need nemo source inside the container, for dev for instance, you should mount the directory as a docker volume in the container and pip --upgrade -e ".[all]"

  4. It makes the docker container more "heavy" therefore takes more place on the hard-drive.


If you want to build a nightly or latest container, then you directly pull the latest version from github

@ryanleary
Copy link
Contributor Author

ryanleary commented Feb 7, 2020 via email

@blisc blisc self-requested a review February 10, 2020 23:13
blisc
blisc previously requested changes Feb 10, 2020
Copy link
Collaborator

@blisc blisc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Block pending discussion

okuchaiev
okuchaiev previously approved these changes Feb 11, 2020
@ryanleary
Copy link
Contributor Author

ryanleary commented Feb 11, 2020

Please have a look at the most recent update. I have attempted to address all feedback/concerns, while maintaining flexibility.

To create a release container that is reproducible, you can use this dockerfile and build the image in the following way:

RELEASE_TAG=v0.9.0
git clone git@github.com:NVIDIA/NeMo.git && cd NeMo
git checkout tags/$RELEASE_TAG
docker build --build-arg NEMO_VERSION=$(git describe --tags) -t nvcr.io/nvidia/nemo:$RELEASE_TAG

The same process works for a public release, as does a nightly/master build, as does simply building an image that is based on a local branch that we want to push and test on another system. The NEMO_VERSION build arg is mandatory and the docker image build fails if it is not specified.

(nemo) ➜  NeMo git:(fix-dockerfile) docker build -t this-will-fail-because-no-nemo-version .
[+] Building 2.7s (16/23)
[...]
 => CACHED [nemo-deps 8/8] RUN pip install --disable-pip-version-check --no-cache-dir -r requirements.txt                                                                                                                                                                                                           0.0s
 => ERROR [nemo 1/8] RUN /usr/bin/test -n "$NEMO_VERSION" &&     /bin/echo "export NEMO_VERSION=${NEMO_VERSION}" >> /root/.bashrc &&     /bin/echo "export BASE_IMAGE=nvcr.io/nvidia/pytorch:20.01-py3" >> /root/.bashrc                                                                                            0.8s
[...]

Within the container, at runtime, environment variables are set that specify the base image used as well as the commit/tag that was used.

root@1992d9e98eea:/workspace/nemo# env | grep -E 'NEMO_VERSION|BASE_IMAGE'
NEMO_VERSION=v0.9.0-601-ga977f38f
BASE_IMAGE=nvcr.io/nvidia/pytorch:20.01-py3

Between the mandatory build argument and inclusion as a runtime environment variable, I believe this provides adequate traceability as to what code was used to create the image. I would argue, in fact, that it's more traceable than the proposed alternative in this thread, because you could, in theory, use an arbitrary version of the Dockerfile to generate an image for a different tag.

While a copy of the source into Docker is done, it is done in a scratch layer and uses multi-stage builds + experimental Docker run mount to ensure the source is not included in the final image. Other modifications have been made within the image to ensure the lightest possible end container. The proposed Dockerfile generates an image that is ~200MB larger than the base NGC PyTorch container. The current Dockerfile generates an image that is over 2GB larger than the base NGC container.

For development purposes, you can create an image that contains only the NeMo dependencies and then map the local source tree into the container at runtime by selecting the appropriate target:

# build a dev image that doesnt have to change unless the dependencies do
docker build --build-arg NEMO_VERSION=dev --target nemo-deps -t nemo:20.01-py3-devel .
# checkout any branch you like
git checkout wip-add-new-feature
docker run -it --rm --gpus all -v $(pwd):/workspace/nemo nemo:20.01-py3-devel
pip install -e .[all]

Any changes made in the source will be reflected both inside and outside the container, and NeMo can be reinstalled as needed.

@blisc blisc dismissed their stale review February 11, 2020 21:33

Discussion has happened

Signed-off-by: Ryan Leary <rleary@nvidia.com>
@okuchaiev okuchaiev merged commit 53c5da4 into NVIDIA:master Feb 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants