Jenkinsfile Fixes Part 2 #627

Merged 70 commits on Mar 2, 2022
Changes from 59 commits
ebf6052
Jenkins + Docker Improvements
ravi-mosaicml Feb 28, 2022
f2cb6a1
Fixed typo
ravi-mosaicml Feb 28, 2022
6870eb0
Echoing job urls
ravi-mosaicml Feb 28, 2022
987ae95
Fix typo
ravi-mosaicml Feb 28, 2022
3940a6f
Testing...
ravi-mosaicml Mar 1, 2022
3f8ec23
Testing
ravi-mosaicml Mar 1, 2022
3984964
Added storage limit; moved the build matrix locally
ravi-mosaicml Mar 1, 2022
cf27afc
Fixes
ravi-mosaicml Mar 1, 2022
8d2fb3d
testing
ravi-mosaicml Mar 1, 2022
816c08e
testing
ravi-mosaicml Mar 1, 2022
2385489
Fixing for merge commits
ravi-mosaicml Mar 1, 2022
6da83a6
testing
ravi-mosaicml Mar 1, 2022
48f4d7a
Merge branch 'dev' into ravi/jenkinsfile_improvements
ravi-mosaicml Mar 1, 2022
1373409
testing
ravi-mosaicml Mar 1, 2022
4f2808a
Merge branch 'ravi/jenkinsfile_improvements' of github.com:mosaicml/c…
ravi-mosaicml Mar 1, 2022
48bf896
testing
ravi-mosaicml Mar 1, 2022
1a417e5
testing
ravi-mosaicml Mar 1, 2022
057ae81
testing
ravi-mosaicml Mar 1, 2022
80d12a0
testing
ravi-mosaicml Mar 1, 2022
1975611
Jenkinsfile cleanup
ravi-mosaicml Mar 1, 2022
203df72
Removed runWithChecks; fixed echoing of URL on subjob failures
ravi-mosaicml Mar 1, 2022
e646a56
Reconfigured docker builds
ravi-mosaicml Mar 1, 2022
9fd831b
testing
ravi-mosaicml Mar 1, 2022
a083a4d
Fixed typo
ravi-mosaicml Mar 1, 2022
0cd68d6
Parallelize the dockerbuilds
ravi-mosaicml Mar 1, 2022
737f810
Testing
ravi-mosaicml Mar 1, 2022
8dae40e
Fixed pytorchDockerBuildMatrix
ravi-mosaicml Mar 1, 2022
1c6d3d3
Bugfixes
ravi-mosaicml Mar 1, 2022
075e228
Added missing def
ravi-mosaicml Mar 1, 2022
0f6d896
testing
ravi-mosaicml Mar 1, 2022
48a9e9b
testing
ravi-mosaicml Mar 1, 2022
6e4214e
Reduce verbosity
ravi-mosaicml Mar 1, 2022
b88ffc2
Bugfixes
ravi-mosaicml Mar 1, 2022
084cba7
Remove echo
ravi-mosaicml Mar 1, 2022
d8df259
Added milestone
ravi-mosaicml Mar 1, 2022
37f136e
Fixed milestone
ravi-mosaicml Mar 1, 2022
73d0634
testing
ravi-mosaicml Mar 1, 2022
692b2dd
testing
ravi-mosaicml Mar 1, 2022
e721bb1
Updated the description in setup.py to match the readme.
ravi-mosaicml Mar 1, 2022
9db390a
testing
ravi-mosaicml Mar 1, 2022
d5700f3
Fixed build conda
ravi-mosaicml Mar 1, 2022
95f2a57
Merge branch 'ravi/jenkinsfile_improvements' into ravi/jenkinsfile_im…
ravi-mosaicml Mar 1, 2022
b5dc06b
Merge branch 'dev' into ravi/jenkinsfile_improvements
ravi-mosaicml Mar 1, 2022
500e8a3
Adjusted memory requirements
ravi-mosaicml Mar 1, 2022
3001287
Merge branch 'ravi/jenkinsfile_improvements' into ravi/jenkinsfile_im…
ravi-mosaicml Mar 1, 2022
b597021
testing
ravi-mosaicml Mar 1, 2022
80ad8e6
Merge branch 'ravi/jenkinsfile_improvements' into ravi/jenkinsfile_im…
ravi-mosaicml Mar 1, 2022
bba873b
Adjusted conda limits
ravi-mosaicml Mar 1, 2022
8b22a2a
Merge branch 'ravi/jenkinsfile_improvements' into ravi/jenkinsfile_im…
ravi-mosaicml Mar 1, 2022
7f1de0d
Merge branch 'dev' into ravi/jenkinsfile_improvments_test
ravi-mosaicml Mar 1, 2022
36c95ca
Increased conda memory limit
ravi-mosaicml Mar 1, 2022
0046f34
Excluding the jenkinsfile repo changes from the changelog
ravi-mosaicml Mar 1, 2022
29bd924
Fix the dockerfile once more
ravi-mosaicml Mar 1, 2022
e88fffe
Increase conda timeout
ravi-mosaicml Mar 1, 2022
131fb7c
Tagged the latest image
ravi-mosaicml Mar 1, 2022
b780e49
testing
ravi-mosaicml Mar 1, 2022
d8f923b
Increased docker build ephemeral storage limit
ravi-mosaicml Mar 1, 2022
9595664
Fixed a typo
ravi-mosaicml Mar 1, 2022
10f5316
Merge branch 'dev' into ravi/jenkinsfile_improvments_test
ravi-mosaicml Mar 1, 2022
4b942d4
Update .ci/Jenkinsfile
ravi-mosaicml Mar 1, 2022
771e4cd
Fixed a race condition where multiple pytests wrote to the same junitxml
ravi-mosaicml Mar 1, 2022
4b5e667
Merge branch 'dev' into ravi/jenkinsfile_improvments_test
ravi-mosaicml Mar 2, 2022
85dd68f
Skip all deepspeed tests
ravi-mosaicml Mar 2, 2022
d23bcaf
Merge branch 'dev' into ravi/jenkinsfile_improvments_test
ravi-mosaicml Mar 2, 2022
48e2d3d
Merge branch 'ravi/jenkinsfile_improvments_test' of github.com:mosaic…
ravi-mosaicml Mar 2, 2022
73aa462
Merge branch 'dev' into ravi/jenkinsfile_improvments_test
ravi-mosaicml Mar 2, 2022
8c315f4
Increased storage
ravi-mosaicml Mar 2, 2022
b4e8e9a
Merge branch 'ravi/jenkinsfile_improvments_test' of github.com:mosaic…
ravi-mosaicml Mar 2, 2022
4da5167
Increased ephemeral storage limit
ravi-mosaicml Mar 2, 2022
e0ad4d4
Increased storage to 32Gi
ravi-mosaicml Mar 2, 2022
34 changes: 25 additions & 9 deletions .ci/Jenkinsfile
@@ -38,6 +38,7 @@ def cloneJenkinsfilesRepo() {
 doGenerateSubmoduleConfigurations: false,
 extensions: [[$class: 'RelativeTargetDirectory', relativeTargetDir: jenkinsfileRepoTargetDir]],
 submoduleCfg: [],
+changelog: false,
 userRemoteConfigs: [[url: jenkinsfileRepo, credentialsId: gitCredentialsId]]
 ])
 return "$WORKSPACE_TMP/$jenkinsfileRepoTargetDir"
@@ -179,16 +180,16 @@ stage('Build') {
 pytorchDockerBuildMatrix.each { entry ->
 def command = entry[0] // command is the command to run
 def stagingImage = entry[1] // stagingImage is where the built docker image is pushed
-def buildArgs = entry[2] // buildArgs is a map of the build arguments passed to kaniko
-jobs << [ "$buildArgs": { ->
+def buildConfigListOfTuples = entry[2] // buildConfigListOfTuples is a list of (key, value) pairs of the build args passed to kaniko
+jobs << [ "$buildConfigListOfTuples": { ->
 trackBuild(
 job: jenkinsShellJobName,
 parameters: [
 string(name: 'P_CLOUD', value: pCloud),
 string(name: 'P_GIT_REPO', value: gitUrl),
 string(name: 'P_GIT_COMMIT', value: gitCommit),
 string(name: 'P_DOCKER_IMAGE', value: kanikoDockerImage),
-string(name: 'P_EPHEMERAL_STORAGE_LIMIT', value: '16Gi'),
+string(name: 'P_EPHEMERAL_STORAGE_LIMIT', value: '32Gi'),
 text(name: 'P_COMMAND', value: command),
 string(name: 'P_TIMEOUT', value: pTimeout),
 string(name: 'P_CPU_LIMIT', value: '4'),
@@ -199,16 +200,31 @@ stage('Build') {
 // no need to run tests again
 return
 }
-def tag = buildArgs['TAG']
-def gpu = buildArgs['CUDA_VERSION'] != 'cpu'
+def gpu = false
+def isLintImage = false
+def tag = null
+buildConfigListOfTuples.each { item ->
+def key = item[0]
+def val = item[1]
+
+if (key == 'CUDA_VERSION') {
+gpu = val != 'cpu'
+}
+if (key == 'TAG') {
+tag = val
+// there could be multiple tags
+isLintImage = isLintImage || tag == lintImage
+}
+
+}
 def extraDeps = 'all'
 def subJobs = [
 "Pytest - ${tag}" : { -> runPytest(stagingImage, gpu, extraDeps) }
 ]
-if (tag == lintImage) {
+if (isLintImage) {
 // and run lint and a dev install on this image
 subJobs << [
-"Pytest - ${tag}, extraDeps=dev": { -> runPytest(stagingImage, false, 'dev') },
+"Pytest - extraDeps=dev": { -> runPytest(stagingImage, false, 'dev') },
 "Lint": { -> runLint(stagingImage) },
 ]
 }
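For readers less familiar with Groovy, the tuple-scanning loop added in this hunk can be mirrored in Python. This is only a sketch: `summarize_build_config` and the `LINT_IMAGE` value are hypothetical names chosen for illustration and are not part of the Jenkinsfile. It shows why the build args are scanned as a list of (key, value) pairs instead of being indexed as a map: `TAG` may appear more than once, so the lint check is OR-ed across all tags while `tag` ends up holding the last one seen.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for the Jenkinsfile's `lintImage` variable.
LINT_IMAGE = "mosaicml/pytorch:latest"


def summarize_build_config(
    pairs: List[Tuple[str, str]],
    lint_image: str = LINT_IMAGE,
) -> Tuple[bool, bool, Optional[str]]:
    """Return (gpu, is_lint_image, tag) from a list of (key, value) build args."""
    gpu = False
    is_lint_image = False
    tag = None
    for key, val in pairs:
        if key == "CUDA_VERSION":
            gpu = val != "cpu"
        if key == "TAG":
            tag = val  # the last TAG wins, as in the Groovy loop
            is_lint_image = is_lint_image or tag == lint_image
    return gpu, is_lint_image, tag


config = [
    ("TAG", "mosaicml/pytorch:1.10.0_cu113-python3.9-ubuntu20.04"),
    ("TAG", "mosaicml/pytorch:latest"),
    ("CUDA_VERSION", "11.3.1"),
]
print(summarize_build_config(config))
```

One consequence of this loop: when a row has several tags, the `Pytest - ${tag}` job label uses whichever `TAG` comes last in the list.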
@@ -244,9 +260,9 @@ stage('Build') {
 string(name: 'P_CLOUD', value: pCloud),
 string(name: 'P_GIT_REPO', value: gitUrl),
 string(name: 'P_GIT_COMMIT', value: gitCommit),
-string(name: 'P_EPHEMERAL_STORAGE_LIMIT', value: '8Gi'),
+string(name: 'P_EPHEMERAL_STORAGE_LIMIT', value: '32Gi'),
 string(name: 'P_DOCKER_IMAGE', value: condaBuildDockerImage),
-string(name: 'P_TIMEOUT', value: pTimeout),
+string(name: 'P_TIMEOUT', value: '3600'), // Conda builds take longer
 string(name: 'P_CPU_LIMIT', value: "4"),
 string(name: 'P_MEM_LIMIT', value: "8Gi"),
 string(name: 'P_COMMAND', value: "./.ci/build_conda.sh")
2 changes: 1 addition & 1 deletion docker/pytorch/Dockerfile
@@ -6,7 +6,7 @@ ARG DEBIAN_FRONTEND=noninteractive
 # remove a bad symlink from the base composer image
 # If this file is present after the first command, kaniko
 # won't be able to build the docker image.
-RUN rm -f /usr/local/cuda-11.3/cuda-11.3 && mkdir -p /usr/local/cuda-11.3 && touch /usr/local/cuda-11.3/cuda-11.3
+RUN rm -f /usr/local/cuda-11.3/cuda-11.3
 
 RUN apt-get update && \
 apt-get install -y --no-install-recommends \
2 changes: 1 addition & 1 deletion docker/pytorch/build_matrix.sh
@@ -7,7 +7,7 @@ echo "TAG='mosaicml/pytorch:1.9.1_cu111-python3.7-ubuntu20.04' BASE_IMAGE='nvidi
 echo "TAG='mosaicml/pytorch:1.9.1_cpu-python3.7-ubuntu20.04' BASE_IMAGE='ubuntu:20.04' PYTHON_VERSION='3.7' CUDA_VERSION_TAG='cpu' CUDA_VERSION='cpu' LINUX_DISTRO='ubuntu:20.04' PYTORCH_VERSION='1.9.1' TORCHVISION_VERSION='0.10.1'"
 echo "TAG='mosaicml/pytorch:1.9.1_cu111-python3.8-ubuntu20.04' BASE_IMAGE='nvidia/cuda:11.1.1-cudnn8-devel-ubuntu20.04' PYTHON_VERSION='3.8' CUDA_VERSION_TAG='cu111' CUDA_VERSION='11.1.1' LINUX_DISTRO='ubuntu:20.04' PYTORCH_VERSION='1.9.1' TORCHVISION_VERSION='0.10.1'"
 echo "TAG='mosaicml/pytorch:1.9.1_cpu-python3.8-ubuntu20.04' BASE_IMAGE='ubuntu:20.04' PYTHON_VERSION='3.8' CUDA_VERSION_TAG='cpu' CUDA_VERSION='cpu' LINUX_DISTRO='ubuntu:20.04' PYTORCH_VERSION='1.9.1' TORCHVISION_VERSION='0.10.1'"
-echo "TAG='mosaicml/pytorch:1.10.0_cu113-python3.9-ubuntu20.04' BASE_IMAGE='nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04' PYTHON_VERSION='3.9' CUDA_VERSION_TAG='cu113' CUDA_VERSION='11.3.1' LINUX_DISTRO='ubuntu:20.04' PYTORCH_VERSION='1.10.0' TORCHVISION_VERSION='0.11.1'"
+echo "TAG='mosaicml/pytorch:1.10.0_cu113-python3.9-ubuntu20.04' TAG='mosaicml/pytorch:latest' BASE_IMAGE='nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04' PYTHON_VERSION='3.9' CUDA_VERSION_TAG='cu113' CUDA_VERSION='11.3.1' LINUX_DISTRO='ubuntu:20.04' PYTORCH_VERSION='1.10.0' TORCHVISION_VERSION='0.11.1'"
 echo "TAG='mosaicml/pytorch:1.10.0_cpu-python3.9-ubuntu20.04' BASE_IMAGE='ubuntu:20.04' PYTHON_VERSION='3.9' CUDA_VERSION_TAG='cpu' CUDA_VERSION='cpu' LINUX_DISTRO='ubuntu:20.04' PYTORCH_VERSION='1.10.0' TORCHVISION_VERSION='0.11.1'"
 echo "TAG='mosaicml/pytorch:1.9.1_cu111-python3.8-ubuntu18.04' BASE_IMAGE='nvidia/cuda:11.1.1-cudnn8-devel-ubuntu18.04' PYTHON_VERSION='3.8' CUDA_VERSION_TAG='cu111' CUDA_VERSION='11.1.1' LINUX_DISTRO='ubuntu:18.04' PYTORCH_VERSION='1.9.1' TORCHVISION_VERSION='0.10.1'"
 echo "TAG='mosaicml/pytorch:1.9.1_cpu-python3.8-ubuntu18.04' BASE_IMAGE='ubuntu:18.04' PYTHON_VERSION='3.8' CUDA_VERSION_TAG='cpu' CUDA_VERSION='cpu' LINUX_DISTRO='ubuntu:18.04' PYTORCH_VERSION='1.9.1' TORCHVISION_VERSION='0.10.1'"
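The changed matrix row carries two `TAG` entries, which lines up with the Jenkinsfile now treating the build args as a list of (key, value) pairs. The sketch below is an assumption about how such a row could be parsed, written in Python with the standard `shlex` module; `parse_build_args` is a hypothetical helper, and the actual parsing code is not part of this diff. It contrasts the list form with a plain dict, where the second `TAG` would overwrite the first.

```python
import shlex
from typing import List, Tuple


def parse_build_args(line: str) -> List[Tuple[str, str]]:
    """Split "KEY='value' KEY='value' ..." into ordered (key, value) pairs."""
    pairs = []
    for token in shlex.split(line):  # shlex strips the shell-style quotes
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=value, got {token!r}")
        pairs.append((key, value))
    return pairs


line = (
    "TAG='mosaicml/pytorch:1.10.0_cu113-python3.9-ubuntu20.04' "
    "TAG='mosaicml/pytorch:latest' "
    "CUDA_VERSION='11.3.1'"
)
pairs = parse_build_args(line)
tags = [value for key, value in pairs if key == "TAG"]

print(tags)                # both tags survive in the list form
print(dict(pairs)["TAG"])  # a dict keeps only the last TAG
```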
4 changes: 3 additions & 1 deletion setup.py
@@ -122,7 +122,9 @@ def package_files(directory: str):
 version="0.4.0",
 author="MosaicML",
 author_email="team@mosaicml.com",
-description="composing methods for ML training efficiency",
+description=
+"Composer provides well-engineered implementations of efficient training methods to give "
+"the tools that help you train a better model for cheaper.",
 long_description=long_description,
 long_description_content_type="text/markdown",
 url="https://github.com/mosaicml/composer",