Allow generating different CI jobs for tests and/or separate outputs #1472

Open
h-vetinari opened this issue Apr 9, 2021 · 23 comments

@h-vetinari
Member

This is a topic I've been thinking about for a while regarding run-times (cf. conda-forge/conda-forge.github.io#902), but it also came up recently in the context of a GPU-specific build queue, see for example the notes of a conda-forge meeting about a month ago (cf. also conda-forge/conda-forge.github.io#1272):

may need to figure out how to build on CPU and test on the GPU since locking the GPU during build is expensive and unneeded.

The core idea is to enable an opt-in per feedstock to run certain test sections (or outputs) in a separate CI job.

  • This would obviously help if the build times are close to timing out (at the cost of persisting the artefact somewhere, though perhaps Azure even supports that natively within the same job group...)
  • It becomes even more important for a possible GPU-specific queue. Since such a queue would have limited resources, it would be a waste to build GPU-enabled packages on that queue when the build process generally does not need GPUs itself (the GPU queue could thus be restricted to just testing).

In fact, I think the best as well as easiest solution might be to allow opting into separate jobs per output. Not only can tests be moved to a separate output (in a multi-output recipe; I've seen this in several places already, cf. e.g. pyarrow), but it would also provide a way to bring long-running recipes under the 6h limit, as long as the build steps can be modularized even a little bit - not least in recipes that are already multi-output (e.g. it would help me a lot on the faiss-feedstock, where the GPU lib takes 3-5h to build, often leaving not enough time for the python bindings).

The one thing that would be necessary, of course, is a simple graph of which outputs depend on which other ones, so that they can get built in the correct order. From looking at the CI in the staged-recipes repo (where e.g. GPU builds aren't started by default), it seems that such functionality should be possible in principle...
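To illustrate the kind of dependency graph I mean, here is a minimal sketch (the outputs and their dependencies are hypothetical, loosely modelled on faiss; this is not anything conda-smithy or conda-build does today):

```python
# Minimal sketch: given a hypothetical mapping of which outputs depend on
# which other outputs, compute an order in which per-output jobs could run.
from graphlib import TopologicalSorter  # Python >= 3.9

# Hypothetical multi-output recipe: the Python bindings need the compiled
# library, and a test-only output needs the bindings.
output_deps = {
    "libfaiss": set(),
    "faiss": {"libfaiss"},
    "faiss-tests": {"faiss"},
}

build_order = list(TopologicalSorter(output_deps).static_order())
print(build_order)  # -> ['libfaiss', 'faiss', 'faiss-tests']
```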

CC @beckermr @isuruf @wolfv @jakirkham

@isuruf
Member

isuruf commented Apr 30, 2021

e.g. it would help me a lot on the faiss-feedstock, where the GPU lib takes between 3-5h to build, often leaving not enough time for the python bindings

I don't see how this proposal would help in this case.

@h-vetinari
Member Author

Thanks for chiming in @isuruf!

@h-vetinari: e.g. it would help me a lot on the faiss-feedstock, where the GPU lib takes between 3-5h to build, often leaving not enough time for the python bindings

@isuruf: I don't see how this proposal would help in this case.

Well, roughly: the infra takes 0.5h, the lib takes 3-5h (on Linux; 4-6h on Windows), and the python bindings + tests take 1h. Splitting off the latter means taking 1h off of a job that's flirting with the 6h timeout, and would definitely increase the success rate (if not to 100%, then from ~30% for the Windows CUDA builds to ~80%).

But I want to stress that this is not the key driver of this issue - I just thought that splitting jobs by output might actually be easier than separating build & test phases (and noted this as an ancillary benefit).

@beckermr
Member

The original reason for this issue is to not tie up GPUs on the CI while building code. If we can arrange other ways to prevent that, then we don't need to do this.

@isuruf
Member

isuruf commented Apr 30, 2021

I just thought that splitting jobs by output might actually be easier than separating build & test phases

No, it's not. While conda-build has functionality to separate the build & test phases, there's nothing that can split jobs by output, and it would be impossible to implement.

@h-vetinari
Member Author

The original reason for this issue is to not tie up GPUs on the CI while building code. If we can arrange other ways to prevent that, then we don't need to do this.

Absolutely, but I don't see how we can get around having to choose a different agent for running the test suite?

While conda-build has functionality to separate the build & test phases, there's nothing that can split jobs by output, and it would be impossible to implement.

I know that conda-build supports that split, but I was also thinking about the part where this needs to be configurable & generatable by smithy. I assume I'm not seeing the full picture re:implementability, but conda-build already generates a DAG of outputs and processes them in the correct order. The only point that I imagined would need to be added is to optionally ingest an artefact rather than running a particular output's build script.
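Roughly what I have in mind, as a sketch only (the helper name and layout are hypothetical, this is not conda-build or conda-smithy API): per output, either ingest an artefact produced by an earlier CI job, or fall back to running that output's build script.

```python
# Hypothetical sketch, not an existing conda-build/conda-smithy helper.
import os
import shutil
import subprocess

def build_or_ingest(output_name: str, artefact_dir: str, build_script: str) -> str:
    """Return a package path for `output_name`, reusing an artefact from a
    previous CI job if one was passed along, otherwise building it here."""
    cached = os.path.join(artefact_dir, f"{output_name}.tar.bz2")
    if os.path.isfile(cached):
        # A previous job already built this output; just ingest the artefact.
        os.makedirs("build_artifacts", exist_ok=True)
        return shutil.copy(cached, "build_artifacts")
    # No artefact available: build this output in the current job.
    subprocess.run(["bash", build_script], check=True)
    return os.path.join("build_artifacts", f"{output_name}.tar.bz2")
```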

@beckermr
Member

@jakirkham This all depends on the software that runs the CI server. That software doesn't have to tie a given container to a specific GPU for the whole job.

@isuruf
Member

isuruf commented Apr 30, 2021

re:implementability, but conda-build already generates a DAG of outputs and processes them in the correct order.

conda-build runs the top-level build.sh first and then copies the working directory for each output. Therefore building one output on one machine and another output on another machine is impossible.

@h-vetinari
Member Author

conda-build runs the top-level build.sh first and then copies the working directory for each output. Therefore building one output on one machine and another output on another machine is impossible.

Some recipes do not have a top-level build.sh, and get by with one script per output based on the same sources (and possibly relying on previously compiled subpackages). Would that be a reasonable restriction?

@h-vetinari
Member Author

@beckermr: This all depends on the software that runs the CI server. That software doesn't have to tie a given container to a specific GPU for the whole job.

[Assuming this was addressed to me 😅]
That's a fair point; I guess I was thinking mostly about Azure (where I'm not aware this would be possible), because it'd still be nice to reuse those agents for long-compiling jobs, and then switch to the GPU queue for testing.

@beckermr
Member

🤦

@beckermr
Member

On Azure we can upload artifacts and pass them between jobs, iiuc.

@h-vetinari
Member Author

On Azure we can upload artifacts and pass them between jobs, iiuc.

Definitely, and if we go with the build/test split, then the question would just be how that split can be configured, generated & executed by smithy (which is what I originally intended this issue to be about - sorry for complicating the discussion with the other option of splitting on outputs).

@beckermr
Member

💯

@isuruf
Member

isuruf commented Apr 30, 2021

On Azure we can upload artifacts and pass them between jobs, iiuc.

This would restrict us to using the same provider for build and test. I'd rather we use the artifacts from cf-staging to test: one CI uploads the artifacts, and the other downloads and runs them.

@beckermr
Member

🤔

@beckermr
Member

beckermr commented Apr 30, 2021

Our current infrastructure cannot securely support this. A well-timed attacker could inject another feedstock's artifact at just the right moment since uploads to cf-staging are not gated in any way.

We could upload to a test label, pull from there, and then make a secure call over https to our admin server to relabel the object if the tests pass.
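Very roughly, the relabel step could look something like this (everything here is hypothetical: the endpoint, payload and token handling do not exist today and would have to be added to the admin server):

```python
# Hypothetical sketch of the "relabel if tests pass" call; this is not an
# existing conda-forge API, only an illustration of the proposed flow.
import requests

ADMIN_SERVER = "https://example.invalid/conda-forge-admin"  # hypothetical URL

def relabel_if_tests_passed(package_fn: str, token: str) -> None:
    """Ask the (hypothetical) admin server to move a package that was
    uploaded under a 'test' label over to 'main' after its test job passed."""
    resp = requests.post(
        f"{ADMIN_SERVER}/relabel",
        json={"package": package_fn, "from_label": "test", "to_label": "main"},
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
```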

@beckermr
Member

I also do not know how we will be able to orchestrate dependent jobs across CI services.

@beckermr
Member

Actually no, we can't use a test label since we cannot upload to our channels from PRs on forks.

@h-vetinari
Member Author

Staying within Azure sounds like a reasonable first step then? It's not great to be locked in like that, but at least one GPU queue proposal was Azure-based anyway, so that might provide a faster way forward (while figuring out secure artefact sharing within PRs can still come later).

@isuruf
Member

isuruf commented Apr 30, 2021

Do we want to test PRs on GPU queue though?

@h-vetinari
Member Author

Do we want to test PRs on GPU queue though?

That was my assumption. I guess it's another thing that could ultimately be considered not crucial - running any GPU tests on master only might suffice. It would be more painful/manual to not have GPU CI on PRs, but I guess that would in a way be even more resource-conservative 😄

@h-vetinari
Member Author

So we have two options (third one just for completeness):

| Artefact sharing | Cache within Azure | Push to test label | Push to cf-staging |
| --- | --- | --- | --- |
| Can be made secure | ✔️ | ✔️ | |
| Share across CI providers | | ✔️ | ✔️ |
| Can run in PRs | ✔️ | | |
| Implementation effort | | | |

While it would be nice to be able to debug GPU builds in PRs, this could still be done manually with artefact persistence... Sharing between CI providers might be more desirable in comparison, even if it's master-only.

@beckermr
Member

beckermr commented May 1, 2021

Let's start with caching within Azure.
