Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expanding tests run in coverage job #1676

Closed
Trott opened this issue Jan 31, 2019 · 32 comments
Closed

Expanding tests run in coverage job #1676

Trott opened this issue Jan 31, 2019 · 32 comments
Labels
ci-change PSA of configuration changes

Comments

@Trott
Copy link
Member

Trott commented Jan 31, 2019

I changed the configuration for https://ci.nodejs.org/view/All/job/node-test-commit-linux-coverage-daily/ to also run pummel and benchmark jobs.

Under "Execute Shell", I changed the last line from this:

NODE_TEST_DIR=${HOME}/node-tmp PYTHON=python COVTESTS="test-ci test-internet" make coverage -j $(getconf _NPROCESSORS_ONLN)

...to this:

NODE_TEST_DIR=${HOME}/node-tmp PYTHON=python COVTESTS="test-ci test-internet test-pummel test-benchmark" make coverage -j $(getconf _NPROCESSORS_ONLN)

Once nodejs/node#25799 lands, we can replace the list of suites with test-all-suites and all tests under the test directory will run no matter where they are.

@Trott Trott changed the title Add pummel and benchmark tests to coverage job Expanding tests run in coverage job Jan 31, 2019
@Trott Trott added the ci-change PSA of configuration changes label Jan 31, 2019
@Trott
Copy link
Member Author

Trott commented Jan 31, 2019

Guess I'll leave this open until we can change it to test-all-suites.

@Trott
Copy link
Member Author

Trott commented Jan 31, 2019

Hmmm...my change somehow added code in the benchmark directory to the measurements, which is definitely not what I wanted to do. Any idea how to exclude it or why it got included? @bcoe

@Trott
Copy link
Member Author

Trott commented Jan 31, 2019

Oh, I see, I need to add it as an --exclude in the call to c8 in the Makefile.

@Trott
Copy link
Member Author

Trott commented Jan 31, 2019

(In the meantime, I'll remove the benchmark tests from the coverage job.)

@mhdawson
Copy link
Member

@Trott, how much run time does adding those 2 make?

Working on #1669 and just want to understand how that will affect the run time there as well. We will also have to sync that job.

@Trott
Copy link
Member Author

Trott commented Jan 31, 2019

@Trott, how much run time does adding those 2 make?

Working on #1669 and just want to understand how that will affect the run time there as well. We will also have to sync that job.

The whole job, start-to-finish, takes around 20 minutes give-or-take 5 minutes. My run took 19 minutes and the previous run (without these additions) was 21 minutes, so whatever additional time there is, it is negligible. On the other hand, that is odd because I believe two pummel tests timed out, so I would expect that to add 4 minutes right there. (Maybe there's some Jenkins magic there? Does this job run things in parallel even though they're not parallel-safe somehow?)

@mhdawson
Copy link
Member

@Trott key thanks, if its not significantly longer that's good.

I doubt there is "magic".

@mhdawson
Copy link
Member

mhdawson commented Feb 1, 2019

Something strange is going on. I'm seeing failures in the sanity coverage job when trying to add pummel. Looks like the different types are running in parallel which seems wrong so I'm not sure adding these until we figure that out is a good idea as it seems to be causing tests to fail that previously passed because they are running in parallel.

The coverage job does not intentionally try to run anything in parallel that is not normally run that way.

@refack
Copy link
Contributor

refack commented Feb 1, 2019

Looks like the different types are running in parallel which seems wrong

The change in nodejs/node#25799 will fix that. Parallelism will only be done by test.py not by make.
(We should make sure all make test* run with -j anyway)

@mhdawson
Copy link
Member

mhdawson commented Feb 1, 2019

I don't see how 25799 is going to fix that for the coverage jobs. @refack is it a known problem that the suites are running in parallel?

@Trott
Copy link
Member Author

Trott commented Feb 1, 2019

@mhdawson I'd guess it will fix it because the coverage job will run only one make target (test-all-suites) whereas currently, make runs multiple targets (test-pummel and test-internet and so on) and ends up running multiple test.py instances.

@Trott
Copy link
Member Author

Trott commented Feb 1, 2019

By the way, the only thing keeping nodejs/node#25799 from landing at this point is that it needs one more approval. @refack? @mhdawson?

@mhdawson
Copy link
Member

mhdawson commented Feb 1, 2019

Hmm, making we think a bit more about the coverage sanity job. We want it to run as part of the regular PR validation so that tests don't get introduced that will cause the main coverage job to fail. But if we've chosen not to run tests already it should probably not run them either, unless running a single instance (ubuntu) is ok for every PR run.

@mhdawson
Copy link
Member

mhdawson commented Feb 1, 2019

The goal is to avoid things going in that then later require cleanup by people watching the main coverage job.

@refack
Copy link
Contributor

refack commented Feb 1, 2019

is it a known problem that the suites are running in parallel?

Yes. It's nodejs/node#22006 and nodejs/node#24966. Patched for regular testing with nodejs/node#23733

@mhdawson
Copy link
Member

mhdawson commented Feb 1, 2019

This patch is currently being applied in respect to 24966 but I guess it does not completely address the issue? #1671

@refack
Copy link
Contributor

refack commented Feb 1, 2019

unless running a single instance (ubuntu) is ok for every PR run.

If we get to a point where the coverage jobs works with all tests passing, we could fail it on a test fail (i.e. nodejs/node#25432) and that way we get daily sanity (even trigger it as part of the general daily master sanity)
IMO that should give us enough protection. Especially now that we don't instrument the JS code and the coverage run is much much much more robust.

This patch is currently being applied in respect to 24966 but I guess it does not completely address the issue? #1671

It solves the one one aspect of a race while building. It does not solve the fragility of parallel test.py runs.

@Trott
Copy link
Member Author

Trott commented Feb 1, 2019

OK, switched the node-test-commit-linux-coverage-daily job up to use test-all-suites and running it with the publish-to-coverage.nodejs.org box unchecked: https://ci.nodejs.org/view/All/job/node-test-commit-linux-coverage-daily/24/

Trott added a commit to Trott/io.js that referenced this issue Feb 1, 2019
Trott added a commit to Trott/io.js that referenced this issue Feb 2, 2019
Refs: nodejs/build#1676 (comment)

PR-URL: nodejs#25841
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Ben Coe <bencoe@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
@Trott
Copy link
Member Author

Trott commented Feb 2, 2019

Shoot, it's still running suites in parallel, resulting in failures/missed coverage.

Making this change now:

Before:

NODE_TEST_DIR=${HOME}/node-tmp PYTHON=python COVTESTS="test-all-suites" make coverage -j $(getconf _NPROCESSORS_ONLN)

After:

NODE_TEST_DIR=${HOME}/node-tmp PYTHON=python COVTESTS="test-all-suites" make coverage -j 1

Running to see if it gets correct stats that way and also to see how long it takes to run:
https://ci.nodejs.org/view/All/job/node-test-commit-linux-coverage-daily/26/

@Trott
Copy link
Member Author

Trott commented Feb 2, 2019

Well, that helped a little but not much. Putting the job back to this until we can figure out why it doesn't work with test-all-suites:

NODE_TEST_DIR=${HOME}/node-tmp PYTHON=python COVTESTS="test-ci test-internet test-pummel test-benchmark" make coverage -j $(getconf _NPROCESSORS_ONLN)

And kicking off a job to restore coverage stats: https://ci.nodejs.org/view/All/job/node-test-commit-linux-coverage-daily/27/

@Trott
Copy link
Member Author

Trott commented Feb 2, 2019

OK, so even though I was running test-all-suites, this showed up, I guess due to one of the prerequisites, and a test failure there ended the whole thing before it got to test-all-suites:

make[2]: Leaving directory `/home/iojs/build/workspace/node-test-commit-linux-coverage-daily/nodes/benchmark/test/node-api/test_uv_loop/build'
/usr/bin/python2.7 tools/test.py -j 8 --mode=release js-native-api

@Trott
Copy link
Member Author

Trott commented Feb 2, 2019

The problem would seem to be that test-all-suites has test-js-native-api as a pre-requisite. We really just need test-build I think. Testing that locally with a make clean && make test-all-suites. If it works out, I'll open a PR, run a modified coverage job against it as part of the test, and hopefully that will finally get this done.

@Trott
Copy link
Member Author

Trott commented Feb 2, 2019

nodejs/node#25892

@Trott
Copy link
Member Author

Trott commented Feb 2, 2019

Changed it back to test-all-suites and running a job against nodejs/node#25892 rather than master and with the publish checkbox unchecked: https://ci.nodejs.org/view/All/job/node-test-commit-linux-coverage-daily/28/

Will now change things back to test-ci test-internet test-pummel test-benchmark.

@Trott
Copy link
Member Author

Trott commented Feb 2, 2019

Coverage stats with that PR look reasonable:

06:30:40 Javascript coverage %: 95.47%
06:30:40 C++ coverage %: 90.3%

@Trott
Copy link
Member Author

Trott commented Feb 2, 2019

(And it took 20 minutes to run, so that seems reasonable too.)

@Trott
Copy link
Member Author

Trott commented Feb 3, 2019

nodejs/node#25892 landed. Updating job again:

NODE_TEST_DIR=${HOME}/node-tmp PYTHON=python COVTESTS="test-all-suites" make coverage -j $(getconf _NPROCESSORS_ONLN)

Running coverage job: https://ci.nodejs.org/view/All/job/node-test-commit-linux-coverage-daily/29/

@Trott
Copy link
Member Author

Trott commented Feb 3, 2019

Hmmm...git timing out in that run it seems...

Trying again: https://ci.nodejs.org/view/All/job/node-test-commit-linux-coverage-daily/30/

@Trott
Copy link
Member Author

Trott commented Feb 3, 2019

@refack reported that a Jenkins workspace needed attention but that things should be fine now. So, trying again: https://ci.nodejs.org/view/All/job/node-test-commit-linux-coverage-daily/31/

@Trott
Copy link
Member Author

Trott commented Feb 3, 2019

That run looks good. Optimistically closing this.

@Trott Trott closed this as completed Feb 3, 2019
addaleax pushed a commit to nodejs/node that referenced this issue Feb 3, 2019
Refs: nodejs/build#1676 (comment)

PR-URL: #25841
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Ben Coe <bencoe@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
@mhdawson
Copy link
Member

mhdawson commented Feb 4, 2019

The coverage test job where we are trying to only fail if something other than test failing seems to be having problems with the updates.

In particular, this seems to be timing out after 5 mins: test-dnsBuild timed out. See https://ci.nodejs.org/job/node-test-commit-coverage/nodes=ubuntu1604_sharedlibs_x64/59/console

@mhdawson mhdawson reopened this Feb 4, 2019
@mhdawson
Copy link
Member

mhdawson commented Feb 4, 2019

Not sure we don't see the same failure in the main coverage run. IN any case I guess we'll investigate that separately.

@mhdawson mhdawson closed this as completed Feb 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci-change PSA of configuration changes
Projects
None yet
Development

No branches or pull requests

3 participants