Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kola: Add --no-net flag #1478

Merged
merged 1 commit into from
May 26, 2020
Merged

Conversation

cgwalters
Copy link
Member

The Docker Hub recently decided that us pulling the nginx image
over and over many times a day was abusive and started rate limiting
us.

For OSTree's test suite at least, there's no good reason for us
to run podman's tests at all. Similarly for e.g. Ignition.

And just as a general rule I think it's useful to cleanly separate
tests that can be run fully offline from those that require
Internet access.

@sohankunkerkar
Copy link
Member

And the CI failed due to:

--- FAIL: podman.workflow (274.41s)

    --- FAIL: podman.workflow/run (251.35s)

            cluster.go:141: Trying to pull docker.io/library/nginx...

            cluster.go:141:   too many requests to registry

            cluster.go:141: Error: unable to pull docker.io/library/nginx: unable to pull image: Error parsing image configuration: too many requests to registry

            cluster.go:162: "sudo podman run -d -p 80:80 -v /tmp/tmp.kedL9Z7bVf/index.html:/usr/share/nginx/html/index.html:z docker.io/library/nginx" failed: output , status Process exited with status 125

@sohankunkerkar
Copy link
Member

For OSTree's test suite at least, there's no good reason for us
to run podman's tests at all. Similarly for e.g. Ignition.

+1
I've seen this error a couple of times in Ignition CI but wasn't sure about the fix. Thanks for putting up this PR.

@sohankunkerkar
Copy link
Member

/approve

@lucab

This comment has been minimized.

jlebon added a commit to jlebon/fedora-coreos-config that referenced this pull request May 25, 2020
Temporarily blacklist this test for now until we're off Docker Hub's
naughty list. This is blocking CI and pipelines.

See discussions in coreos/coreos-assembler#1478
for a longer-term approach on tests that require Internet access in
general.
@jlebon
Copy link
Member

jlebon commented May 25, 2020

Hmm actually... since this is (hopefully) temporary, maybe let's just blacklist the tests for now? That way we don't have to spread this new switch everywhere it's needed. OK, did that in: coreos/fedora-coreos-config#425.

But I agree with the motivation of this PR though.

jlebon added a commit to jlebon/fedora-coreos-config that referenced this pull request May 25, 2020
Temporarily blacklist this test for now until we're off Docker Hub's
naughty list. This is blocking CI and pipelines.

See discussions in coreos/coreos-assembler#1478
for a longer-term approach on tests that require Internet access in
general.
jlebon added a commit to jlebon/fedora-coreos-config that referenced this pull request May 25, 2020
Temporarily blacklist this test for now until we're off Docker Hub's
naughty list. This is blocking CI and pipelines.

See discussions in coreos/coreos-assembler#1478
for a longer-term approach on tests that require Internet access in
general.
jlebon added a commit to jlebon/fedora-coreos-config that referenced this pull request May 25, 2020
Temporarily blacklist these tests for now until we're off Docker Hub's
naughty list. This is blocking CI and pipelines.

See discussions in coreos/coreos-assembler#1478
for a longer-term approach on tests that require Internet access in
general.
jlebon added a commit to coreos/fedora-coreos-config that referenced this pull request May 25, 2020
Temporarily blacklist these tests for now until we're off Docker Hub's
naughty list. This is blocking CI and pipelines.

See discussions in coreos/coreos-assembler#1478
for a longer-term approach on tests that require Internet access in
general.
@cgwalters
Copy link
Member Author

I'm happy with the CI workaround for now, but:

My suggestion is to tag those tests with "net.dockerio.unauth" and to change the test-runner to always run the default tests (i.e. those with empty tag set) plus any tags specified on the CLI.
That could also allow us in the future to have something like "--fallible-tags" to allow soft-failures/flakes.

So far kola itself has not attached any semantic meaning to tags. I'm not opposed to that, but it'd be a notable change.

We also already have the "requires Internet" metadata on these tests note.

I also laid out a rationale why I think we want to support offline tests and not just specifically this Docker Hub issue:

And just as a general rule I think it's useful to cleanly separate tests that can be run fully offline from those that require Internet access.

To elaborate on that, all tests that require Internet are inherently flaky to some degree. Also I want to be able to e.g. run coreos-assembler while I'm on an airplane too.

(There has also been parallel discussions with enhancing the Kubernetes and OpenShift test suites to support being run disconnected, so one can validate a disconnected environment - this is particularly tricky for OpenShift's tests which talk to Github and Docker Hub and quay and...)

@lucab
Copy link
Contributor

lucab commented May 26, 2020

Ah, I missed the fact that RequiresInternetAccess fields are already in place. Then it looks like all relevant parties already agreed on this direction while plumbing coreos/mantle#967. I'll thus retire my comment above, sorry for the noise.

@cgwalters
Copy link
Member Author

/test sanity

var blacklisted bool
noPattern := hasString("*", patterns)
for name, t := range tests {
for _, flag := range t.Flags {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, shouldn't we reset noNetFiltered to false before this? Otherwise once it's set to true, it'll always remain true, no? Which means it would skip every test after a networking test when --no-net is specified.

Or better yet, just move the variable declaration into the for-loop. I guess we could do the same for blacklisted too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooh great catch! I'd tested that we didn't run the network tests and that other tests were run, but not that we still listed all of the tests...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happened to still have this tab open and one thing I wanted to say here is that this is an excellent example of why we have code review - nothing would be checking today that we're silently not running some tests and it'd be painful to find later.

The Docker Hub recently decided that us pulling the `nginx` image
over and over many times a day was abusive and started rate limiting
us.

For OSTree's test suite at least, there's no good reason for us
to run podman's tests at all.  Similarly for e.g. Ignition.

And just as a general rule I think it's useful to cleanly separate
tests that can be run fully offline from those that require
Internet access.
@jlebon
Copy link
Member

jlebon commented May 26, 2020

/lgtm

@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, jlebon, sohankunkerkar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 4f2a1e2 into coreos:master May 26, 2020
@cgwalters
Copy link
Member Author

(There has also been parallel discussions with enhancing the Kubernetes and OpenShift test suites to support being run disconnected, so one can validate a disconnected environment - this is particularly tricky for OpenShift's tests which talk to Github and Docker Hub and quay and...)

Just to xref, openshift/origin#24887

cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Jun 4, 2020
In coreos#1478 we
discussed Internet access and tests, adding a flag to disable
tests which are flagged as requiring it.

This flips things around and *enforces* no Internet access for
qemu tests by default unless the test is flagged as requiring
it.

Unsurprisingly it turns out several tests are missing the flag.
(And we also don't presently have a way to flag external tests
 as requiring Internet, which is another issue)
cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Jul 2, 2020
In coreos#1478 we
discussed Internet access and tests, adding a flag to disable
tests which are flagged as requiring it.

This flips things around and *enforces* no Internet access for
qemu tests by default unless the test is flagged as requiring
it.

Unsurprisingly it turns out several tests are missing the flag.

Another issue here is that external tests don't have a way
to influcence flags.  Continuing our "mimic Debian autopkgtest where
applicable" trend, support a `needs-internet` tag, which works
for external tests since they can provide flags.
openshift-merge-robot pushed a commit that referenced this pull request Jul 8, 2020
In #1478 we
discussed Internet access and tests, adding a flag to disable
tests which are flagged as requiring it.

This flips things around and *enforces* no Internet access for
qemu tests by default unless the test is flagged as requiring
it.

Unsurprisingly it turns out several tests are missing the flag.

Another issue here is that external tests don't have a way
to influcence flags.  Continuing our "mimic Debian autopkgtest where
applicable" trend, support a `needs-internet` tag, which works
for external tests since they can provide flags.
dustymabe added a commit to dustymabe/coreos-assembler that referenced this pull request Oct 5, 2020
Sometime between podman 1.x and 2.x podman started putting
full 64 character IDs into the json output. Dynamically detect
the length of the ID and compare that number of characters.

We haven't noticed this test failing because we've been denylisting
the test in the pipeline: coreos#1478
openshift-merge-robot pushed a commit that referenced this pull request Oct 5, 2020
Sometime between podman 1.x and 2.x podman started putting
full 64 character IDs into the json output. Dynamically detect
the length of the ID and compare that number of characters.

We haven't noticed this test failing because we've been denylisting
the test in the pipeline: #1478
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants