CI flake: Panic in Spec Teardown: send on closed channel #6518

Closed
edsantiago opened this issue Jun 8, 2020 · 19 comments · Fixed by #8541
Labels: flakes, kind/bug, kind/test-flake, locked - please file new issue/PR

Comments

@edsantiago (Member)

cirrus-flake-xref is reporting three instances of this since May 26:

Panic in Spec Teardown (AfterEach) [2.677 seconds]
         Podman containers 
         /var/tmp/go/src/github.com/containers/libpod/pkg/bindings/test/containers_test.go:19
           podman wait to pause|unpause condition [AfterEach]
           /var/tmp/go/src/github.com/containers/libpod/pkg/bindings/test/containers_test.go:282
         
           Test Panicked
           send on closed channel
           /usr/lib/golang/src/runtime/panic.go:969
         
           Full Stack Trace
           panic(0x10397a0, 0x134e430)
           	/usr/lib/golang/src/runtime/panic.go:969 +0x166
           github.com/containers/libpod/pkg/bindings/test.glob..func5.20.2(0xc0000b2a38, 0xc00004dc90, 0xc000319ae8, 0xc000011180)
           	/var/tmp/go/src/github.com/containers/libpod/pkg/bindings/test/containers_test.go:308 +0xc8
           created by github.com/containers/libpod/pkg/bindings/test.glob..func5.20
           	/var/tmp/go/src/github.com/containers/libpod/pkg/bindings/test/containers_test.go:304 +0x6cc

Log links:

Link to containers_test.go:308.

All failures have been in fedora-32 special_testing_bindings.
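
For reference, "send on closed channel" is an unrecoverable runtime panic in Go: any goroutine that sends on a channel after it has been closed crashes the whole test process, which is what Ginkgo reports as "Test Panicked". A minimal stand-alone snippet (not from the test suite) that produces the same panic:

```go
package main

func main() {
	ch := make(chan error)
	close(ch)
	ch <- nil // panic: send on closed channel
}
```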

@mheon added the flakes label Jun 8, 2020
@edsantiago (Member Author)

Still happening, and (as far as my logs can tell) still only on f32:

Podman containers [AfterEach] podman wait to pause|unpause condition

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rhatdan (Member) commented Jul 25, 2020

@edsantiago Still an issue?

@edsantiago (Member Author)

And another one just now, on my own PR #7070

@edsantiago (Member Author)

Another one yesterday:

Still no sign of it on anything other than f32

edsantiago added a commit to edsantiago/libpod that referenced this issue Jul 29, 2020
The "podman wait to pause|unpause condition" test is failing
several times a day, always a flake. Issue containers#6518.

Disable it until the cause can be identified and fixed.

Signed-off-by: Ed Santiago <santiago@redhat.com>
@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rhatdan (Member) commented Aug 28, 2020

@edsantiago Still an issue?

@edsantiago (Member Author)

Uh, well, no, because it was flaking so much that I disabled the test (#7143). I haven't tried to generate a reproducer, but if today is calm I will try to do so. I've removed the stale-issue label because until the test is reenabled, we don't know.

edsantiago added a commit to edsantiago/libpod that referenced this issue Aug 31, 2020
Reference: containers#6518, a very-frequently-flaking CI test, disabled
a month ago (containers#7143) because it was triggering so often in CI.
Unfortunately, that seems to have simply swept the problem
under the rug. AFAICT nobody has bothered to look at the
root bug, so let's just reenable. If the problem persists,
I'll let annoyed developers squeaky-wheel 6518 so there's
some incentive to fix it. If the problem has miraculously
gone away in the last month, that's a win too.

(This test failure does not reproduce on my laptop, nor
does it lend itself to devising a simple reproducer on
a test VM.)

Also: since containers#5325 appears to have been closed as fixed,
remove a 'Skip' that references it. Unfortunately this
also requires removing a lot of other cruft. This was
an incidental oh-by-the-way addition that I thought
would be trivial but ended up causing a much larger diff.

Signed-off-by: Ed Santiago <santiago@redhat.com>
@edsantiago (Member Author)

Yes, still happening, in post-merge testing on master:

[+0517s] •! Panic in Spec Teardown (AfterEach) [2.851 seconds]
         Podman containers 
         /var/tmp/go/src/github.com/containers/podman/pkg/bindings/test/containers_test.go:17
           podman wait to pause|unpause condition [AfterEach]
           /var/tmp/go/src/github.com/containers/podman/pkg/bindings/test/containers_test.go:272
         
           Test Panicked
           send on closed channel
           /usr/lib/golang/src/runtime/panic.go:969
         
           Full Stack Trace
           panic(0x104b380, 0x13644d0)
           	/usr/lib/golang/src/runtime/panic.go:969 +0x166
           github.com/containers/podman/v2/pkg/bindings/test.glob..func5.20.2(0xc000010200, 0xc0001d7640, 0xc000c7eba8, 0xc0010b8520)
           	/var/tmp/go/src/github.com/containers/podman/pkg/bindings/test/containers_test.go:298 +0xc8
           created by github.com/containers/podman/v2/pkg/bindings/test.glob..func5.20
           	/var/tmp/go/src/github.com/containers/podman/pkg/bindings/test/containers_test.go:294 +0x6cc

Links, from most- to least-specific:

edsantiago added a commit to edsantiago/libpod that referenced this issue Sep 14, 2020
@rhatdan added the kind/bug and kind/test-flake labels Oct 7, 2020
@github-actions bot commented Nov 8, 2020

A friendly reminder that this issue had no activity for 30 days.

@rhatdan (Member) commented Nov 10, 2020

@edsantiago still seeing this?

@edsantiago (Member Author)

Sorry, I haven't had time to look (at this nor the other stale-issue that you haven't pinged me about yet). Won't have time until Thursday most likely. But I will, I promise.

@edsantiago (Member Author)

Yes, still happening. I'll skip the September instances, and just list October/November:

Note to self: the search term for cirrus-flake-summarize is "unpause"

edsantiago added a commit to edsantiago/libpod that referenced this issue Dec 1, 2020
It's continuing to flake, and I see no activity on containers#6518.

Flakes are evil. Let's just disable the test again, until someone
takes the initiative to fix the bug.

Signed-off-by: Ed Santiago <santiago@redhat.com>
@rhatdan (Member) commented Dec 1, 2020

@edsantiago What do you think of this patch? I think the issue is that the second errChan is set up before the first one completes, causing the issue you are seeing. If we separate the channels, we should not be closing the channel before it is used.

diff.txt

@edsantiago (Member Author)

I don't see how there could be a race here -- the code looks really sequential to me. But Go has subtleties I don't understand. I'm willing and even eager to give your approach a try for a few months, if you'd like to submit that! I will close this once your PR goes through CI and merges. Thank you!

@rhatdan (Member) commented Dec 1, 2020

I am wondering if threads would skip over the wait error channel, but you may be right. The only way I could see this happening would be if the second err = make(chan error) could fire before the close(errChan) in the first function happened.
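
A stand-alone sketch of that hypothesis (hypothetical placeholder names, not the actual bindings test code): because both goroutines close over the same errChan variable, reassigning it with a second make() before the first goroutine reaches its close(errChan) means the first goroutine closes the second channel, and the second goroutine's send then hits a closed channel. The race is timing-dependent, which would fit it only showing up as an occasional flake:

```go
package main

import "time"

// fakeWait stands in for the blocking wait call in the real test
// (hypothetical placeholder, not the podman bindings API).
func fakeWait() error {
	time.Sleep(10 * time.Millisecond)
	return nil
}

func main() {
	errChan := make(chan error)
	go func() {
		errChan <- fakeWait()
		close(errChan) // may not run until after errChan is reassigned below
	}()
	<-errChan // first wait result received

	// Reassigning the shared variable races with the close() above: if the
	// reassignment wins, the first goroutine closes this new channel
	// instead of its own.
	errChan = make(chan error)
	go func() {
		errChan <- fakeWait() // can panic: send on closed channel
		close(errChan)
	}()
	<-errChan
}
```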

rhatdan added a commit to rhatdan/podman that referenced this issue Dec 1, 2020
The It("podman wait to pause|unpause condition"... test is
flaking every so often when a messages is sent in the second
function to a channel.  It is my believe that in between the time
the first function sends a message to the channel and before it closes
the channel the second errChan=make() has happened.  This would mean that
the fist function closes the second errChan, and then when the second
function sends a message to the second errChan, it fails and blows up with
the error you are seeing.

By creating a different variable for the second channel, we eliminate the race.

Fixes: containers#6518

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
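
Continuing the hypothetical sketch above, the fix described in this commit amounts to giving each wait its own channel variable, so a goroutine can only ever close the channel it actually sends on:

```go
package main

import "time"

// fakeWait again stands in for the blocking wait call (hypothetical).
func fakeWait() error {
	time.Sleep(10 * time.Millisecond)
	return nil
}

func main() {
	pauseChan := make(chan error)
	go func() {
		pauseChan <- fakeWait()
		close(pauseChan)
	}()
	<-pauseChan

	// A distinct variable for the second wait: nothing can close it out
	// from under the goroutine that owns it, so the send cannot hit a
	// closed channel.
	unpauseChan := make(chan error)
	go func() {
		unpauseChan <- fakeWait()
		close(unpauseChan)
	}()
	<-unpauseChan
}
```
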
@github-actions bot added the locked - please file new issue/PR label Sep 22, 2023
@github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023