container kill: handle stopped/exited container #17126

Merged
merged 2 commits into from
Jan 16, 2023

Conversation

vrothberg
Member

The container lock is released before stopping/killing which implies
certain race conditions with, for instance, the cleanup process changing
the container state to stopped, exited or other states.

The (remaining) flakes seen in #16142 and #15367 strongly indicate a
race between stopping/killing a container and the cleanup process.
To fix the flakes, ignore invalid-state errors when stopping/killing.
An alternative fix would be to change KillContainer to not return such
errors at all, but commit c77691f indicates an explicit desire to
have these errors reported in the sig proxy.

[NO NEW TESTS NEEDED] as it's a race already covered by the system
tests.

Fixes: #16142
Fixes: #15367
Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>

Does this PR introduce a user-facing change?

Fix a race condition when stopping/killing a container that has already been stopped or has exited.

@mheon PTAL
@edsantiago I am convinced this will fix both linked flakes.

Every time I look at a container-removal issue I wonder why the
container isn't locked directly here, so let's add a comment here.
I am not sure whether it would be better if callers took care of
locking, but for now the comment will save the future me and probably
other readers some time.

[NO NEW TESTS NEEDED]

Signed-off-by: Valentin Rothberg <vrothberg@redhat.com>
@openshift-ci openshift-ci bot added release-note do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Jan 16, 2023
@vrothberg vrothberg marked this pull request as ready for review January 16, 2023 12:58
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 16, 2023
@openshift-ci
Contributor

openshift-ci bot commented Jan 16, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vrothberg


@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 16, 2023
@mheon
Member

mheon commented Jan 16, 2023

Code LGTM

@vrothberg
Member Author

@containers/podman-maintainers PTAL

@Luap99
Member

Luap99 commented Jan 16, 2023

/lgtm
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 16, 2023
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 16, 2023
@vrothberg
Member Author

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 16, 2023
@vrothberg
Member Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 16, 2023
@vrothberg
Member Author

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 16, 2023
@vrothberg
Member Author

tide looks tired. Let's see. Maybe it'll get picked up later.

@openshift-merge-robot openshift-merge-robot merged commit f07cee3 into containers:main Jan 16, 2023
@vrothberg vrothberg deleted the fix-16142 branch January 17, 2023 08:38
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 15, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 15, 2023
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note
4 participants