Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: re-implement status update flags #1514

Merged
merged 3 commits into from
Jul 8, 2021
Merged

fix: re-implement status update flags #1514

merged 3 commits into from
Jul 8, 2021

Conversation

shaneutt
Copy link
Contributor

@shaneutt shaneutt commented Jul 8, 2021

What this PR does / why we need it:

This patch re-implements the --status-update flag for KIC 2.0
but also deprecates the --update-status-on-shutdown flag as the
behavior of tearing down on cleanup is somewhat in conflict with
the re-entrant and eventually consistent design of KIC 2.0.

Which issue this PR fixes

fixes #1304

PR Readiness Checklist:

  • the CHANGELOG.md release notes have been updated to reflect any significant (and particularly user-facing) changes introduced by this PR

@shaneutt shaneutt added priority/low area/debt area/maintenance Cleanup, refactoring, and other maintenance improvements that don't change functionality. labels Jul 8, 2021
@shaneutt shaneutt self-assigned this Jul 8, 2021
@shaneutt shaneutt requested a review from a team as a code owner July 8, 2021 14:55
@shaneutt shaneutt linked an issue Jul 8, 2021 that may be closed by this pull request
@shaneutt shaneutt temporarily deployed to Configure ci July 8, 2021 14:55 Inactive
@codecov
Copy link

codecov bot commented Jul 8, 2021

Codecov Report

Merging #1514 (efb2d08) into next (cb12ad9) will increase coverage by 0.02%.
The diff coverage is 75.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             next    #1514      +/-   ##
==========================================
+ Coverage   51.46%   51.48%   +0.02%     
==========================================
  Files          91       91              
  Lines        6309     6316       +7     
==========================================
+ Hits         3247     3252       +5     
- Misses       2770     2771       +1     
- Partials      292      293       +1     
Flag Coverage Δ
integration-test 48.37% <75.00%> (-0.07%) ⬇️
unit-test 38.93% <37.50%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
railgun/manager/run.go 69.92% <60.00%> (-0.30%) ⬇️
railgun/pkg/config/config.go 93.33% <100.00%> (+0.22%) ⬆️
railgun/internal/ctrlutils/ingress-status.go 61.63% <0.00%> (-1.30%) ⬇️
...trollers/configuration/zz_generated_controllers.go 48.41% <0.00%> (ø)
pkg/parser/parser.go 84.51% <0.00%> (+1.25%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cb12ad9...efb2d08. Read the comment docs.

@rainest
Copy link
Contributor

rainest commented Jul 8, 2021

What are the other methods mentioned in the issue comment? I'd naively expect that we can't be eventually consistent when shutting down, because there's no "eventually" after shutdown: the controller is gone and will not be able to issue further status updates.

@shaneutt
Copy link
Contributor Author

shaneutt commented Jul 8, 2021

What are the other methods mentioned in the issue comment? I'd naively expect that we can't be eventually consistent when shutting down, because there's no "eventually" after shutdown: the controller is gone and will not be able to issue further status updates.

The "other methods" refers to the assumption that the controller will be restarted, and that the re-entrant nature of our logic leads to eventual consistency. Trying to support teardown mechanisms with a Kubernetes controller seems like a juxtaposition and without some strong influence I'm not seeing why we would start adding this kind of functionality right now. Let me know your thoughts.

@shaneutt shaneutt temporarily deployed to Configure ci July 8, 2021 16:08 Inactive
@shaneutt shaneutt temporarily deployed to Configure ci July 8, 2021 16:08 Inactive
@rainest
Copy link
Contributor

rainest commented Jul 8, 2021

Ah. I was thinking of the case where the controller+proxy are deleted entirely, possibly never to return, which is the only case where the current shutdown handler does anything.

This is maybe not hugely important--if you're keeping Ingresses around and care about their status it stands to reason that you're probably going to start a controller again in the future--but we can leave inaccurate information around without it (status info will remain indefinitely otherwise). This could potentially impact some setup where another application watches Ingress status to update DNS and either populates a fallback address if there is none or there are downstream applications that behave differently for an empty NOERROR response versus an HTTP timeout or refusal. Probably not that common, but not unreasonable.

Research doesn't turn up much of interest to say whether or not we definitely need this, unfortunately. It's another carryover from the NGINX controller, and the only info I can find there indicates that shutdown updates have been around since time immemorial. The flag was added to instead disable the behavior, since there are some cases where you don't want it--prior to that the controller always did shutdown updates.

This patch re-implements the --status-update flag for KIC 2.0
but also deprecates the --update-status-on-shutdown flag as the
behavior of tearing down on cleanup is somewhat in conflict with
the idempontent and eventually consistent design of KIC 2.0.
@shaneutt shaneutt temporarily deployed to Configure ci July 8, 2021 19:27 Inactive
@shaneutt shaneutt temporarily deployed to Configure ci July 8, 2021 19:27 Inactive
@shaneutt
Copy link
Contributor Author

shaneutt commented Jul 8, 2021

Research doesn't turn up much of interest to say whether or not we definitely need this, unfortunately. It's another carryover from the NGINX controller, and the only info I can find there indicates that shutdown updates have been around since time immemorial. The flag was added to instead disable the behavior, since there are some cases where you don't want it--prior to that the controller always did shutdown updates.

Yes, I've seen kubernetes/ingress-nginx#881 and while this does provide some justification I still feel like this functionality is dubious: at best it's "best effort" functionality, as the pod may be killed without any grace period and the status cleanup can't run. And then amidst what is effectively "best effort" the following things need to be true for there to be value:

  • the status updates need to be able to complete before the configured container shutdown grace period
  • the statuses of the ingress records is being used by an external source that specifically needs to know when the IP/Host becomes unavailable

However for that second point isn't that potentially a bug as well? If your controller gets stopped and then it removes the IP/Host status from the resource but then the controller comes back online and puts it back doesn't that have the potential to break such integrations, especially in the case that the proxy is still up during this time and the IP is actually valid? Why would we ever make the control plane update the status of an object which we can't validate is actually a valid representation of the data plane? Wouldn't we only want to remove the status if we were certain that the backend/dataplane was actually not serving it?

We have an opportunity here to cut ties with this bit of functionality which is arcane, if not just questionable because we're about to release a new major version and we can let the users know the change occurred, I think we should take that to reduce maintenance as I'm not seeing strong justification otherwise.

Let me know what you think. Ultimately if we really want to make this work I believe we only need to add a purge mechanism to the Proxy cache implementation.

@shaneutt shaneutt requested a review from rainest July 8, 2021 19:35
@shaneutt shaneutt temporarily deployed to Configure ci July 8, 2021 20:27 Inactive
@shaneutt shaneutt temporarily deployed to Configure ci July 8, 2021 20:27 Inactive
Copy link
Contributor

@rainest rainest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absent a clear genesis for the feature, I can only think of hypothetical cases where you'd need it.

Between the narrow uses I can think of and your valid points about the grace period possibly breaking it even when implemented, I think we can reasonably not implement it, see if complaints come in, and add it back if needed.

--update-status implementation looks fine.

@rainest rainest temporarily deployed to Configure ci July 8, 2021 21:34 Inactive
@rainest rainest temporarily deployed to Configure ci July 8, 2021 21:34 Inactive
@rainest rainest merged commit 4125d78 into next Jul 8, 2021
@rainest rainest deleted the issue-1304 branch July 8, 2021 21:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/debt area/maintenance Cleanup, refactoring, and other maintenance improvements that don't change functionality. ci/license/unchanged priority/low
Projects
None yet
Development

Successfully merging this pull request may close these issues.

KIC 2.0: handle update-status and update-status-on-shutdown flags
2 participants