fix: re-implement status update flags #1514

shaneutt · 2021-07-08T14:55:22Z

What this PR does / why we need it:

This patch re-implements the --update-status flag for KIC 2.0
but also deprecates the --update-status-on-shutdown flag, as the
behavior of tearing down on cleanup is somewhat in conflict with
the re-entrant and eventually consistent design of KIC 2.0.
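
As a rough illustration of the flag handling described above, here is a minimal sketch using only Go's standard `flag` package. It is not the code from this patch: the `Config` type, the `parseFlags` helper, the default value of `true`, and the warning text are all assumptions made for the example. The deprecated flag is still accepted so existing deployments keep starting, but it has no effect beyond producing a warning.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// Config is a hypothetical subset of the controller options discussed here.
type Config struct {
	UpdateStatus bool
}

// parseFlags parses args and returns the config plus any deprecation warnings.
func parseFlags(args []string) (Config, []string, error) {
	var cfg Config
	var onShutdown bool // parsed for compatibility, otherwise ignored

	fs := flag.NewFlagSet("controller", flag.ContinueOnError)
	fs.SetOutput(io.Discard)
	// The default of true is an assumption for this sketch.
	fs.BoolVar(&cfg.UpdateStatus, "update-status", true, "update the status of managed resources")
	fs.BoolVar(&onShutdown, "update-status-on-shutdown", true, "DEPRECATED: no longer has any effect")

	if err := fs.Parse(args); err != nil {
		return Config{}, nil, err
	}

	// Visit only walks flags that were explicitly set on the command line.
	var warnings []string
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "update-status-on-shutdown" {
			warnings = append(warnings, "--update-status-on-shutdown is deprecated and ignored")
		}
	})
	return cfg, warnings, nil
}

func main() {
	cfg, warnings, err := parseFlags([]string{"--update-status=false", "--update-status-on-shutdown=true"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("update status:", cfg.UpdateStatus)
	for _, w := range warnings {
		fmt.Println("warning:", w)
	}
}
```

Warning only when the deprecated flag is explicitly set keeps default invocations quiet while still alerting users who depend on the old behavior.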

Which issue this PR fixes

fixes #1304

PR Readiness Checklist:

the CHANGELOG.md release notes have been updated to reflect any significant (and particularly user-facing) changes introduced by this PR

codecov · 2021-07-08T14:57:51Z

Codecov Report

Merging #1514 (efb2d08) into next (cb12ad9) will increase coverage by 0.02%.
The diff coverage is 75.00%.

@@            Coverage Diff             @@
##             next    #1514      +/-   ##
==========================================
+ Coverage   51.46%   51.48%   +0.02%     
==========================================
  Files          91       91              
  Lines        6309     6316       +7     
==========================================
+ Hits         3247     3252       +5     
- Misses       2770     2771       +1     
- Partials      292      293       +1

Flag	Coverage Δ
integration-test	`48.37% <75.00%> (-0.07%)`	⬇️
unit-test	`38.93% <37.50%> (+0.05%)`	⬆️


Impacted Files	Coverage Δ
railgun/manager/run.go	`69.92% <60.00%> (-0.30%)`	⬇️
railgun/pkg/config/config.go	`93.33% <100.00%> (+0.22%)`	⬆️
railgun/internal/ctrlutils/ingress-status.go	`61.63% <0.00%> (-1.30%)`	⬇️
...trollers/configuration/zz_generated_controllers.go	`48.41% <0.00%> (ø)`
pkg/parser/parser.go	`84.51% <0.00%> (+1.25%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cb12ad9...efb2d08.

rainest · 2021-07-08T15:29:08Z

What are the other methods mentioned in the issue comment? I'd naively expect that we can't be eventually consistent when shutting down, because there's no "eventually" after shutdown: the controller is gone and will not be able to issue further status updates.

shaneutt · 2021-07-08T16:06:54Z

> What are the other methods mentioned in the issue comment? I'd naively expect that we can't be eventually consistent when shutting down, because there's no "eventually" after shutdown: the controller is gone and will not be able to issue further status updates.

The "other methods" refers to the assumption that the controller will be restarted, and that the re-entrant nature of our logic leads to eventual consistency. Supporting teardown mechanisms in a Kubernetes controller seems like a contradiction, and without a strong justification I don't see why we would start adding this kind of functionality right now. Let me know your thoughts.
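
To make the re-entrancy argument concrete, here is a minimal sketch of an idempotent status reconcile. It is illustrative only: the `Ingress` struct is a stripped-down stand-in for the Kubernetes type, and `reconcileStatus` and `sameAddresses` are hypothetical helpers, not functions from KIC. The point is that a restarted controller can run the same step again, and stale status converges to the desired state without any teardown logic.

```go
package main

import "fmt"

// Ingress is a hypothetical, stripped-down stand-in for a Kubernetes Ingress.
type Ingress struct {
	Name            string
	StatusAddresses []string
}

// sameAddresses reports whether two address lists are identical in order and content.
func sameAddresses(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// reconcileStatus makes the recorded status match the desired addresses.
// It reports whether a write was needed; repeating the call is a no-op.
func reconcileStatus(ing *Ingress, desired []string) bool {
	if sameAddresses(ing.StatusAddresses, desired) {
		return false
	}
	ing.StatusAddresses = append([]string(nil), desired...)
	return true
}

func main() {
	// Status left behind by a previous controller instance.
	ing := &Ingress{Name: "demo", StatusAddresses: []string{"10.0.0.1"}}
	desired := []string{"10.0.0.2"}

	fmt.Println("first pass changed status:", reconcileStatus(ing, desired))
	fmt.Println("second pass changed status:", reconcileStatus(ing, desired))
	fmt.Println("final addresses:", ing.StatusAddresses)
}
```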

rainest · 2021-07-08T16:44:20Z

Ah. I was thinking of the case where the controller+proxy are deleted entirely, possibly never to return, which is the only case where the current shutdown handler does anything.

This is maybe not hugely important: if you're keeping Ingresses around and care about their status, it stands to reason that you're probably going to start a controller again in the future. Without the shutdown update, though, we can leave inaccurate information behind, since status info will otherwise remain indefinitely. This could affect a setup where another application watches Ingress status to update DNS and populates a fallback address when there is none, or where downstream applications behave differently for an empty NOERROR response than for an HTTP timeout or refusal. Probably not that common, but not unreasonable.

Research doesn't turn up much of interest to say whether or not we definitely need this, unfortunately. It's another carryover from the NGINX controller, and the only info I can find there indicates that shutdown updates have been around since time immemorial. The flag was added to instead disable the behavior, since there are some cases where you don't want it--prior to that the controller always did shutdown updates.

This patch re-implements the --update-status flag for KIC 2.0 but also deprecates the --update-status-on-shutdown flag, as the behavior of tearing down on cleanup is somewhat in conflict with the idempotent and eventually consistent design of KIC 2.0.

shaneutt · 2021-07-08T19:35:18Z

> Research doesn't turn up much of interest to say whether or not we definitely need this, unfortunately. It's another carryover from the NGINX controller, and the only info I can find there indicates that shutdown updates have been around since time immemorial. The flag was added to instead disable the behavior, since there are some cases where you don't want it--prior to that the controller always did shutdown updates.

Yes, I've seen kubernetes/ingress-nginx#881 and while this does provide some justification I still feel like this functionality is dubious: at best it's "best effort" functionality, as the pod may be killed without any grace period and the status cleanup can't run. And then amidst what is effectively "best effort" the following things need to be true for there to be value:

- the status updates need to be able to complete before the configured container shutdown grace period expires
- the statuses of the ingress records are being used by an external source that specifically needs to know when the IP/Host becomes unavailable

However, isn't that second point potentially a bug as well? If the controller gets stopped and removes the IP/Host status from the resource, but then comes back online and puts it back, doesn't that have the potential to break such integrations, especially if the proxy stays up during this time and the IP is actually valid? Why would we ever make the control plane update the status of an object when we can't validate that the status accurately represents the data plane? Wouldn't we only want to remove the status if we were certain that the backend/dataplane was actually not serving it?

We have an opportunity here to cut ties with this bit of functionality, which is arcane if not outright questionable: we're about to release a new major version, and we can let users know the change occurred. I think we should take that opportunity to reduce maintenance, as I'm not seeing strong justification otherwise.

Let me know what you think. Ultimately if we really want to make this work I believe we only need to add a purge mechanism to the Proxy cache implementation.

rainest

Absent a clear genesis for the feature, I can only think of hypothetical cases where you'd need it.

Between the narrow uses I can think of and your valid points about the grace period possibly breaking it even when implemented, I think we can reasonably not implement it, see if complaints come in, and add it back if needed.

--update-status implementation looks fine.

shaneutt added the priority/low, area/debt, and area/maintenance labels Jul 8, 2021

shaneutt added this to the Blockers for cutting KIC 2.0-beta.1 milestone Jul 8, 2021

shaneutt self-assigned this Jul 8, 2021

shaneutt requested a review from a team as a code owner July 8, 2021 14:55

shaneutt linked an issue Jul 8, 2021 that may be closed by this pull request

KIC 2.0: handle update-status and update-status-on-shutdown flags #1304

Closed

shaneutt temporarily deployed to Configure ci July 8, 2021 14:55 Inactive

github-actions bot added the ci/license/unchanged label Jul 8, 2021

shaneutt temporarily deployed to Configure ci July 8, 2021 16:08 Inactive

shaneutt force-pushed the issue-1304 branch from 8edffe0 to 5bd076c Compare July 8, 2021 19:27

shaneutt temporarily deployed to Configure ci July 8, 2021 19:27 Inactive

shaneutt requested a review from rainest July 8, 2021 19:35

Merge branch 'next' into issue-1304

89ad689

shaneutt temporarily deployed to Configure ci July 8, 2021 20:27 Inactive

rainest approved these changes Jul 8, 2021

Merge branch 'next' into issue-1304

efb2d08

rainest temporarily deployed to Configure ci July 8, 2021 21:34 Inactive

rainest merged commit 4125d78 into next Jul 8, 2021

rainest deleted the issue-1304 branch July 8, 2021 21:45
