Fix exposing inconsistency in job status outside of persistence API #1152

Merged: 2 commits, Feb 9, 2022

@tgianos tgianos commented Feb 9, 2022

Within the job launch logic there was a helper method that queried the job status and, based on the returned value, either updated it or fell back to other logic. This works fine if all requests to the persistence service implementation go to a single consistent backend. If, however, read-only queries go to a read replica that may lag, or to some other implementation entirely, this breaks down without the service knowing why or how.

Moving this logic behind the persistence API, and letting the launch service act only on the job status returned from the source-of-truth API, should fix this problem.
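To illustrate the idea, here is a minimal sketch, not the actual code in this change. All names (`PersistenceService`, `updateStatus`, the reduced `JobStatus` enum, the in-memory map standing in for the primary datastore) are hypothetical. It assumes the persistence layer can perform the check and the write as one atomic step against the source of truth and hand back the resulting status, so the caller never issues a separate read that might be served by a lagging replica.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, reduced set of statuses for illustration only.
enum JobStatus { RESERVED, ACCEPTED, KILLED }

// Hypothetical persistence API: the status check and the write happen
// together on the primary store (here an in-memory map stands in for it).
class PersistenceService {
    private final Map<String, JobStatus> primary = new ConcurrentHashMap<>();

    void save(String jobId, JobStatus status) {
        primary.put(jobId, status);
    }

    // Moves the job to `desired` only if it is currently `expected`.
    // Returns the status held by the source of truth after the attempt,
    // or null if the job is unknown.
    JobStatus updateStatus(String jobId, JobStatus expected, JobStatus desired) {
        return primary.computeIfPresent(
            jobId, (id, current) -> current == expected ? desired : current);
    }
}

class LaunchSketch {
    public static void main(String[] args) {
        PersistenceService persistence = new PersistenceService();
        persistence.save("job-1", JobStatus.RESERVED);

        // The launch logic no longer reads first; it acts on the returned status.
        JobStatus actual =
            persistence.updateStatus("job-1", JobStatus.RESERVED, JobStatus.ACCEPTED);
        if (actual == JobStatus.ACCEPTED) {
            System.out.println("launching job-1");
        } else {
            System.out.println("not launching job-1, status is " + actual);
        }
    }
}
```

The design point is that the caller branches on the value returned by the write path rather than on a prior query, so replica lag or an alternative read implementation cannot influence the decision.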

@tgianos tgianos added the bug label Feb 9, 2022
@tgianos tgianos added this to the 4.0.7 milestone Feb 9, 2022
@tgianos tgianos requested a review from nvhoang February 9, 2022 00:05
@tgianos tgianos self-assigned this Feb 9, 2022
@tgianos tgianos modified the milestones: 4.0.7, 4.1.4 Feb 9, 2022
Was pulling in 5.0.0-rc.1, which required Java 11
@coveralls
Coverage Status

Coverage decreased (-0.01%) to 93.771% when pulling 3d16538 on tgianos:fixLaunchStatusRace into e558009 on Netflix:4.1.x.

@nvhoang nvhoang left a comment

LGTM. Thanks for the fix @tgianos !

@tgianos tgianos merged commit ea261c8 into Netflix:4.1.x Feb 9, 2022
@tgianos tgianos deleted the fixLaunchStatusRace branch February 9, 2022 02:05