Add worker status and duration metrics in live and task reports #15180
Let's mark these fields final?
Can't, because of the default constructor.
Nit: are the () required?
Removed. Thanks!
We should also document these properties in docs/api-reference/sql-ingestion-api.md.
Added these properties.
I think it should be okay to remove the -1 duration check and always report taskTracker.status.getDuration().
We always rely on the Overlord to give us the task duration without changing anything.
WDYT?
The reason was that duration is always -1 for RUNNING workers (since it gets updated only upon completion), so publishing their duration wouldn't ever be useful.
MSQ tracks the start time as the time it requested to launch the job. I currently do not know whether the duration counter in the Overlord starts as soon as the Overlord gets the request or only when the task goes into the running state.
To test this, you could check whether the taskDuration in the report goes backward.
I checked this both in a run and in the code, and you are right: MSQ records the start time when the task is submitted, whereas the Overlord records it when the run starts.
Currently TaskStatus doesn't have any field to record a startTime, so for MSQ to get a worker's startTime from the Overlord, this field would need to be added and persisted in the database when the worker task starts.
There is no such issue with periodically reporting the query's duration in live reports, since that is a single hop (from Overlord to indexer) and the query's start time is maintained inside the controller, so the duration can be calculated on the fly.
Since we are more interested in the timings of finished workers, I think for now it is fine to just report the duration as -1 for running worker tasks instead of adding a new field in TaskStatus, which is used in multiple places. But do let me know if you think we should take the route of adding a new field.
Thanks for digging in. Reporting -1 for running tasks SGTM.
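For readers skimming the thread, the agreed behavior can be sketched roughly as below. This is only an illustration, not the PR's code: the class WorkerDurationSketch, the State enum, the Status holder, and both method names are hypothetical stand-ins, and the real TaskStatus and task tracker types in Druid look different. It assumes, as described above, that a worker's duration stays at -1 until the task completes, while the query duration is derived from the start time kept in the controller.

```java
class WorkerDurationSketch {
    // Hypothetical stand-in for the worker task states mentioned in the thread.
    enum State { RUNNING, SUCCESS, FAILED }

    // Sentinel reported for workers whose duration is not known yet.
    static final long UNKNOWN_DURATION_MS = -1L;

    // Hypothetical, simplified view of the status returned for a worker task.
    static final class Status {
        final State state;
        final long durationMs; // assumed to stay -1 until the task completes

        Status(State state, long durationMs) {
            this.state = state;
            this.durationMs = durationMs;
        }
    }

    // Running workers report -1; finished workers report the recorded duration.
    static long reportedWorkerDurationMs(Status status) {
        return status.state == State.RUNNING ? UNKNOWN_DURATION_MS : status.durationMs;
    }

    // The controller keeps the query start time, so the query duration
    // can be computed on the fly for live reports.
    static long queryDurationMs(long controllerStartMs, long nowMs) {
        return nowMs - controllerStartMs;
    }

    public static void main(String[] args) {
        Status running = new Status(State.RUNNING, -1L);
        Status finished = new Status(State.SUCCESS, 4200L);
        System.out.println("running worker duration: " + reportedWorkerDurationMs(running));
        System.out.println("finished worker duration: " + reportedWorkerDurationMs(finished));
        System.out.println("query duration: " + queryDurationMs(1_000L, 6_000L));
    }
}
```

Keeping the sentinel in one place avoids adding a startTime field to TaskStatus, at the cost of not showing elapsed time for workers that are still running.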