Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add indexer task success and failure metrics #16829

Merged

Conversation

rbankar7
Copy link
Contributor

@rbankar7 rbankar7 commented Aug 1, 2024

Adds indexer level task success and failure metrics

Description

This PR adds indexer-level task metrics-

"indexer/task/failed/count"
"indexer/task/success/count"

the current "worker/task/completed/count" metric shows all the tasks completed irrespective of success or failure status so these metrics would help us get more visibility into the status of the completed tasks

Release note

Adds indexer/task/failed/count and indexer/task/failed/count metrics


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might have already been discussed, but if the purpose is only to track failed and succeeded tasks, we already have task/running/count, task/success/count, task/failed/count metrics emitted by the Overlord. Are those not sufficient?

https://druid.apache.org/docs/latest/operations/metrics#indexing-service

@rbankar7
Copy link
Contributor Author

rbankar7 commented Aug 3, 2024

This might have already been discussed, but if the purpose is only to track failed and succeeded tasks, we already have task/running/count, task/success/count, task/failed/count metrics emitted by the Overlord. Are those not sufficient?

https://druid.apache.org/docs/latest/operations/metrics#indexing-service

We wanted to have this metric for each indexer node. The current metric doesn't provide per-host information
Also we are already emitting completed task metric from each indexer so this is just adding more refinement to that information

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments, otherwise looks good to me.

docs/operations/metrics.md Outdated Show resolved Hide resolved
docs/operations/metrics.md Outdated Show resolved Hide resolved
docs/operations/metrics.md Outdated Show resolved Hide resolved
@rbankar7 rbankar7 requested a review from kfaraz August 5, 2024 06:40
Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

@rbankar7
Copy link
Contributor Author

rbankar7 commented Aug 5, 2024

@kfaraz looks like the checks passed
could you please help me merge the PR?
Thanks for the review!

@kfaraz
Copy link
Contributor

kfaraz commented Aug 5, 2024

Sure, @rbankar7 !
@abhishekagarwal87 , do you have any further feedback on this PR?

@abhishekagarwal87 abhishekagarwal87 merged commit c8323d1 into apache:master Aug 5, 2024
88 checks passed
@abhishekagarwal87
Copy link
Contributor

Just merged. Thank you for the contribution @rbankar7

@kfaraz kfaraz added this to the 31.0.0 milestone Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants