feat(slo): health status #181351

Merged
merged 24 commits into from
Apr 30, 2024


kdelemme
Contributor

@kdelemme kdelemme commented Apr 22, 2024

Resolves #176088

🍒 Summary

This PR implements a new internal route for fetching the health and state of a list of SLO IDs. The state can be one of the following: no_data, indexing, running or stale.
The health is derived directly from the health of the related transforms.

The state decision tree is as follows:

  • When summaryUpdatedAt is more than 48 hours old: state = "stale"
  • When summaryUpdatedAt - latestSliTimestamp >= 10 minutes: state = "indexing"
  • When summaryUpdatedAt - latestSliTimestamp < 10 minutes: state = "running"
  • When the summary document is a temporary one: state = "no_data"
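
The decision tree above could be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the `SummaryDoc` shape, the `isTempDoc` field, the `computeState` name, the threshold constants, and the evaluation order (temporary document first, then staleness measured against the current time) are all assumptions made for the example.

```typescript
type SloState = 'no_data' | 'indexing' | 'running' | 'stale';

// Hypothetical shape of the summary document fields used by the decision tree.
interface SummaryDoc {
  isTempDoc: boolean; // assumed flag marking a temporary summary document
  summaryUpdatedAt: number | null; // epoch milliseconds
  latestSliTimestamp: number | null; // epoch milliseconds
}

const STALE_THRESHOLD_MS = 48 * 60 * 60 * 1000; // 48 hours
const LAG_THRESHOLD_MS = 10 * 60 * 1000; // 10 minutes

function computeState(doc: SummaryDoc, now: number = Date.now()): SloState {
  // Assumption: a temporary document (or one without timestamps) has no data yet.
  if (doc.isTempDoc || doc.summaryUpdatedAt === null || doc.latestSliTimestamp === null) {
    return 'no_data';
  }
  // Assumption: "stale" means the summary was last updated more than 48 hours before `now`.
  if (now - doc.summaryUpdatedAt > STALE_THRESHOLD_MS) {
    return 'stale';
  }
  // Otherwise compare the summary update time with the latest SLI timestamp.
  const lag = doc.summaryUpdatedAt - doc.latestSliTimestamp;
  return lag >= LAG_THRESHOLD_MS ? 'indexing' : 'running';
}
```

Passing `now` explicitly keeps the function deterministic, which makes the threshold boundaries easy to exercise in a unit test.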

We display a warning on the SLO details page when one of the transforms is unhealthy, asking the user to investigate:

Page Screenshot
SLO Details image
SLO List Group View image
SLO List Ungroup View image

@apmmachine
Contributor

🤖 GitHub comments

Expand to view the GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • /oblt-deploy-serverless : Deploy a serverless Kibana instance using the Observability test environments.
  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@kdelemme kdelemme added release_note:skip Skip the PR/issue when compiling release notes Team:obs-ux-management Observability Management User Experience Team v8.15.0 labels Apr 22, 2024
@kdelemme kdelemme marked this pull request as ready for review April 22, 2024 21:45
@kdelemme kdelemme requested a review from a team as a code owner April 22, 2024 21:45
@elasticmachine
Contributor

Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

@kdelemme kdelemme marked this pull request as draft April 23, 2024 18:10
@kdelemme
Contributor Author

/ci

@kdelemme kdelemme marked this pull request as ready for review April 23, 2024 20:02
@kdelemme kdelemme marked this pull request as draft April 24, 2024 14:45
data-test-subj="sloHealthCalloutInspectTransformButton"
color="warning"
fill
href={http?.basePath.prepend('/app/management/data/transform')}
Contributor

Would it be possible to pre-filter the transform list? Possibly not, though; I guess we can contribute to the transform list page so that it filters using query params.

Contributor Author

it's not supported at the moment :(

@kdelemme
Contributor Author

/ci

@kdelemme kdelemme marked this pull request as ready for review April 24, 2024 16:13
kdelemme and others added 2 commits April 24, 2024 12:28
@kdelemme kdelemme requested a review from shahzad31 April 24, 2024 18:19
queryFn: async ({ signal }) => {
try {
const response = await http.post<FetchSLOHealthResponse>(
'/internal/observability/slos/_health',
Contributor

@mgiota mgiota Apr 26, 2024

@kdelemme We were discussing with @lucabelluccini about enhancing the Kibana diagnostic tool to pull from this API. One question that came up was the fact that it is a POST request. We can inject the appropriate 'kbn-xsrf' and 'elastic-api-version: 1' headers for this in the diagnostic tool.

What I am wondering about now is the list payload that is required for this endpoint. Looking at this file, how would we pass the list of SLOs? It looks like all requests in the kibana yml file are GET requests, no? Could we make the list prop optional so that, if it is not passed, the endpoint accepts all SLOs by default? What are your thoughts on this?
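
For reference, the request discussed in this comment could be assembled along the lines of the sketch below. Only the route, the POST method, the two headers, and the existence of a `list` property come from this thread; the `SloHealthItem` fields, the `HealthRequest` type, the `buildHealthRequest` name, and the `'true'` header value are assumptions made for illustration.

```typescript
// Hypothetical item shape: the thread only confirms that the payload has a `list` property.
interface SloHealthItem {
  sloId: string;
  sloInstanceId: string;
}

// Hypothetical container for the pieces of the HTTP request.
interface HealthRequest {
  method: 'POST';
  path: string;
  headers: Record<string, string>;
  body: string;
}

function buildHealthRequest(list: SloHealthItem[]): HealthRequest {
  return {
    method: 'POST',
    path: '/internal/observability/slos/_health',
    headers: {
      'Content-Type': 'application/json',
      'kbn-xsrf': 'true', // assumed value; the thread only names the header
      'elastic-api-version': '1',
    },
    body: JSON.stringify({ list }),
  };
}
```

A caller would prepend its Kibana base URL to `path` and hand the remaining fields to whatever HTTP client it uses.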

Contributor Author

We have to limit this to the provided list because fetching all SLOs at once would be too much in some cases, in the same way that we don't return all SLOs in the find API.

What we can do instead is call the transform stats endpoint directly from the diagnostic tool with the slo-* id pattern; this returns all the SLO transform stats in one request. (I think there is still a limit on this API, like 1000.)
The diagnostic tool could then filter and transform the result to keep only the health.status part of it, or return the payload as is.

This API is already available and is a GET.
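
The filtering step suggested here could look roughly like the sketch below. It assumes the transform stats response has a `transforms` array whose entries carry an `id` and an optional `health.status`; the interface and function names and the sample transform ids are hypothetical, and the HTTP call itself is left out.

```typescript
// Assumed subset of a transform stats entry; real responses carry more fields.
interface TransformStatsEntry {
  id: string;
  health?: { status: string };
}

// Assumed subset of the transform stats response.
interface TransformStatsResponse {
  count: number;
  transforms: TransformStatsEntry[];
}

// Keep only the health status per transform id, falling back to 'unknown'
// when an entry has no health information.
function extractTransformHealth(response: TransformStatsResponse): Record<string, string> {
  const result: Record<string, string> = {};
  for (const transform of response.transforms) {
    result[transform.id] = transform.health?.status ?? 'unknown';
  }
  return result;
}

// Usage with a hand-written sample payload (ids are made up for the example).
const sample: TransformStatsResponse = {
  count: 2,
  transforms: [
    { id: 'slo-example-1', health: { status: 'green' } },
    { id: 'slo-summary-example-1' },
  ],
};
const healthById = extractTransformHealth(sample);
```

Returning a plain id-to-status map keeps the output small; a tool that prefers the full payload can skip this step and return the response as is.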

@botelastic botelastic bot added the ci:project-deploy-observability Create an Observability project label Apr 30, 2024
Contributor

@shahzad31 shahzad31 left a comment

LGTM !!

@kibana-ci
Collaborator

kibana-ci commented Apr 30, 2024

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #63 / console app "after all" hook: afterTestSuite.trigger in "console app"
  • [job] [logs] FTR Configs #63 / console app Console variables "after all" hook: afterTestSuite.trigger for "should allow removing a variable"
  • [job] [logs] FTR Configs #63 / console app Console variables "before all" hook for "should allow creating a new variable"
  • [job] [logs] FTR Configs #62 / Saved Objects Management saved objects management with hidden types API calls should flag the object as hidden in its meta

Metrics [docs]

Module Count

Fewer modules lead to a faster build time

id before after diff
observability 510 512 +2
slo 740 745 +5
total +7

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
@kbn/slo-schema 173 179 +6

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
slo 720.3KB 726.3KB +6.1KB

Canvas Sharable Runtime

The Canvas "shareable runtime" is a bundle produced to enable running Canvas workpads outside of Kibana. This bundle is included in third-party webpages that embed Canvas and therefore should be as slim as possible.

id before after diff
module count - 5875 +5875
total size - 6.7MB +6.7MB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
observability 150.8KB 151.1KB +370.0B
slo 22.3KB 22.4KB +126.0B
total +496.0B
Unknown metric groups

API count

id before after diff
@kbn/slo-schema 173 179 +6

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@kdelemme kdelemme enabled auto-merge (squash) April 30, 2024 14:24
@kdelemme kdelemme merged commit 06d32af into elastic:main Apr 30, 2024
20 checks passed
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Apr 30, 2024
@kdelemme kdelemme deleted the slo/health-status branch April 30, 2024 14:37
yuliacech pushed a commit to yuliacech/kibana that referenced this pull request May 3, 2024
Labels
backport:skip This commit does not require backporting ci:project-deploy-observability Create an Observability project release_note:skip Skip the PR/issue when compiling release notes Team:obs-ux-management Observability Management User Experience Team v8.15.0
Development

Successfully merging this pull request may close these issues.

[SLO] Add health status in SLO list and details page
7 participants