[Uptime] Use scripted metric for snapshot calculation (#58247) #58389

andrewvc · 2020-02-24T18:37:32Z

Forward port of #58247 to master

This is an improved version of #58078

Note: this is a bugfix targeting 7.6.1. I've decided to open this PR directly against 7.6 in the interest of time. We can forward-port this to 7.x / master later.

This patch improves the handling of timespans in snapshot counts. This feature originally worked, but regressed when we increased the default timespan in the query context to 5m. Without this patch, the counts you get are the maximum total number of monitors that were down over the past 5m, which is not very useful.
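To make the regression concrete, here is a small hypothetical illustration (not the Uptime query itself) of how counting over a 5m window differs from counting each monitor's latest status. The `SummaryDoc` shape and its field names are assumptions for illustration only, not the real Heartbeat mapping.

```typescript
// Hypothetical summary document; field names are illustrative only.
interface SummaryDoc {
  monitorId: string;
  timestamp: number; // epoch millis
  down: number; // number of down checks in this summary
}

// Window-style count: any monitor with a down summary anywhere in the window.
function countDownAnyTime(docs: SummaryDoc[]): number {
  const down = new Set<string>();
  docs.forEach((d) => {
    if (d.down > 0) down.add(d.monitorId);
  });
  return down.size;
}

// Snapshot-style count: only the most recent summary per monitor decides its status.
function countDownLatest(docs: SummaryDoc[]): number {
  const latest = new Map<string, SummaryDoc>();
  docs.forEach((d) => {
    const prev = latest.get(d.monitorId);
    if (prev === undefined || d.timestamp > prev.timestamp) latest.set(d.monitorId, d);
  });
  let count = 0;
  latest.forEach((d) => {
    if (d.down > 0) count++;
  });
  return count;
}

// Monitor "a" was down a few minutes ago but has since recovered.
const window5m: SummaryDoc[] = [
  { monitorId: "a", timestamp: 60000, down: 1 },
  { monitorId: "a", timestamp: 240000, down: 0 },
  { monitorId: "b", timestamp: 200000, down: 1 },
];

console.log(countDownAnyTime(window5m)); // 2
console.log(countDownLatest(window5m)); // 1
```

The wider the window, the more recovered monitors the window-style count picks up, which is why widening the default timespan to 5m made the numbers misleading.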

We now use a scripted metric to always count precisely the number of up/down monitors. On my machine this processed 400k summary docs in ~600ms, and it should scale as shards are added.

I attempted to keep memory usage relatively low by using simple maps of strings.
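An Elasticsearch `scripted_metric` aggregation runs four scripts: `init_script` and `map_script` on each shard, `combine_script` once per shard, and `reduce_script` on the coordinating node. The sketch below simulates those phases in TypeScript to show how a per-shard map keyed by monitor id can yield exact up/down counts. It is a hypothetical model under assumed field names (`monitorId`, `timestamp`, `status`), not the Painless scripts from this PR.

```typescript
// Hypothetical per-document fields; the real aggregation reads Heartbeat fields in Painless.
interface CheckSummary {
  monitorId: string;
  timestamp: number;
  status: "up" | "down";
}

// Per-shard state: monitor id -> newest timestamp and status seen on that shard.
type ShardState = Map<string, { ts: number; status: "up" | "down" }>;

// init_script analogue: each shard starts with an empty map.
function initState(): ShardState {
  return new Map();
}

// map_script analogue: runs once per matching doc; keeps only the newest summary per monitor.
function mapDoc(state: ShardState, doc: CheckSummary): void {
  const prev = state.get(doc.monitorId);
  if (prev === undefined || doc.timestamp > prev.ts) {
    state.set(doc.monitorId, { ts: doc.timestamp, status: doc.status });
  }
}

// combine_script analogue: in this sketch the shard returns its map unchanged.
function combine(state: ShardState): ShardState {
  return state;
}

// reduce_script analogue: merge shard maps (newest timestamp wins), then tally statuses.
function reduce(states: ShardState[]): { up: number; down: number } {
  const merged: ShardState = new Map();
  states.forEach((s) =>
    s.forEach((v, id) => {
      const prev = merged.get(id);
      if (prev === undefined || v.ts > prev.ts) merged.set(id, v);
    })
  );
  const totals = { up: 0, down: 0 };
  merged.forEach((v) => {
    totals[v.status]++;
  });
  return totals;
}

// Two simulated shards; monitor "a" appears on both, and its newest summary is "up".
const shards: CheckSummary[][] = [
  [
    { monitorId: "a", timestamp: 1, status: "down" },
    { monitorId: "b", timestamp: 2, status: "down" },
  ],
  [
    { monitorId: "a", timestamp: 3, status: "up" },
    { monitorId: "c", timestamp: 2, status: "up" },
  ],
];
const shardResults = shards.map((docs) => {
  const state = initState();
  docs.forEach((d) => mapDoc(state, d));
  return combine(state);
});

console.log(reduce(shardResults)); // { up: 2, down: 1 }
```

The reduce step has to compare timestamps again because the same monitor can have summary docs on more than one shard. Keeping only a small string-keyed entry per monitor bounds memory by the number of monitors rather than the number of documents.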


Checklist


Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
Documentation was added for features that require explanation or tutorials
Unit or functional tests were updated or added to match the most common scenarios
This was checked for keyboard-only and screenreader accessibility
This renders correctly on smaller devices using a responsive layout. (You can test this in your browser.)
This was checked for cross-browser compatibility, including a check against IE11

For maintainers

This was checked for breaking API changes and was labeled appropriately

Fixes elastic#58079

elasticmachine · 2020-02-24T18:37:35Z

Pinging @elastic/uptime (Team:uptime)

kibanamachine · 2020-02-24T20:29:36Z

💚 Build Succeeded

continuous-integration/kibana-ci/pull-request
Commit: 420f3e5


justinkambic

Ran this and it seems ok, code differences look alright too.

LGTM




andrewvc added the bug, backport, Team:Uptime, and release_note:skip labels Feb 24, 2020

andrewvc requested a review from justinkambic February 24, 2020 18:37

andrewvc self-assigned this Feb 24, 2020

justinkambic approved these changes Feb 24, 2020


andrewvc merged commit 5eefdbb into elastic:master Feb 24, 2020

andrewvc deleted the master-scripted-metric-count branch February 24, 2020 22:47

andrewvc mentioned this pull request Feb 24, 2020

[7.x] [Uptime] Use scripted metric for snapshot calculation (#58247) (#58389) #58415

Merged
