Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Uptime] Use scripted metric for snapshot calculation (#58247) #58389

Merged
merged 1 commit into from
Feb 24, 2020

Conversation

andrewvc
Copy link
Contributor

@andrewvc andrewvc commented Feb 24, 2020

Fixes #58079

Forward port of #58247 to master

This is an improved version of #58078

Note, this is a bugfix targeting 7.6.1 . I've decided to open this PR directly against 7.6 in the interest of time. We can forward-port this to 7.x / master later.

This patch improves the handling of timespans with snapshot counts. This feature originally worked, but suffered a regression when we increased the default timespan in the query context to 5m. This means that without this patch the counts you get are the maximum total number of monitors that were down over the past 5m, which is not really that useful.

We now use a scripted metric to always count precisely the number of up/down monitors. On my box this could process 400k summary docs in ~600ms. This should scale as shards are added.

I attempted to keep memory usage relatively slow by using simple maps of strings.

Summary

Summarize your PR. If it involves visual changes include a screenshot or gif.

Checklist

Delete any items that are not applicable to this PR.

For maintainers

Fixes elastic#58079

This is an improved version of elastic#58078

Note, this is a bugfix targeting 7.6.1 . I've decided to open this PR directly against 7.6 in the interest of time. We can forward-port this to 7.x / master later.

This patch improves the handling of timespans with snapshot counts. This feature originally worked, but suffered a regression when we increased the default timespan in the query context to 5m. This means that without this patch the counts you get are the maximum total number of monitors that were down over the past 5m, which is not really that useful.

We now use a scripted metric to always count precisely the number of up/down monitors. On my box this could process 400k summary docs in ~600ms. This should scale as shards are added.

I attempted to keep memory usage relatively slow by using simple maps of strings.
@andrewvc andrewvc added bug Fixes for quality problems that affect the customer experience backport Team:Uptime - DEPRECATED Synthetics & RUM sub-team of Application Observability release_note:skip Skip the PR/issue when compiling release notes labels Feb 24, 2020
@andrewvc andrewvc self-assigned this Feb 24, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/uptime (Team:uptime)

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Copy link
Contributor

@justinkambic justinkambic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ran this and it seems ok, code differences look alright too.

LGTM

@andrewvc andrewvc merged commit 5eefdbb into elastic:master Feb 24, 2020
@andrewvc andrewvc deleted the master-scripted-metric-count branch February 24, 2020 22:47
andrewvc added a commit to andrewvc/kibana that referenced this pull request Feb 24, 2020
…elastic#58389)

Fixes elastic#58079

This is an improved version of elastic#58078

Note, this is a bugfix targeting 7.6.1 . I've decided to open this PR directly against 7.6 in the interest of time. We can forward-port this to 7.x / master later.

This patch improves the handling of timespans with snapshot counts. This feature originally worked, but suffered a regression when we increased the default timespan in the query context to 5m. This means that without this patch the counts you get are the maximum total number of monitors that were down over the past 5m, which is not really that useful.

We now use a scripted metric to always count precisely the number of up/down monitors. On my box this could process 400k summary docs in ~600ms. This should scale as shards are added.

I attempted to keep memory usage relatively slow by using simple maps of strings.
jloleysens added a commit to jloleysens/kibana that referenced this pull request Feb 25, 2020
…re/files-and-filetree

* 'master' of github.com:elastic/kibana: (174 commits)
  [SIEM] Fix unnecessary re-renders on the Overview page (elastic#56587)
  Don't mutate error message (elastic#58452)
  Fix service map popover transaction duration (elastic#58422)
  [ML] Adding filebeat config to file dataviz (elastic#58152)
  [Uptime] Improve refresh handling when generating test data (elastic#58285)
  [Logs / Metrics UI] Remove path prefix from ViewSourceConfigur… (elastic#58238)
  [ML] Functional tests - adjust classification model memory (elastic#58445)
  [ML] Use event.timezone instead of beat.timezone in file upload (elastic#58447)
  [Logs UI] Unskip and stabilitize log column configuration tests (elastic#58392)
  [Telemetry] Separate the license retrieval from the stats in the usage collectors (elastic#57332)
  hide welcome screen for cloud (elastic#58371)
  Move src/legacy/ui/public/notify/app_redirect to kibana_legacy (elastic#58127)
  [ML] Functional tests - stabilize typing during df analytics creation (elastic#58227)
  fix short url in spaces (elastic#58313)
  [SIEM] Upgrades cypress to version 4.0.2 (elastic#58400)
  [Index management] Move to new platform "plugins" folder (elastic#58109)
  [kbn/optimizer] disable parallelization in terser plugin (elastic#58396)
  [Uptime] Delete useless try...catch blocks (elastic#58263)
  [Uptime] Use scripted metric for snapshot calculation (elastic#58247) (elastic#58389)
  [APM] Stabilize agent configuration API (elastic#57767)
  ...

# Conflicts:
#	src/plugins/console/public/application/containers/editor/legacy/console_editor/editor.tsx
elasticmachine added a commit to dhurley14/kibana that referenced this pull request Feb 25, 2020
…elastic#58389) (elastic#58415)

Fixes elastic#58079

This is an improved version of elastic#58078

Note, this is a bugfix targeting 7.6.1 . I've decided to open this PR directly against 7.6 in the interest of time. We can forward-port this to 7.x / master later.

This patch improves the handling of timespans with snapshot counts. This feature originally worked, but suffered a regression when we increased the default timespan in the query context to 5m. This means that without this patch the counts you get are the maximum total number of monitors that were down over the past 5m, which is not really that useful.

We now use a scripted metric to always count precisely the number of up/down monitors. On my box this could process 400k summary docs in ~600ms. This should scale as shards are added.

I attempted to keep memory usage relatively slow by using simple maps of strings.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport bug Fixes for quality problems that affect the customer experience release_note:skip Skip the PR/issue when compiling release notes Team:Uptime - DEPRECATED Synthetics & RUM sub-team of Application Observability
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Uptime] Snapshot count lags behind actual monitor states
4 participants