[Lens] Formula: Define level of metric #94789

flash1293 · 2021-03-17T10:39:23Z

Right now all metrics specified in a formula are nested in all defined bucket dimensions of the current chart. In some cases, however, it's useful to work with a metric from a higher level of the aggregation tree. Overall metrics (#94597) can behave similarly in some cases, but there are a few differences.

Implementation

For each metric, there could be a parameter specifying which bucket dimensions to apply (defaults to all of them): median(bytes, overallFor=reference("Top values geo.src")). In a chart over time with a "break down by" dimension of top values of geo.src, this would give the median for each geo.src without applying the date histogram dimension.

On the implementation side, this would require us to do multiple esaggs calls (one for each combination of skipped/unskipped bucket aggs), then merge the resulting tables, joining in the higher-level metrics.
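A minimal sketch of the merge step described above, in plain Python rather than actual esaggs calls. The documents, the bucketed_median helper, and the column names are made up for illustration; the two "requests" are simulated by grouping the same rows with and without the date dimension.

```python
from collections import defaultdict
from statistics import median

# Hypothetical documents: (date bucket, geo.src, bytes).
docs = [
    ("2021-03-01", "US", 100),
    ("2021-03-01", "US", 300),
    ("2021-03-01", "DE", 50),
    ("2021-03-02", "US", 200),
    ("2021-03-02", "DE", 150),
    ("2021-03-02", "DE", 250),
]

def bucketed_median(rows, key):
    """Group rows by key(row) and return the median of bytes per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {k: median(v) for k, v in groups.items()}

# "Request" 1: all bucket dimensions applied (date histogram + top values).
full = bucketed_median(docs, key=lambda r: (r[0], r[1]))
# "Request" 2: date histogram skipped, only geo.src applied.
per_src = bucketed_median(docs, key=lambda r: r[1])

# Merge: join the higher-level metric into every row of the full table.
merged = [
    {"date": date, "geo.src": src, "median": value, "median_per_src": per_src.get(src)}
    for (date, src), value in sorted(full.items())
]
```

Every row of the merged table carries both the fully bucketed median and the median for its geo.src across all dates. The per_src.get(src) lookup returns None when the second request has no matching bucket.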

This API would be more flexible than what Elasticsearch offers right now (you can define higher level metrics, but only in the order of the tree structure - this means you can't skip the root bucket agg, but keep the nested one).

Use case

I'm not sure whether we should offer this, because it is easy to confuse with overall metrics. For example, avg(bytes, overallFor=reference("Top values geo.src")) can also be written as

overall_sum(sum(bytes), group_by=reference("Top values geo.src")) / 
overall_sum(count(filter=bytes: *), group_by=reference("Top values geo.src"))

as long as "Top values geo.src" actually fetches all data ("Other" bucket included or size parameter high enough). However, this rewrite won't work for all metrics (e.g. median).
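To illustrate why the rewrite holds for the average but not for the median, here is a small sketch with made-up values for a single geo.src spread over two date buckets. It only mimics the arithmetic, not the Lens formula engine.

```python
from statistics import median

# Hypothetical bytes values per date bucket for a single geo.src.
buckets = {
    "2021-03-01": [1, 2, 100],
    "2021-03-02": [3, 4],
}
all_values = [v for values in buckets.values() for v in values]

# Average: sums and counts are additive, so the overall_sum rewrite is exact.
avg_direct = sum(all_values) / len(all_values)
avg_rewritten = sum(sum(v) for v in buckets.values()) / sum(len(v) for v in buckets.values())

# Median: per-bucket medians cannot be combined into the overall median.
median_direct = median(all_values)
median_of_medians = median(median(v) for v in buckets.values())
```

With these values both averages come out to 22.0, while the median over all values is 3 and the median of the per-bucket medians is 2.75.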

Concerns

The main concern is stated above - would people be able to understand the difference between overall metrics calculated client side and metrics on different levels calculated on the Elasticsearch side?

Also, if the API is implemented as defined here, it would be possible to get different "terms" buckets (because the top 5 terms change if some buckets within the hierarchy are skipped). This would be a little confusing because for some buckets the overall metric would be missing (there's a similar issue with time offset).
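A rough sketch of how the terms buckets can diverge. It assumes the terms are nested under the date histogram and ranked by summed bytes, and it uses a top 2 instead of a top 5 to keep the data small; the documents and the top_terms helper are hypothetical.

```python
from collections import Counter

# Hypothetical documents: (date bucket, geo.src, bytes).
docs = [
    ("day1", "US", 500), ("day1", "DE", 400), ("day1", "FR", 10),
    ("day2", "FR", 900), ("day2", "US", 300), ("day2", "DE", 20),
]

def top_terms(rows, size):
    """Return the `size` geo.src values with the highest summed bytes."""
    totals = Counter()
    for _, src, value in rows:
        totals[src] += value
    return [src for src, _ in totals.most_common(size)]

# Terms nested under the date histogram: top 2 computed per date bucket.
nested = {day: top_terms([d for d in docs if d[0] == day], 2) for day in ("day1", "day2")}
# Date histogram skipped: top 2 computed over all documents.
overall = top_terms(docs, 2)

# Chart rows whose term has no counterpart in the higher-level request.
missing = [(day, src) for day, terms in nested.items() for src in terms if src not in overall]
```

Here DE is in the top 2 on day1 but not in the overall top 2, so the day1/DE row would have no higher-level metric to join against.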

elasticmachine · 2021-03-17T10:39:25Z

Pinging @elastic/kibana-app (Team:KibanaApp)

wylieconlon · 2021-03-22T22:56:32Z

I agree that this is a hard concept to explain to users, and I think you may have missed an important case: top level unbucketed metrics. This is similar to the dynamic thresholds discussion we've been having, where I could build a formula to hide any values that are below median. Something like:

if(last_value(bytes) > percentile(bytes, percentile=90, top_level=true), last_value(bytes), null): Only show bytes values above the 90th percentile
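As a sketch of the dynamic-threshold idea, the snippet below computes one unbucketed percentile over all documents and uses it to null out bucketed values that fall below it. The data, the nearest-rank percentile helper, and the choice of >= for the comparison are assumptions for illustration; Elasticsearch computes percentiles with an approximation algorithm instead.

```python
import math

# Hypothetical documents: (bucket, timestamp, bytes).
docs = [
    ("a", 1, 10), ("a", 2, 95),
    ("b", 1, 40), ("b", 2, 20),
    ("c", 1, 70), ("c", 2, 60),
    ("d", 1, 30), ("d", 2, 50),
    ("e", 1, 80), ("e", 2, 100),
]

def percentile(values, p):
    """Nearest-rank percentile over a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Top-level metric: computed once over all documents, ignoring buckets.
threshold = percentile([b for _, _, b in docs], 90)

# Bucketed metric: last value of bytes per bucket (highest timestamp wins).
last_value = {}
for bucket, ts, value in sorted(docs, key=lambda d: d[1]):
    last_value[bucket] = value

# Keep only buckets at or above the threshold; others become None (null).
visible = {k: (v if v >= threshold else None) for k, v in last_value.items()}
```

The threshold is the same for every bucket because it is evaluated outside the bucket hierarchy, which is what a top-level parameter would have to guarantee.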

Here are some rough ideas for how we can clarify this concept for users. Each idea is separate:

To change the ES metric level, you have to use a named parameter. To calculate a column-oriented "overall" metric, you have to use a separate function. This makes the API look & feel different.
Instead of allowing overallFor, we could only allow 2 levels. "Outer" and "inner" for example, as in unbucketed vs inner metric
We could restrict the level-changing logic to a new type of UI that we haven't built yet. For example, what if every dimension is shown in a single "Formula view" that shows the hierarchy?

flash1293 · 2021-03-23T08:50:22Z

Agreed, top level metrics are important as well.

> To change the ES metric level, you have to use a named parameter. To calculate a column-oriented "overall" metric, you have to use a separate function. This makes the API look & feel different.

That definitely makes sense and would be consistent with the rest of how formula works. I was just wondering whether it's enough. Maybe I'm overthinking it.

> Instead of allowing overallFor, we could only allow 2 levels. "Outer" and "inner" for example, as in unbucketed vs inner metric

It's reducing the complexity while also reducing the expressiveness of formula, but maybe that's OK as a first step. Once people start using it we can see in which direction to evolve.

> We could restrict the level-changing logic to a new type of UI that we haven't built yet. For example, what if every dimension is shown in a single "Formula view" that shows the hierarchy?

This seems like the most expensive option, but maybe it's the next logical step. I still hope we can avoid doing a mixed UI like this (and implement expression/SQL datasources to cover these use cases instead).

flash1293 · 2021-10-28T17:34:17Z

Another important use case for this is array values. In some cases, summing up the rows using overall_sum is not the same as executing the metric "outside" of the current aggregation tree, because a document with multiple values falls into several buckets and the buckets overlap: #115770
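A small sketch of the overlap problem, using a made-up "tags" array field and a document count as the metric. The bucketing loop only imitates how a terms aggregation assigns documents to buckets.

```python
# Hypothetical documents whose "tags" field holds an array of values.
docs = [
    {"id": 1, "tags": ["a", "b"]},
    {"id": 2, "tags": ["a"]},
    {"id": 3, "tags": ["b", "c"]},
]

# Terms-style bucketing: a document lands in one bucket per array value.
bucket_counts = {}
for doc in docs:
    for tag in set(doc["tags"]):
        bucket_counts[tag] = bucket_counts.get(tag, 0) + 1

# Summing the bucket rows (what overall_sum does) counts overlaps twice.
summed_rows = sum(bucket_counts.values())
# Running the metric outside the aggregation tree counts each document once.
top_level_count = len(docs)
```

Summing the rows gives 5 while there are only 3 documents, so the two approaches disagree as soon as any document carries more than one value.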

ghudgins · 2021-11-01T22:18:39Z

I think this applies - https://discuss.elastic.co/t/how-to-get-the-total-of-memory-and-cpu-usage-of-my-cluster/288170

Something like the following could solve what they're looking for (they want the overall sum but don't want to display the group-by field to get it):

overall_sum(
    average(kubernetes.node.memory.capacity.bytes, groupby='kubernetes.node.name')
)
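To make the intended arithmetic concrete, here is a sketch with hypothetical metric samples: average the capacity per node first, then sum the per-node averages. The node names and byte values are invented, and the groupby parameter in the formula above is itself only a proposal.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical metric samples: (kubernetes.node.name, memory capacity in bytes).
samples = [
    ("node-1", 8_000), ("node-1", 8_000),
    ("node-2", 16_000), ("node-2", 16_000), ("node-2", 16_000),
]

by_node = defaultdict(list)
for node, capacity in samples:
    by_node[node].append(capacity)

# Inner metric: average per node, evaluated with a grouping not shown in the chart.
avg_per_node = {node: mean(values) for node, values in by_node.items()}
# Outer function: sum of the per-node averages.
cluster_capacity = sum(avg_per_node.values())
# Summing raw samples would over-count nodes that report more often.
naive_sum = sum(capacity for _, capacity in samples)
```

The per-node grouping is needed only to get the right total (24,000 here instead of the naive 64,000); the chart itself would show a single number.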

flash1293 · 2021-11-02T16:04:25Z

It sounds more like #94619, which is closely related. The difference is that in "define level of metric" a single metric is not broken down while the breakdown is still shown in the chart, whereas in "Collapse bucket column" the breakdown is not shown in the chart at all and is applied only to the single metric, which is then summed up for display.

ghudgins · 2022-08-30T15:21:50Z

We plan on accounting for this requirement in our query system (and its ability to query and transform data) - #126095

Closing this issue as formula is not the intended solution.

flash1293 added the discuss, enhancement, Team:Visualizations, and Feature:Lens labels Mar 17, 2021

flash1293 mentioned this issue Mar 17, 2021

[Meta][Lens] Data Modelling #57708

Closed

flash1293 mentioned this issue Oct 28, 2021

[Lens] Formula: allow "count of hits.total.value", the count of all the documents in which a field value appears, including any (*) field values #115770

Closed

ghudgins closed this as completed Aug 30, 2022

ghudgins mentioned this issue Aug 30, 2022

Pareto charts #96488

Closed
