[Lens] Formula: Define level of metric #94789

Closed
flash1293 opened this issue Mar 17, 2021 · 7 comments
Labels
discuss enhancement New value added to drive a business result Feature:Lens Team:Visualizations Visualization editors, elastic-charts and infrastructure

Comments

@flash1293
Contributor

Right now, all metrics specified in a formula are nested in all defined bucket dimensions of the current chart. In some cases, however, it's useful to work with a metric from a higher level of the aggregation tree. Overall metrics (#94597) can behave similarly in some cases, but there are a few differences.

Implementation

For each metric, there could be a parameter that specifies which bucket dimensions to apply (defaulting to all of them): median(bytes, overallFor=reference("Top values geo.src")). In a chart over time with a "break down by" dimension of top values of geo.src, this would give the median for each geo.src without applying the date histogram dimension.

On the implementation side, this would require multiple esaggs calls (one for each combination of skipped/unskipped bucket aggs), then merging the resulting tables and joining in the higher-level metrics.
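The merge step could look roughly like the sketch below. It is plain Python rather than Kibana code, and the row shapes, column names, sample values and the `join_higher_level` helper are all assumptions made up for illustration; they are not the actual esaggs output format.

```python
from typing import Any

Row = dict[str, Any]


def join_higher_level(
    full_rows: list[Row],
    higher_rows: list[Row],
    keys: list[str],
    metric: str,
) -> list[Row]:
    """Copy `metric` from the higher-level table onto every fully bucketed row.

    `keys` are the bucket columns the higher-level request kept.
    Rows without a matching higher-level row get None for the metric.
    """
    lookup = {tuple(r[k] for k in keys): r[metric] for r in higher_rows}
    return [
        {**row, metric: lookup.get(tuple(row[k] for k in keys))}
        for row in full_rows
    ]


# Hypothetical result of the request with all bucket dimensions applied
# (date histogram + top values of geo.src).
full = [
    {"date": "2021-03-16", "geo.src": "US", "count": 10},
    {"date": "2021-03-16", "geo.src": "CN", "count": 4},
    {"date": "2021-03-17", "geo.src": "US", "count": 7},
]

# Hypothetical result of a second request that skips the date histogram.
per_src = [
    {"geo.src": "US", "median_bytes": 5120},
    {"geo.src": "CN", "median_bytes": 2048},
]

merged = join_higher_level(full, per_src, keys=["geo.src"], metric="median_bytes")
```

Every row keeps its own date bucket, and the per-geo.src median is repeated across all dates of that geo.src.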

This API would be more flexible than what Elasticsearch offers right now: there, you can define higher-level metrics, but only in the order of the tree structure, which means you can't skip the root bucket agg while keeping the nested one.

Use case

I'm not sure whether we should offer this, because the differences to overall metrics are easy to confuse. For example, avg(bytes, overallFor=reference("Top values geo.src")) can also be written as

overall_sum(sum(bytes), group_by=reference("Top values geo.src")) / 
overall_sum(count(filter=bytes: *), group_by=reference("Top values geo.src"))

as long as "Top values geo.src" actually fetches all data ("Other" bucket included or size parameter high enough). However, this rewrite doesn't work for every metric (e.g. median).
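A small numeric sketch of why the rewrite holds for an average but not for a median. The values are invented, and the date buckets stand in for the rows of a single geo.src value in the chart:

```python
from statistics import median

# Hypothetical bytes values per date bucket for one geo.src value.
date_buckets = {
    "day1": [1, 2, 3],
    "day2": [4, 5, 6, 7, 8],
    "day3": [100],
}

all_values = [v for values in date_buckets.values() for v in values]

# Average: the sum of per-bucket sums divided by the sum of per-bucket counts
# equals the average over all documents.
avg_from_buckets = (
    sum(sum(v) for v in date_buckets.values())
    / sum(len(v) for v in date_buckets.values())
)
avg_overall = sum(all_values) / len(all_values)

# Median: per-bucket medians cannot be combined into the overall median.
median_of_medians = median(median(v) for v in date_buckets.values())
median_overall = median(all_values)

print(avg_from_buckets == avg_overall)    # True
print(median_of_medians, median_overall)  # 6 5
```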

Concerns

The main concern is stated above - would people be able to understand the difference between overall metrics calculated client side and metrics on different levels calculated on Elasticsearch side?

Also, if the API is implemented as defined here, the higher-level request could return different "terms" buckets, because the top 5 terms change if some buckets within the hierarchy are skipped. This would be a little confusing, because the overall metric would be missing for some buckets (there's a similar issue with time offset).
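The mismatch can be reproduced with a toy data set. The sketch below assumes the chart computes its top terms inside each date bucket; the documents, the `top_terms` helper and the size of 1 are invented for illustration.

```python
from collections import Counter

# Hypothetical documents as (date, geo.src) pairs.
docs = [
    ("day1", "US"), ("day1", "US"), ("day1", "CN"),
    ("day2", "IN"), ("day2", "IN"), ("day2", "CN"),
    ("day3", "BR"), ("day3", "BR"), ("day3", "CN"),
]


def top_terms(values, size):
    """Return the `size` most frequent values, like a terms agg ordered by count."""
    return {term for term, _ in Counter(values).most_common(size)}


size = 1
by_date = {}
for date, src in docs:
    by_date.setdefault(date, []).append(src)

# Terms shown in the chart: top terms computed inside each date bucket.
chart_terms = set().union(*(top_terms(srcs, size) for srcs in by_date.values()))

# Terms returned when the date histogram is skipped.
overall_terms = top_terms([src for _, src in docs], size)

# Chart buckets with no higher-level metric to join in.
missing = chart_terms - overall_terms
```

Here CN never wins within a single day but is the most frequent term overall, so none of the terms shown in the chart would receive a higher-level value.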

@elasticmachine
Contributor

Pinging @elastic/kibana-app (Team:KibanaApp)

@wylieconlon
Contributor

I agree that this is a hard concept to explain to users, and I think you may have missed an important case: top-level unbucketed metrics. This is similar to the dynamic thresholds discussion we've been having, where I could build a formula to hide any values that are below a threshold such as the median or a percentile. Something like:

if(last_value(bytes) > percentile(bytes, percentile=90, top_level=true), last_value(bytes), null): Only show bytes values above the 90th percentile
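As a rough sketch of that thresholding idea: the threshold is computed once over all documents, ignoring the bucket dimensions, and only bucket values above it are kept, which is the intent stated in the formula's trailing comment. The data and the nearest-rank `percentile` helper are assumptions for illustration; Elasticsearch computes percentiles with an approximate algorithm.

```python
def percentile(values, p):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, -(-p * len(ordered) // 100))  # integer ceiling of p/100 * n
    return ordered[rank - 1]


# Hypothetical last_value(bytes) per bucket of the chart.
last_values = {"US": 900, "CN": 120, "IN": 450, "BR": 80}

# Hypothetical bytes of all documents, ignoring every bucket dimension.
all_bytes = [50, 80, 100, 120, 200, 300, 450, 600, 800, 900]

# Top-level metric: one value for the whole chart.
threshold = percentile(all_bytes, 90)

# Keep a bucket's value only if it exceeds the top-level threshold.
visible = {
    key: (value if value > threshold else None)
    for key, value in last_values.items()
}
```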

Here are some rough ideas for how we can clarify this concept for users. Each idea is separate:

  • To change the ES metric level, you have to use a named parameter. To calculate a column-oriented "overall" metric, you have to use a separate function. This makes the API look & feel different.
  • Instead of allowing overallFor, we could only allow 2 levels. "Outer" and "inner" for example, as in unbucketed vs inner metric
  • We could restrict the level-changing logic to a new type of UI that we haven't built yet. For example, what if every dimension is shown in a single "Formula view" that shows the hierarchy?

@flash1293
Contributor Author

Agreed, top level metrics are important as well.

To change the ES metric level, you have to use a named parameter. To calculate a column-oriented "overall" metric, you have to use a separate function. This makes the API look & feel different.

That definitely makes sense and would be consistent with how the rest of formula works. I was just wondering whether it's enough. Maybe I'm overthinking it.

Instead of allowing overallFor, we could only allow 2 levels. "Outer" and "inner" for example, as in unbucketed vs inner metric

It's reducing the complexity while also reducing the expressiveness of formula, but maybe that's OK as a first step. Once people start using it we can see in which direction to evolve.

We could restrict the level-changing logic to a new type of UI that we haven't built yet. For example, what if every dimension is shown in a single "Formula view" that shows the hierarchy?

This seems like the most expensive option, but maybe it's the next logical step. I still hope we can avoid doing a mixed UI like this (and implement expression/sql datasources to cover these use cases instead)

@flash1293
Contributor Author

Another important use case for this is array values. In some cases, summing up the rows with overall_sum is not the same as executing the metric "outside" of the current aggregation tree, because the buckets overlap (a document with several values falls into several buckets): #115770
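A minimal sketch of the overlap problem, using an invented array field `tags` as the breakdown:

```python
# Hypothetical documents; `tags` is an array field, so one document
# can land in several terms buckets.
docs = [
    {"bytes": 100, "tags": ["a", "b"]},
    {"bytes": 50, "tags": ["a"]},
    {"bytes": 30, "tags": ["b", "c"]},
]

# sum(bytes) per terms bucket, as the chart rows would show it.
sum_per_tag = {}
for doc in docs:
    for tag in doc["tags"]:
        sum_per_tag[tag] = sum_per_tag.get(tag, 0) + doc["bytes"]

# Adding up the rows counts multi-valued documents more than once.
summed_buckets = sum(sum_per_tag.values())

# The same metric computed outside the terms buckets counts each document once.
true_total = sum(doc["bytes"] for doc in docs)

print(summed_buckets, true_total)  # 310 180
```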

@ghudgins

ghudgins commented Nov 1, 2021

I think this applies - https://discuss.elastic.co/t/how-to-get-the-total-of-memory-and-cpu-usage-of-my-cluster/288170

something like the below could solve what they're looking for (they want the overall sum but don't want to display the field from the group by to get it)

overall_sum(
    average(kubernetes.node.memory.capacity.bytes, groupby='kubernetes.node.name')
    )

@flash1293
Contributor Author

It sounds more like #94619, which is closely related. The difference is that in "define level of metric", a single metric is not broken down but the breakdown is still shown in the chart, while in "Collapse bucket column" nothing is broken down (and the breakdown is not shown in the chart), except for the single metric, which is summed up for display.

@ghudgins

we plan on accounting for this requirement in our query system (and its ability to query and transform data) - #126095

Closing this issue as formula is not the intended solution
