[Lens] Add derivative function #61775

timroes · 2020-03-30T10:56:14Z

Add a derivative pipeline aggregation to Lens. See #56696 for more discussion.
Tasks:

Add expression function (Add derivative function #81178)
Implement operation in Lens (blocked by [Lens] Change the internal API of operations to support calculations #76828)

elasticmachine · 2020-03-30T10:56:15Z

Pinging @elastic/kibana-app (Team:KibanaApp)

wylieconlon · 2020-09-04T18:04:07Z

The definition of derivative for our purposes is a function which subtracts sequential values in a date histogram to calculate the instant diff between sequential values. Derivatives in this context are discrete, as in non-continuous, and may have gaps.

Because the date histogram has a duration in time, the derivative function supports scaling values to a specific time interval, such as "derivative per second". Derivative values can be positive or negative.

User inputs

The derivative function requires a group by columns parameter, but this can be automatically set by Lens. This makes the derivative function only have optional inputs. Optional inputs are:

Scaled time unit (per second, etc)
Policy for handling gaps: Skip or replace with zeros (this is separate from the fitting functions)

This leads to a function signature like:

interface DerivativeArgs {
  // Table is used to determine the time units
  table: KibanaDatatable;
  // Required list of group by columns. Don't group the time field
  groupByIds: string[];
  scaleTo?: 'ms' | 's' | 'h' | 'd' | 'M';
  gapPolicy?: 'skip' | 'insert_zeros';
}

type DerivativeFunction = (input: DerivativeArgs) => KibanaDatatable;

Form design

This form is missing a way to set a "gap policy", but is otherwise close:

Table example with gap skipping (default)

timestamp per 3 hours	Count	Derivative	Derivative per hour
2020-08-21 15:00	10	-	-
2020-08-21 18:00	-	-	-
2020-08-21 21:00	19	-	-
2020-08-22 00:00	22	3	1
2020-08-22 03:00	13	-9	-3

Table example with zeroes

timestamp per 3 hours	Count	Derivative	Derivative per hour
2020-08-21 15:00	10	-	-
2020-08-21 18:00	-	-10	-3.33
2020-08-21 21:00	19	19	6.33
2020-08-22 00:00	22	3	1
2020-08-22 03:00	13	-9	-3

As you can see in this example, the value goes negative if there is missing data. I find this behavior a little annoying, so I consider it up for debate whether it should go negative or return to 0 for missing data.
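The two tables above can be reproduced with a short sketch. The names `derive` and `GapPolicy`, and the use of `null` for an empty bucket, are assumptions made for illustration; this is not the Lens implementation.

```typescript
// Hypothetical sketch: discrete derivative over date histogram buckets.
// `null` marks a bucket with no data; a `null` result corresponds to "-" in the tables.
type GapPolicy = 'skip' | 'insert_zeros';

function derive(
  values: Array<number | null>,
  gapPolicy: GapPolicy = 'skip'
): Array<number | null> {
  let previous: number | null = null;
  return values.map((raw, index) => {
    // With 'insert_zeros', an empty bucket is treated as a value of 0.
    const current = raw === null && gapPolicy === 'insert_zeros' ? 0 : raw;
    // The first bucket has no predecessor; with 'skip', a gap on either side yields no value.
    const diff =
      index === 0 || current === null || previous === null ? null : current - previous;
    previous = current;
    return diff;
  });
}

const counts = [10, null, 19, 22, 13];
console.log(derive(counts, 'skip')); // [null, null, null, 3, -9]
console.log(derive(counts, 'insert_zeros')); // [null, -10, 19, 3, -9]
```

With `'skip'`, the bucket after a gap also has no derivative, because its predecessor is missing. With `'insert_zeros'`, the gap shows up as a drop to zero followed by a jump back up.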

Example visualizations

Derivatives can only be used in XY charts and data tables. They can't be rendered in pie charts because the values can go negative.

The simplest way to render a derivative is as a line chart. Derivatives can be calculated on a single line or as many lines indicating categories:

But because we are trying to do more than the bare minimum in Lens, we should also consider the most frequent requests that users have. For example, a common request is to have "red and green" colors to indicate derivatives, with a black color to indicate the underlying values. Here's an example I did in TSVB which required a lot of manual setup. Can Lens make this easy?

Going even beyond this, @monfera has worked on examples of derivatives where the derivative is shown as a cumulative derivative, also known as a waterfall chart. This chart type also uses red and green coloring, and shows negative values in the context of the overall trendline. Another feature of waterfall charts is that we can apply them as annotations on top of bar charts.

Implementation notes

The derivative function should be implemented as part of the standard library of expression functions, instead of using the aggregation features of Elasticsearch. This gives us the ability to compose more functions on top of the derivative. For example, the "time scaling" feature might actually be implemented as a separate expression function, making derivative a combination of two expression functions.
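As a rough illustration of that composition, here is a sketch in which the subtraction and the time scaling are two independent steps. The function names `differences` and `scaleTo`, the `TimeUnit` type, and the millisecond bucket width are all hypothetical; the month unit is left out because months do not have a fixed length.

```typescript
// Hypothetical sketch: time scaling as its own step, composed with a difference step.
type TimeUnit = 'ms' | 's' | 'h' | 'd';

const MS_PER_UNIT: Record<TimeUnit, number> = { ms: 1, s: 1000, h: 3600000, d: 86400000 };

// Subtract each value's predecessor from it; the first bucket has no predecessor.
function differences(values: number[]): Array<number | null> {
  return values.map((value, i) => (i === 0 ? null : value - values[i - 1]));
}

// Rescale a per-bucket value to a per-unit value, given the bucket width in milliseconds.
function scaleTo(
  values: Array<number | null>,
  bucketMs: number,
  unit: TimeUnit
): Array<number | null> {
  return values.map((v) => (v === null ? null : (v * MS_PER_UNIT[unit]) / bucketMs));
}

// Three-hour buckets, as in the table examples: 3 per bucket becomes 1 per hour.
const threeHours = 3 * MS_PER_UNIT.h;
const perHour = scaleTo(differences([19, 22, 13]), threeHours, 'h');
console.log(perHour); // [null, 1, -3]
```

Keeping the two steps separate means the same scaling step could be reused after other time series functions, not only after a derivative.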

I don't consider the red/green styling or waterfall charts to be requirements for shipping a derivative feature. When we choose to implement this feature, it should be done as a chart styling option that might be applied automatically for derivatives, but that can also be applied to any data that goes positive and negative.

Steps to implement:

Not blocked by ongoing discussion about how tables will provide the time interval, because we can make forward progress by using one of these PRs as a hack
Start implementing the underlying Lens dependencies:
Formatter to append "per second": [Lens] Support a formatter or format option which append "per second" "per minute" "per hour" #76714
Make changes to the way that operations are defined in the Lens datasource, so that they can output more than esaggs
Write the expression function to do the table manipulation

Stretch goals are:

Implement red/green styling
Build a waterfall chart option

wylieconlon · 2020-09-04T19:26:59Z

@AlonaNadler Does the "gap policy" phrasing make sense as shown here? Do you have a better way to describe the use case for zeroing out the charts? We definitely need to implement this in code, but I noticed that we don't support this in TSVB or Visualize.

Also, I've listed some stretch goals of supporting red/green styling and waterfall charts, as shown above. Do you agree, or do you think these are required for Lens by default?

AlonaNadler · 2020-09-16T22:29:37Z

Can we address the gap policy as part of the fitting function?
Users can then decide whether the missing values are zero.
If nothing is specified in the fitting function, the behavior should be similar to the TSVB default.

Solving the gap policy is not a high priority in my opinion.
The stretch goals seem great, especially the red/green styling, but they are not mandatory for the Lens default.

wylieconlon · 2020-09-16T23:35:51Z

@AlonaNadler I agree that gap policy is not a high priority. If we decide to do it, it will be completely separate from the fitting function for technical reasons.

wylieconlon · 2020-09-23T23:13:15Z

The function signature I proposed earlier is not complete for several reasons, and this is an attempt to update the proposed signature.

There were no parameters to indicate which number to derive from
There were no parameters to generate a new column
The time scaling parameters are no longer needed because we will implement a separate time scaling function

Generating a new column requires us to have a new column ID, human-readable name, and formatHint.

In total, I think this is the new interface for the derivative expression function:

interface DerivativeArgs {
  groupBy: string[];
  inputColumn: string;
  outputColumnId: string;
  outputColumnName: string;
  outputColumnSerializedFormat: string;
  gapPolicy?: 'skip' | 'insert_zeroes';
}

My confidence level in this signature is higher than before because I wrote an actual expression function with these arguments, but it could change again if we run into consistency issues with the other time series functions.
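To make the roles of `groupBy`, `inputColumn`, and `outputColumnId` concrete, here is a minimal sketch over a row-oriented table. The `Row` type is a stand-in, not the real datatable type, and the sketch assumes the rows are already sorted by time and uses only the "skip" gap behavior.

```typescript
// Hypothetical row shape; the real datatable type is not reproduced here.
type Row = Record<string, string | number | null>;

interface GroupedDerivativeArgs {
  groupBy: string[];
  inputColumn: string;
  outputColumnId: string;
}

function groupedDerivative(
  rows: Row[],
  { groupBy, inputColumn, outputColumnId }: GroupedDerivativeArgs
): Row[] {
  // Last seen input value per group, keyed by the serialized group-by values.
  const previousByGroup = new Map<string, number | null>();
  return rows.map((row) => {
    const key = JSON.stringify(groupBy.map((column) => row[column]));
    const raw = row[inputColumn];
    const current = typeof raw === 'number' ? raw : null;
    const previous = previousByGroup.get(key) ?? null;
    previousByGroup.set(key, current);
    // "skip" behavior: no output when either side of the subtraction is missing.
    const diff = current !== null && previous !== null ? current - previous : null;
    return { ...row, [outputColumnId]: diff };
  });
}

const rows: Row[] = [
  { ts: 1, host: 'a', count: 10 },
  { ts: 1, host: 'b', count: 5 },
  { ts: 2, host: 'a', count: 13 },
  { ts: 2, host: 'b', count: 4 },
];
const result = groupedDerivative(rows, {
  groupBy: ['host'],
  inputColumn: 'count',
  outputColumnId: 'count_diff',
});
console.log(result.map((row) => row.count_diff)); // [null, null, 3, -1]
```

The time column is deliberately not part of `groupBy`, so each host gets its own series and the subtraction never crosses from one host to another.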

flash1293 · 2020-09-24T07:16:38Z

This looks almost good to me, thanks for taking these things into account. While thinking a little about it I came up with some light additional touches (but I suspect we will continue iterating on this while actually implementing):

interface DerivativeArgs {
  groupBy?: string[];
  inputColumn: string;
  outputColumnId?: string;
  outputColumnName?: string;
  outputColumnSerializedFormat?: string;
  gapPolicy?: 'skip' | 'insert_zeroes';
}

(basically making output column and groupBy configuration optional)

The behavior would be as follows:

outputColumnId defaults to inputColumn
outputColumnName defaults to the name of inputColumn
outputColumnSerializedFormat defaults to the format of inputColumn
groupBy defaults to an empty array for cases where you don't have grouping columns. This is necessary because you can't set an expression argument to an empty array (unless I'm missing something)
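The defaults listed above could be resolved in one place before the calculation runs. This is only a sketch: `ColumnMeta` and `withDefaults` are hypothetical names, and the `'skip'` default for `gapPolicy` is an assumption taken from the "gap skipping (default)" table earlier in the thread.

```typescript
type GapPolicy = 'skip' | 'insert_zeroes';

interface DerivativeArgs {
  groupBy?: string[];
  inputColumn: string;
  outputColumnId?: string;
  outputColumnName?: string;
  outputColumnSerializedFormat?: string;
  gapPolicy?: GapPolicy;
}

// Hypothetical metadata for the input column; the real column type may differ.
interface ColumnMeta {
  id: string;
  name: string;
  serializedFormat: string;
}

function withDefaults(args: DerivativeArgs, input: ColumnMeta): Required<DerivativeArgs> {
  return {
    groupBy: args.groupBy ?? [],
    inputColumn: args.inputColumn,
    // Falling back to the input column means the derivative replaces it in the output.
    outputColumnId: args.outputColumnId ?? args.inputColumn,
    outputColumnName: args.outputColumnName ?? input.name,
    outputColumnSerializedFormat: args.outputColumnSerializedFormat ?? input.serializedFormat,
    // Assumption: 'skip' is the default gap policy.
    gapPolicy: args.gapPolicy ?? 'skip',
  };
}

const resolved = withDefaults(
  { inputColumn: 'count' },
  { id: 'count', name: 'Count', serializedFormat: 'number' }
);
console.log(resolved.outputColumnId, resolved.outputColumnName, resolved.groupBy);
```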

monfera · 2020-09-30T14:14:14Z

A naming suggestion, since names are important for UX and even DX: would it be possible to change the working term "derivative" to "differences" in the UI? I may be overlooking a good reason for calling it derivative. We don't have continuous functions, our binned time series aren't differentiable, and even if we disregard the lack of continuity, it's not some kind of tangent at a point, and not even a ratio of dx and dy (i.e. not angle related). It's just the dy, and it is a backward-looking measure, not centered or infinitesimally small.

"Derivative" has some looser meaning too (=stuff you compute from other things, derived information) but it's not an ideal fit either, eg. it's too specific for that.

flash1293 · 2020-09-30T14:29:30Z

I guess we inherited this from Elasticsearch (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-derivative-aggregation.html), so one point in favor of "derivative" is familiarity for users used to the stack.

But we are not following Elasticsearch terminology in a lot of places so it's totally valid to rethink this.

cc @gchaps maybe you have another idea.

Naming is something we can iterate on separately from the functionality (as long as it's happening between releases of course)

monfera · 2020-09-30T14:29:46Z

Calculation of differences: Subtracting the value of the previous bin from the value of the current bin leads to accumulation of minuscule errors, which may or may not matter(*), could be decided upfront, though the implementation and runtime cost is the same.

If it matters, a robust way of computing deltas is to run through the series bin by bin and compute the difference between (A) the cumulative sum of the differences computed already, and (B) the current value. The new difference is the current bin value bins[N].value minus the accumulated sum (bins[0].delta + ... + bins[N-1].delta). This way, numerical errors do not accumulate, i.e. there's a known, very small upper bound on their sum over arbitrary intervals.
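One possible reading of that procedure is sketched below. The name `robustDeltas` is hypothetical, and starting the running sum at the first bin's value (so that the first bin has no delta) is an assumption; the comment above does not specify the baseline.

```typescript
// Hypothetical sketch: each delta is measured against the running sum of the
// deltas already emitted, not against the previous raw value.
function robustDeltas(values: number[]): Array<number | null> {
  if (values.length === 0) return [];
  const deltas: Array<number | null> = [null];
  // Running reconstruction: the first value plus every delta emitted so far.
  let reconstructed = values[0];
  for (let i = 1; i < values.length; i++) {
    const delta = values[i] - reconstructed;
    deltas.push(delta);
    reconstructed += delta;
  }
  return deltas;
}

console.log(robustDeltas([10, 19, 22, 13])); // [null, 9, 3, -9]
```

Because every new delta is taken relative to what the earlier deltas add up to, a consumer that re-sums the deltas tracks the original values, and any rounding error in one step is absorbed by the next delta.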

(*) It might matter when differences are eg. reintegrated downstream over an interval; with the robust method you know what epsilon to use when judging if the resulting number is (likely) a zero, or some very small positive or negative number.

It may also matter when eg. an ES payload is sent, for compression, in a delta-encoded way; eg. hourly temperature values won't wildly differ from each other, so it's more efficient to send serial deltas down the network for somewhat continuous phenomena, or when there are long stretches of unchanged values (delta=0, compresses well with RLE)

Btw. an alternative to the name "differences" would be "deltas" (or singular forms, or variation eg. series delta)

monfera · 2020-09-30T14:35:09Z

Thanks @flash1293 - some earlier discussions, e.g. with Raya and Vijay, touched on the tradeoffs of using the industry standard terminology versus the term used by Elasticsearch when they differ. I'm not sure whether the product design principles for Lens made it fall one way or the other, or whether it was somewhat accidental. For example, there may be some decision document that voted in favor of "bucket" instead of the more standard term "bin". Again, who knows, there may be a good reason for calling it differentiation besides momentum or accident.

gchaps · 2020-09-30T22:14:45Z

I lean toward using "difference" or "delta" because they are easier to understand at a glance.

monfera · 2020-10-01T13:42:42Z

As we use the quite precise "cumulative sum" and not "integration" elsewhere, consistency is another support for using differences, deltas or running deltas or some such here

flash1293 · 2021-01-04T15:54:58Z

Closed by #84384

timroes added the enhancement, Team:Visualizations, and Feature:Lens labels Mar 30, 2020

This was referenced Mar 30, 2020

[Meta] [Lens] Making Lens GA #61453

Closed

[Meta][Lens] Data Modelling #57708

Closed

timroes mentioned this issue Aug 6, 2020

[Meta][Lens] Calculations and advanced queries #57713

Closed

13 tasks

timductive mentioned this issue Aug 10, 2020

[Meta][Lens] Lens by Default #74685

Closed

26 tasks

This was referenced Aug 11, 2020

[Lens] Support for all time series functions #74813

Closed

[Lens] Design for reorganized and renamed functions for Lens by default #74908

Closed

wylieconlon changed the title ~~[Lens] Add derivative pipeline aggregation~~ [Lens] Add derivative function Sep 4, 2020

wylieconlon mentioned this issue Sep 4, 2020

[expressions] Standard library of table manipulation functions #68930

Closed

stacey-gammon added the Project:LensDefault label Sep 16, 2020

wylieconlon mentioned this issue Sep 16, 2020

[Lens] Add moving average aggregation #61777

Closed

flash1293 added the loe:needs-research label Oct 2, 2020

flash1293 removed the loe:needs-research label Oct 12, 2020

wylieconlon mentioned this issue Oct 12, 2020

Add cumulative sum expression function #80129

Merged

2 tasks

flash1293 self-assigned this Oct 20, 2020

flash1293 mentioned this issue Oct 20, 2020

Add derivative function #81178

Merged

flash1293 removed their assignment Nov 16, 2020

flash1293 closed this as completed Jan 4, 2021
