Speed up interval rounding #63245

nik9000 · 2020-10-05T14:34:36Z

This speeds up date_histogram by precomputing the rounding points for
date intervals like 10d. The speedup for the rounding itself is
between 18% (UTC many buckets) and 65% (US Eastern Time few buckets).
43% seems like it'd be pretty common:

```
Benchmark   (count)  (interval)                   (range)           (zone)  Mode  Cnt          Score         Error  Units
before     10000000         10d  2000-10-28 to 2000-10-31              UTC  avgt   10  130822390.700 ±  177466.657  ns/op
before     10000000         10d  2000-10-28 to 2000-10-31 America/New_York  avgt   10  189236837.930 ± 7958933.566  ns/op
after      10000000         10d  2000-10-28 to 2000-10-31              UTC  avgt   10   66413746.325 ± 1578834.032  ns/op
after      10000000         10d  2000-10-28 to 2000-10-31 America/New_York  avgt   10   65656941.375 ±  291608.870  ns/op

before     10000000          2h  2000-10-28 to 2000-10-31              UTC  avgt   10  130854975.013 ±  369133.702  ns/op
before     10000000          2h  2000-10-28 to 2000-10-31 America/New_York  avgt   10  165831615.257 ±  139074.982  ns/op
after      10000000          2h  2000-10-28 to 2000-10-31              UTC  avgt   10  107832636.671 ± 3502704.198  ns/op
after      10000000          2h  2000-10-28 to 2000-10-31 America/New_York  avgt   10  107608802.940 ±  979286.160  ns/op
```
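
The 18% and 65% figures are reductions in time per operation and can be checked against the scores in the table. A small sketch of that arithmetic (the class and method names are illustrative and not part of this PR):

```java
/** Illustrative helper for checking the quoted speedups against the benchmark scores. */
final class SpeedupCheck {
    /** Fractional reduction in time per operation: (before - after) / before. */
    static double reduction(double beforeNsPerOp, double afterNsPerOp) {
        return (beforeNsPerOp - afterNsPerOp) / beforeNsPerOp;
    }

    public static void main(String[] args) {
        // 2h interval, UTC: the "many buckets" case.
        System.out.printf("2h UTC: %.1f%%%n", 100 * reduction(130854975.013, 107832636.671));
        // 10d interval, America/New_York: the "few buckets" case.
        System.out.printf("10d America/New_York: %.1f%%%n", 100 * reduction(189236837.930, 65656941.375));
    }
}
```

The `2h` UTC row works out to roughly 18% and the `10d` America/New_York row to roughly 65%, matching the range quoted in the description.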

Speedup for the date_histogram is likely to vary based on how much IO
dominates the collection.
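
To illustrate the general idea of precomputing rounding points, here is a minimal sketch. It is not the code from this PR: it assumes a fixed interval in epoch milliseconds, ignores time zones entirely, and assumes the minimum and maximum values are known up front. The class name `PrecomputedRounding` and its methods are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical illustration: round down to a fixed interval via a precomputed lookup table. */
final class PrecomputedRounding {
    private final long interval;
    private final long[] points; // every rounded-down value covering [min, max], ascending

    PrecomputedRounding(long min, long max, long interval) {
        if (interval <= 0 || max < min) {
            throw new IllegalArgumentException("need interval > 0 and min <= max");
        }
        this.interval = interval;
        // floorDiv keeps the rounding correct for negative values (dates before 1970).
        long first = Math.floorDiv(min, interval) * interval;
        int count = Math.toIntExact((max - first) / interval + 1);
        points = new long[count];
        for (int i = 0; i < count; i++) {
            points[i] = first + i * interval;
        }
    }

    /** Rounds a value inside the prepared range down to the start of its interval. */
    long round(long value) {
        int idx = Arrays.binarySearch(points, value);
        if (idx < 0) {
            idx = -idx - 2; // not an exact hit: take the point just before the insertion point
        }
        if (idx < 0 || value - points[idx] >= interval) {
            throw new IllegalArgumentException("value outside the prepared range");
        }
        return points[idx];
    }
}
```

With only a few buckets the table is tiny, so each lookup is a handful of comparisons. That may be part of why the few-bucket `10d` case gains the most in the benchmark, though the sketch above does not model the time zone handling that the real rounding has to do.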

elasticmachine · 2020-10-05T14:34:38Z

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

nik9000 · 2020-10-05T14:34:55Z

I'm running performance tests locally to see what I see on some real data.

nik9000 · 2020-10-05T18:10:23Z

The data I have for running performance tests doesn't actually work well here because it contains outliers that disable this optimization: there are some docs with dates in 1970 and some in 2050, but the bulk of them are around 2015 or so. That kind of thing defeats this optimization, so we're going to have to rely on the lower-level performance test here. I'll have a think about outliers and how we can stop them from defeating the optimization.
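
The thread doesn't spell out how outliers disable the optimization. One plausible reading is that a lookup table spanning the full range of dates would be too large to be worth building. The sketch below works under that assumption only: the class `PrecomputeGuard`, its methods, and the `MAX_POINTS` cap are all hypothetical and are not taken from this PR.

```java
/** Hypothetical guard: only precompute rounding points when the table stays small. */
final class PrecomputeGuard {
    /** Hypothetical cap on table size; the real threshold, if any, is not stated in this thread. */
    static final int MAX_POINTS = 128;

    /** Number of rounded-down values needed to cover [min, max] at the given interval. */
    static long pointsNeeded(long min, long max, long interval) {
        long first = Math.floorDiv(min, interval) * interval;
        return (max - first) / interval + 1;
    }

    /** True when the whole range fits in a table of at most MAX_POINTS entries. */
    static boolean shouldPrecompute(long min, long max, long interval) {
        return pointsNeeded(min, max, interval) <= MAX_POINTS;
    }

    public static void main(String[] args) {
        long tenDays = 10L * 24 * 60 * 60 * 1000;
        long jan2015 = 1420070400000L; // 2015-01-01T00:00:00Z
        long jan2016 = 1451606400000L; // 2016-01-01T00:00:00Z
        long jan2050 = 2524608000000L; // 2050-01-01T00:00:00Z

        // Bulk of the data only: a single year at 10d needs a small table.
        System.out.println(shouldPrecompute(jan2015, jan2016, tenDays));
        // A few docs in 1970 and 2050 stretch the range and blow past the cap.
        System.out.println(shouldPrecompute(0L, jan2050, tenDays));
    }
}
```

Under this reading, a handful of stray documents is enough to push the min and max apart and force the fallback to per-value arithmetic, even though almost every value sits in a narrow band.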

not-napoleon

LGTM

nik9000 · 2020-10-06T20:48:55Z

Thanks @not-napoleon !

nik9000 · 2020-10-07T13:25:23Z

The backport of this didn't make the branch cut for 7.10 so it'll release with 7.11.

nik9000 added the >enhancement, :Analytics/Aggregations, v8.0.0, and v7.10.0 labels Oct 5, 2020

nik9000 requested a review from not-napoleon October 5, 2020 14:34

elasticmachine added the Team:Analytics label Oct 5, 2020

not-napoleon approved these changes Oct 6, 2020

nik9000 merged commit 62a74d0 into elastic:master Oct 6, 2020

nik9000 added the backport pending label Oct 6, 2020

nik9000 added v7.11.0 and removed v7.10.0 labels Oct 7, 2020

nik9000 mentioned this pull request Oct 7, 2020

Consider query when optimizing date rounding #63403

Merged

nik9000 removed the backport pending label Jul 8, 2021

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
