Speed up interval rounding #63245
This speeds up date_histogram by precomputing the rounding points for date intervals like `10d`. The speedup for the rounding itself is between 18% (UTC many buckets) and 65% (US Eastern Time few buckets). 43% seems like it'd be pretty common:

```
Benchmark   (count)  (interval)                   (range)            (zone)  Mode  Cnt          Score          Error  Units
before     10000000         10d  2000-10-28 to 2000-10-31               UTC  avgt   10  130822390.700 ±  177466.657  ns/op
before     10000000         10d  2000-10-28 to 2000-10-31  America/New_York  avgt   10  189236837.930 ± 7958933.566  ns/op
after      10000000         10d  2000-10-28 to 2000-10-31               UTC  avgt   10   66413746.325 ± 1578834.032  ns/op
after      10000000         10d  2000-10-28 to 2000-10-31  America/New_York  avgt   10   65656941.375 ±  291608.870  ns/op
before     10000000          2h  2000-10-28 to 2000-10-31               UTC  avgt   10  130854975.013 ±  369133.702  ns/op
before     10000000          2h  2000-10-28 to 2000-10-31  America/New_York  avgt   10  165831615.257 ±  139074.982  ns/op
after      10000000          2h  2000-10-28 to 2000-10-31               UTC  avgt   10  107832636.671 ± 3502704.198  ns/op
after      10000000          2h  2000-10-28 to 2000-10-31  America/New_York  avgt   10  107608802.940 ±  979286.160  ns/op
```

Speedup for the date_histogram is likely to vary based on how much IO dominates the collection.
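For readers unfamiliar with the idea, here is a minimal sketch of what "precomputing the rounding points" can look like. It is not the code from this PR: the class name `PrecomputedRounding` and its methods are hypothetical, and plain fixed-width arithmetic stands in for the real, time-zone-aware rounding. The bucket start times for a known value range are computed once, and each value is then rounded with a binary search instead of repeating the date math.

```java
import java.util.Arrays;

/** Illustrative sketch only; names and structure are assumptions, not Elasticsearch code. */
final class PrecomputedRounding {
    // Sorted bucket start times (epoch millis) covering [minMillis, maxMillis].
    private final long[] points;

    PrecomputedRounding(long minMillis, long maxMillis, long intervalMillis) {
        if (intervalMillis <= 0 || maxMillis < minMillis) {
            throw new IllegalArgumentException("invalid range or interval");
        }
        // Start of the bucket containing minMillis (floorDiv handles negative values).
        long first = Math.floorDiv(minMillis, intervalMillis) * intervalMillis;
        int count = (int) ((maxMillis - first) / intervalMillis) + 1;
        points = new long[count];
        for (int i = 0; i < count; i++) {
            points[i] = first + i * intervalMillis;
        }
    }

    /** Rounds down to the start of the containing bucket using only a binary search. */
    long round(long millis) {
        int idx = Arrays.binarySearch(points, millis);
        if (idx < 0) {
            // Not an exact bucket start: binarySearch returns -(insertionPoint) - 1,
            // so the bucket start is the element just before the insertion point.
            idx = -idx - 2;
        }
        if (idx < 0) {
            throw new IllegalArgumentException("value is below the precomputed range");
        }
        return points[idx];
    }
}
```

The trade-off in this sketch is a one-time cost to build the array, after which each lookup touches only a small sorted array. That pays off when many values fall into few buckets.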
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)
I'm running performance tests locally to see how this behaves on some real data.
The data I have for running performance tests doesn't actually work well here because it contains outliers that disable this optimization: some docs have dates in 1970 and some in 2050, but the bulk of them are around 2015 or so. That kind of spread defeats the optimization, so we're going to have to rely on the lower-level performance test here. I'll have a think about outliers and how we can stop them from defeating the optimization.
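To make the outlier problem concrete, here is a rough sketch of how the number of precomputed points grows with the span between the smallest and largest value. The class `LookupSizeEstimate` and the cap `MAX_POINTS` are hypothetical and chosen only for illustration. The actual limit and check used by the implementation are not stated in this thread.

```java
/** Illustrative sketch only; the cap below is an assumed value, not taken from the PR. */
final class LookupSizeEstimate {
    // Hypothetical upper bound on how many rounding points we are willing to precompute.
    static final int MAX_POINTS = 128;

    /** Number of fixed-width buckets needed to cover [minMillis, maxMillis]. */
    static long pointsNeeded(long minMillis, long maxMillis, long intervalMillis) {
        long firstBucket = Math.floorDiv(minMillis, intervalMillis);
        long lastBucket = Math.floorDiv(maxMillis, intervalMillis);
        return lastBucket - firstBucket + 1;
    }

    /** True when the range is narrow enough for the precomputed lookup to be used. */
    static boolean canPrecompute(long minMillis, long maxMillis, long intervalMillis) {
        return pointsNeeded(minMillis, maxMillis, intervalMillis) <= MAX_POINTS;
    }
}
```

With a `10d` interval, one year of data needs only a few dozen points, while a range stretching from 1970 to 2050 needs thousands. Under a cap like this, a handful of outlier docs is enough to push the range over the limit, even though nearly all values sit in a narrow band.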
LGTM
Thanks @not-napoleon!
The backport of this didn't make the branch cut for 7.10, so it'll be released with 7.11.