Support filtering on __time with column scan #2673

jon-wei · 2016-03-16T21:22:56Z

This PR allows filters to be applied to the special __time column, which does not have a dictionary or bitmap index.

Updates the makeValueMatcher functions in IncrementalIndexStorageAdapter to support the __time column.
Adds a function to ColumnSelectorBitmapIndex for performing a full scan on the __time column.
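To illustrate the difference between the two filtering strategies discussed here, the following is a minimal sketch, not the code from this PR. The class `TimeFilterSketch` and all of its methods are hypothetical names, and `java.util.BitSet` stands in for Druid's bitmap types. It assumes the scan evaluates a predicate once per row, while the index-based path evaluates it once per distinct value and unions the matching bitmaps.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongPredicate;
import java.util.function.Predicate;

// Hypothetical illustration only; none of these names are Druid classes.
class TimeFilterSketch {

    // Full scan: test every row's timestamp and set the bit of each matching row.
    // Work is proportional to the number of rows.
    static BitSet scanTimeColumn(long[] timestamps, LongPredicate predicate) {
        BitSet matches = new BitSet(timestamps.length);
        for (int row = 0; row < timestamps.length; row++) {
            if (predicate.test(timestamps[row])) {
                matches.set(row);
            }
        }
        return matches;
    }

    // Build a value -> row-bitmap index, as a stand-in for an indexed String column.
    static Map<String, BitSet> buildBitmapIndex(String[] values) {
        Map<String, BitSet> index = new HashMap<>();
        for (int row = 0; row < values.length; row++) {
            index.computeIfAbsent(values[row], v -> new BitSet()).set(row);
        }
        return index;
    }

    // Index-based filtering: test each distinct value once and union the bitmaps
    // of the values that match. Work is proportional to the number of distinct values.
    static BitSet filterWithIndex(Map<String, BitSet> index, Predicate<String> predicate) {
        BitSet matches = new BitSet();
        for (Map.Entry<String, BitSet> entry : index.entrySet()) {
            if (predicate.test(entry.getKey())) {
                matches.or(entry.getValue());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        long[] timestamps = {1000L, 2000L, 3000L, 2000L};
        String[] asStrings = {"1000", "2000", "3000", "2000"};

        BitSet scanned = scanTimeColumn(timestamps, t -> t >= 2000L);
        BitSet indexed = filterWithIndex(buildBitmapIndex(asStrings),
                                         s -> Long.parseLong(s) >= 2000L);

        // Both strategies should select the same rows.
        System.out.println(scanned.equals(indexed));
    }
}
```

Under these assumptions the scan cost follows the row count and the index cost follows the number of distinct values, which may help when reading the benchmark numbers in this description.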

A benchmark is included that compares the performance of a full scan on __time against bitmap index-based filtering on a String column that contains ISO timestamps.

Example results for various cardinalities follow. loadMainTime reads from the __time column; loadSecondTime reads from the String time column. Scores are average times in microseconds per operation (us/op).

250,000 rows

cardinality=1
Benchmark                           Mode  Cnt      Score     Error  Units
ColumnScanBenchmark.loadMainTime    avgt   10  11335.082 ± 466.723  us/op
ColumnScanBenchmark.loadSecondTime  avgt   10     51.619 ±   5.596  us/op

cardinality=100
Benchmark                           Mode  Cnt      Score     Error  Units
ColumnScanBenchmark.loadMainTime    avgt   10  11974.823 ± 462.547  us/op
ColumnScanBenchmark.loadSecondTime  avgt   10   1003.198 ±  48.338  us/op

cardinality=2500
Benchmark                           Mode  Cnt      Score      Error  Units
ColumnScanBenchmark.loadMainTime    avgt   10  13735.795 ± 1074.677  us/op
ColumnScanBenchmark.loadSecondTime  avgt   10  10604.697 ±  384.406  us/op

cardinality=25000
Benchmark                           Mode  Cnt      Score      Error  Units
ColumnScanBenchmark.loadMainTime    avgt   10  16357.802 ±  329.982  us/op
ColumnScanBenchmark.loadSecondTime  avgt   10  92038.803 ± 3612.285  us/op

cardinality=50000
Benchmark                           Mode  Cnt       Score      Error  Units
ColumnScanBenchmark.loadMainTime    avgt   10   18931.669 ± 1024.174  us/op
ColumnScanBenchmark.loadSecondTime  avgt   10  166773.346 ± 4018.626  us/op

cardinality=125000
Benchmark                           Mode  Cnt       Score       Error  Units
ColumnScanBenchmark.loadMainTime    avgt   10   29742.121 ±   872.175  us/op
ColumnScanBenchmark.loadSecondTime  avgt   10  500961.399 ± 13750.984  us/op

cardinality=250000
Benchmark                           Mode  Cnt       Score       Error  Units
ColumnScanBenchmark.loadMainTime    avgt   10   46596.902 ±  2686.601  us/op
ColumnScanBenchmark.loadSecondTime  avgt   10  998234.222 ± 38658.422  us/op
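As a reading aid for the numbers above, here is a small sketch that computes the ratio of loadSecondTime to loadMainTime at each cardinality. The class `BenchmarkRatioSketch` and its `ratio` helper are hypothetical; the scores are copied from the results listed here. A ratio below 1 means the bitmap-indexed String column was faster, and a ratio above 1 means the __time scan was faster.

```java
// Hypothetical helper for reading the benchmark results; not part of the PR.
class BenchmarkRatioSketch {

    // Ratio of index-based time to scan time; > 1 means the scan was faster.
    static double ratio(double mainTimeUs, double secondTimeUs) {
        return secondTimeUs / mainTimeUs;
    }

    public static void main(String[] args) {
        int[] cardinalities = {1, 100, 2500, 25000, 50000, 125000, 250000};
        // Average scores in us/op, copied from the results in this description.
        double[] loadMainTime = {
            11335.082, 11974.823, 13735.795, 16357.802, 18931.669, 29742.121, 46596.902
        };
        double[] loadSecondTime = {
            51.619, 1003.198, 10604.697, 92038.803, 166773.346, 500961.399, 998234.222
        };

        for (int i = 0; i < cardinalities.length; i++) {
            System.out.printf("cardinality=%d second/main=%.2f%n",
                              cardinalities[i],
                              ratio(loadMainTime[i], loadSecondTime[i]));
        }
    }
}
```

In these results the ratio crosses 1 somewhere between cardinality 2500 and cardinality 25000, so the index wins at low cardinality and the scan wins at high cardinality for this 250,000-row dataset.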

Fixes #2652

fjy · 2016-03-17T00:51:08Z

benchmarks/src/main/java/io/druid/benchmark/ColumnScanBenchmark.java

@@ -0,0 +1,292 @@
+package io.druid.benchmark;


i wonder if we should document the list of benchmarks somewhere

it will make it easy for developers to test their work

@fjy Added header

I can add an initial benchmarks doc to content/development, that sounds useful

jon-wei · 2016-03-28T23:42:11Z

Closing this pending implementation of #2742

jon-wei mentioned this pull request Mar 16, 2016

Extraction filter with time format function not working (0.8.2) #2652

Closed

fjy added this to the 0.9.1 milestone Mar 17, 2016

fjy added the Bug label Mar 17, 2016

fjy reviewed Mar 17, 2016

Support filtering on __time column with column scan

cf0e3ec

jon-wei force-pushed the time_filter branch from 4acdf9f to cf0e3ec Compare March 17, 2016 01:36

jon-wei closed this Mar 17, 2016

jon-wei reopened this Mar 17, 2016

jon-wei mentioned this pull request Mar 17, 2016

[WIP] Add initial documentation for benchmarks in io.druid.benchmark #2679

Closed

jon-wei mentioned this pull request Mar 26, 2016

[Proposal] Two-stage Filtering #2742

Closed

jon-wei closed this Mar 28, 2016

jon-wei deleted the time_filter branch October 6, 2017 22:24
