Support filtering on __time with column scan #2673

Closed
jon-wei commented Mar 16, 2016

This PR allows filters to be applied to the special __time column, which does not have a dictionary/bitmap index.

  • Updates the makeValueMatcher functions in IncrementalIndexStorageAdapter to support the __time column.
  • Adds a function to ColumnSelectorBitmapIndex for performing a full scan on the __time column.
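
The difference between the two filtering strategies can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the code in this PR: `TimeFilterSketch` and its methods are hypothetical names, `java.util.BitSet` stands in for Druid's bitmap types, and a `TreeMap` stands in for a dictionary-encoded String column with per-value bitmaps.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.LongPredicate;
import java.util.function.Predicate;

// Illustrative only: these names are hypothetical and are not Druid classes.
class TimeFilterSketch {

    // Full scan: evaluate the predicate against every row's long timestamp.
    // The work is proportional to the number of rows.
    static BitSet scanLongColumn(long[] timestamps, LongPredicate predicate) {
        BitSet matches = new BitSet(timestamps.length);
        for (int row = 0; row < timestamps.length; row++) {
            if (predicate.test(timestamps[row])) {
                matches.set(row);
            }
        }
        return matches;
    }

    // Build a value -> row-bitmap index, standing in for a dictionary/bitmap index.
    static Map<String, BitSet> buildIndex(String[] values) {
        Map<String, BitSet> index = new TreeMap<>();
        for (int row = 0; row < values.length; row++) {
            index.computeIfAbsent(values[row], v -> new BitSet()).set(row);
        }
        return index;
    }

    // Index-based filtering: test each distinct value once,
    // then union the bitmaps of the values that match.
    static BitSet filterWithIndex(Map<String, BitSet> index, Predicate<String> predicate) {
        BitSet matches = new BitSet();
        for (Map.Entry<String, BitSet> entry : index.entrySet()) {
            if (predicate.test(entry.getKey())) {
                matches.or(entry.getValue());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        long[] millis = {1000L, 2000L, 2000L, 3000L};
        String[] iso = {
            "1970-01-01T00:00:01Z", "1970-01-01T00:00:02Z",
            "1970-01-01T00:00:02Z", "1970-01-01T00:00:03Z"
        };

        // Both filters select rows at or after the two-second mark.
        BitSet scanned = scanLongColumn(millis, t -> t >= 2000L);
        BitSet indexed = filterWithIndex(
            buildIndex(iso), s -> s.compareTo("1970-01-01T00:00:02Z") >= 0);

        System.out.println("scan:  " + scanned);
        System.out.println("index: " + indexed);
    }
}
```

In this sketch the scan does one predicate test per row, while the index-based filter does one test and one bitmap union per distinct value, so the two approaches scale with different quantities.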

A benchmark has been included for comparing the performance of a full scan on __time vs. bitmap index-based filtering on a String column that contains ISO timestamps.

Example results for various cardinalities follow. loadMainTime reads from the __time column; loadSecondTime reads from the String time column.

250,000 rows

cardinality=1
Benchmark                           Mode  Cnt      Score     Error  Units
ColumnScanBenchmark.loadMainTime    avgt   10  11335.082 ± 466.723  us/op
ColumnScanBenchmark.loadSecondTime  avgt   10     51.619 ±   5.596  us/op

cardinality=100
Benchmark                           Mode  Cnt      Score     Error  Units
ColumnScanBenchmark.loadMainTime    avgt   10  11974.823 ± 462.547  us/op
ColumnScanBenchmark.loadSecondTime  avgt   10   1003.198 ±  48.338  us/op

cardinality=2500
Benchmark                           Mode  Cnt      Score      Error  Units
ColumnScanBenchmark.loadMainTime    avgt   10  13735.795 ± 1074.677  us/op
ColumnScanBenchmark.loadSecondTime  avgt   10  10604.697 ±  384.406  us/op

cardinality=25000
Benchmark                           Mode  Cnt      Score      Error  Units
ColumnScanBenchmark.loadMainTime    avgt   10  16357.802 ±  329.982  us/op
ColumnScanBenchmark.loadSecondTime  avgt   10  92038.803 ± 3612.285  us/op

cardinality=50000
Benchmark                           Mode  Cnt       Score      Error  Units
ColumnScanBenchmark.loadMainTime    avgt   10   18931.669 ± 1024.174  us/op
ColumnScanBenchmark.loadSecondTime  avgt   10  166773.346 ± 4018.626  us/op

cardinality=125000
Benchmark                           Mode  Cnt       Score       Error  Units
ColumnScanBenchmark.loadMainTime    avgt   10   29742.121 ±   872.175  us/op
ColumnScanBenchmark.loadSecondTime  avgt   10  500961.399 ± 13750.984  us/op

cardinality=250000
Benchmark                           Mode  Cnt       Score       Error  Units
ColumnScanBenchmark.loadMainTime    avgt   10   46596.902 ±  2686.601  us/op
ColumnScanBenchmark.loadSecondTime  avgt   10  998234.222 ± 38658.422  us/op
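
To make the trend in these numbers easier to read, the following small sketch computes the ratio of loadSecondTime to loadMainTime at each cardinality and finds the first measured cardinality at which the __time scan is faster. The class and method names are hypothetical; the scores are copied from the results above, and the error margins are ignored.

```java
// Illustrative only: a hypothetical helper, not part of the PR's benchmark code.
class BenchmarkRatioSketch {
    static final int[] CARDINALITY = {1, 100, 2500, 25000, 50000, 125000, 250000};

    // Average time per operation in microseconds, copied from the results above.
    static final double[] MAIN_TIME_US = {
        11335.082, 11974.823, 13735.795, 16357.802, 18931.669, 29742.121, 46596.902
    };
    static final double[] SECOND_TIME_US = {
        51.619, 1003.198, 10604.697, 92038.803, 166773.346, 500961.399, 998234.222
    };

    // Ratio above 1 means the __time scan was faster than the bitmap-indexed String column.
    static double ratio(int i) {
        return SECOND_TIME_US[i] / MAIN_TIME_US[i];
    }

    // First measured cardinality at which the scan beats the index, or -1 if none.
    static int firstCardinalityWhereScanWins() {
        for (int i = 0; i < CARDINALITY.length; i++) {
            if (MAIN_TIME_US[i] < SECOND_TIME_US[i]) {
                return CARDINALITY[i];
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        for (int i = 0; i < CARDINALITY.length; i++) {
            System.out.printf("cardinality=%d ratio=%.2f%n", CARDINALITY[i], ratio(i));
        }
        System.out.println("scan first wins at cardinality=" + firstCardinalityWhereScanWins());
    }
}
```

On these figures the index-based filter is faster up to cardinality 2500 (about 1.3 times faster at that point), the scan is about 5.6 times faster at 25000, and about 21 times faster at 250000. The crossover lies somewhere between 2500 and 25000, which these measurements do not pin down more precisely.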

Fixes #2652

@@ -0,0 +1,292 @@
package io.druid.benchmark;

header

I wonder if we should document the list of benchmarks somewhere. It would make it easy for developers to test their work.

@fjy Added header

I can add an initial benchmarks doc to content/development; that sounds useful.

jon-wei commented Mar 28, 2016

Closing this pending implementation of #2742

@jon-wei jon-wei closed this Mar 28, 2016
@jon-wei jon-wei deleted the time_filter branch October 6, 2017 22:24