Add sqlReverseLookupThreshold for ReverseLookupRule. #15832

Merged

Conversation

gianm

@gianm gianm commented Feb 5, 2024

If lots of keys map to the same value, reversing a LOOKUP call can slow things down unacceptably. To protect against this, this patch introduces a parameter sqlReverseLookupThreshold representing the maximum size of an IN filter that will be created as part of lookup reversal.
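
As a rough illustration of the guard described above, the following sketch reverses a lookup for a single value and gives up once the number of matching keys exceeds a threshold. The class `ReverseLookupSketch` and its `reverse` method are hypothetical names for illustration only; this is not the code of `ReverseLookupRule`, and it assumes a plain in-memory `Map` rather than Druid's lookup extractors.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch, not Druid's ReverseLookupRule.
final class ReverseLookupSketch {
  /**
   * Collects the keys that map to {@code value}. Returns Optional.empty()
   * when more than {@code threshold} keys match, signalling that the caller
   * should leave the LOOKUP call unreversed instead of building a huge IN filter.
   */
  static Optional<Set<String>> reverse(Map<String, String> lookup, String value, int threshold) {
    Set<String> keys = new LinkedHashSet<>();
    for (Map.Entry<String, String> entry : lookup.entrySet()) {
      if (value.equals(entry.getValue())) {
        keys.add(entry.getKey());
        if (keys.size() > threshold) {
          // Too many keys: stop early rather than materialize the full set.
          return Optional.empty();
        }
      }
    }
    return Optional.of(keys);
  }
}
```

With a non-empty result, a filter such as `LOOKUP(col, 'name') = 'x'` could be rewritten as `col IN (...)` over the returned keys; with an empty result, the original filter would be kept as written.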

If inSubQueryThreshold is set to a smaller value than sqlReverseLookupThreshold, then inSubQueryThreshold will be used instead. This allows users to use that single parameter to control IN sizes if they wish.
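
The interaction between the two parameters amounts to taking the smaller of the two values. A minimal sketch, assuming both are available as plain integers (the class and method names here are hypothetical, not taken from the patch):

```java
// Hypothetical sketch of how the two context parameters combine.
final class ReverseLookupThresholdSketch {
  /**
   * The IN filter produced by lookup reversal is capped by whichever
   * of the two thresholds is smaller.
   */
  static int effectiveThreshold(int inSubQueryThreshold, int sqlReverseLookupThreshold) {
    return Math.min(inSubQueryThreshold, sqlReverseLookupThreshold);
  }
}
```

For example, with `sqlReverseLookupThreshold` left at 10000 and `inSubQueryThreshold` set to 100, reversal would be limited to IN filters of at most 100 elements.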

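Benchmarks follow. I chose 10000 as the default for sqlReverseLookupThreshold since it keeps planning time under 1 second: at 10000 keys per value, planning takes roughly 0.56 to 0.78 seconds, versus more than 8 seconds at 100000. Future work to speed up IN filters could allow us to raise the default threshold.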

Benchmark                                (keysPerValue)  (lookupType)  (numKeys)  Mode  Cnt     Score     Error  Units
SqlReverseLookupBenchmark.planEquals               1000       hashmap    5000000  avgt    5   163.002 ±   4.228  ms/op
SqlReverseLookupBenchmark.planEquals               1000     immutable    5000000  avgt    5    43.095 ±   2.864  ms/op
SqlReverseLookupBenchmark.planEquals              10000       hashmap    5000000  avgt    5   734.592 ±  34.374  ms/op
SqlReverseLookupBenchmark.planEquals              10000     immutable    5000000  avgt    5   555.980 ±  49.903  ms/op
SqlReverseLookupBenchmark.planEquals             100000       hashmap    5000000  avgt    5  8545.459 ± 108.931  ms/op
SqlReverseLookupBenchmark.planEquals             100000     immutable    5000000  avgt    5  8415.105 ± 116.926  ms/op
SqlReverseLookupBenchmark.planNotEquals            1000       hashmap    5000000  avgt    5   257.995 ±   5.576  ms/op
SqlReverseLookupBenchmark.planNotEquals            1000     immutable    5000000  avgt    5    41.088 ±   1.582  ms/op
SqlReverseLookupBenchmark.planNotEquals           10000       hashmap    5000000  avgt    5   776.826 ±   8.265  ms/op
SqlReverseLookupBenchmark.planNotEquals           10000     immutable    5000000  avgt    5   583.022 ±  19.766  ms/op
SqlReverseLookupBenchmark.planNotEquals          100000       hashmap    5000000  avgt    5  9019.350 ± 144.835  ms/op
SqlReverseLookupBenchmark.planNotEquals          100000     immutable    5000000  avgt    5  8754.859 ± 429.341  ms/op

@@ -77,7 +77,7 @@ public LookupExtractor build(Iterable<Pair<String, String>> keyValuePairs)
       return new MapLookupExtractor(map, false);
     }
   },
-  REVERSIBLE {
+  IMMUTABLE {

Why is it called immutable?

It's the immutable lookup map added in #15675

@abhishekagarwal87 abhishekagarwal87 merged commit 54b3064 into apache:master Feb 6, 2024
82 of 83 checks passed
@abhishekagarwal87 abhishekagarwal87 added this to the 29.0.0 milestone Feb 6, 2024
@gianm gianm deleted the sql-reverse-lookup-threshold branch February 6, 2024 14:03
LakshSingla pushed a commit to LakshSingla/druid that referenced this pull request Feb 7, 2024
cryptoe pushed a commit that referenced this pull request Feb 7, 2024
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>