-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add generic dimension handling interface #2760
Conversation
rebasing, there are some conflicts with #2753 |
61d8c00
to
ef3f162
Compare
rebased @gianm Updated StringDimensionHandler with your recent IndexMerger changes; it had most of the null handling code already since StringDimensionMerger was based off IndexMergerV9, just had to add this part to closeWritersV8(): |
ef3f162
to
2b735b8
Compare
61cf9ef
to
b7f212f
Compare
ac80a52
to
ff7641c
Compare
if (o2 == null) { | ||
return 1; | ||
} | ||
return ObjectUtils.compare(o1, o2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ObjectUtils.compare(o1,o2) handles null as well so the null checks before are redundant or replace this with just
o1.compareTo(o2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@himanshug changed this and ENCODED_COMPARATOR to use o1.compareTo(o2)
ff7641c
to
3af8dc6
Compare
Added some benchmark numbers comparing performance between this patch and master, as of 4/25/2016 |
@jon-wei based on your numbers it looks like most of the queries are unaffected, however I do see a 20% increase in It would be good to investigate those in more detail and see if we can reduce the impact on those queries. 10% is a pretty significant drop in performance. Everything else is either faster, or within +/- 5%, which is with error bands. You might want to run your benchmark on EC2 instead of a laptop to get those errors bands narrower (see https://metamarkets.com/2015/effect-of-frequency-governor-on-java-benchmarking/) |
@xvrl I've been digging into the performance slowdown in GroupBy and Search. I've made some improvements to GroupByEngine (haven't committed changes to this PR yet). The increase shown in SearchBenchmark was found to ultimately come from an increase in the time it takes to read rows from an IncrementalIndex. I've narrowed the cause there to the change from int[][] to Comparable[][] in various places, but still looking into how to improve the performance there. |
* @param adapters List of adapters to be merged. | ||
* @throws IOException | ||
*/ | ||
public void mergeAcrossSegments(List<IndexableAdapter> adapters) throws IOException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is called, would addColumnValue
be called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addColumnValue wouldn't be called within mergeAcrossSegments, it is called later when iterating through the merged Rowboat iterable
"mergeSegmentIndices" might be a better name than "mergeAcrossSegments"...
5fcb521
to
99dd062
Compare
Updated this to address PR comments, will post benchmark results when I have them available. |
1474059
to
7adf0c3
Compare
Updated description with new benchmark results as of May 12. The description also mentions a change in the locking behavior of the dimension value dictionary, with related benchmark results. |
7adf0c3
to
171c870
Compare
171c870
to
8f9478d
Compare
Going to break this PR up even further, closing this for now |
This PR is a trimmed down version of #2621 that introduces a generic dimension handling infrastructure used for Long/Float dimension handling. However, only String dimensions are currently supported, to keep the patch at a reviewable size. Actual support for Longs/Floats will come in a later PR.
At a high level, type-specific handling for dimensions in the IncrementalIndex, IndexMerger, and querying paths has been extracted into a new interface. There are 4 associated classes with the new interface:
The javadoc comments within those classes provide further documentation on their usage/design.
StringDimensionHandler contains the implementations of these classes for String dimensions.
This patch also introduces query context "typeHints" for use by the GroupBy and TopN queries. The typehints are map of dimension name -> dimension ValueType. They are used to handle situations where a GroupBy or TopN has to merge results from two segments where the same dimension has two different types. The typeHints specify the type to merge the dimension values as (e.g. if a String and Float dimension with the same name are being merged, the typeHints will control whether the Strings are converted to Floats for merging, or vice versa).
The patch also changes the query engines to prepare them for accepting Long/Float dimensions later on.
Benchmark results, using the benchmarks introduced in #2875
EC2, c4.8xlarge, all tests use 100k rows/segment, 25 warmup iterations, 50 iterations
with dimension interface
Master
As discussed here, #2760 (diff), this patch changes the locking on the String dimension dictionary. Prior to this patch, the dimension value dictionaries during indexing synchronize on a common object, "dimensionDescs", the list of all dimension descriptors in IncrementalIndex. This patch changes the lock to a per-dimension object.
Benchmarks for a multithreaded groupby are show below (one thread per segment):
DimIntf, shared lock
DimIntf, individual dimension lock (what this PR uses)
Master