Vectorize earliest aggregator for both numeric and string types #14408

Merged: 26 commits, Sep 5, 2023
Changes shown are from 24 of the 26 commits.

Commits
2b556f6  Vectorizing earliest for numeric (somu-imply, Jun 12, 2023)
59118ae  Vectorizing earliest string aggregator (somu-imply, Jun 12, 2023)
6c139de  checkstyle fix (somu-imply, Jun 12, 2023)
556b3de  Removing unnecessary exceptions (somu-imply, Jun 13, 2023)
cf6fe0f  Ignoring tests in MSQ as earliest is not supported for numeric there (somu-imply, Jun 16, 2023)
df3db6e  Merge remote-tracking branch 'upstream/master' into vectorize_earlies… (somu-imply, Jun 16, 2023)
cf88e00  Fixing benchmarks (somu-imply, Jun 16, 2023)
a9a6fc2  Updating tests as MSQ does not support earliest for some cases (somu-imply, Jun 19, 2023)
5f65c42  Merge remote-tracking branch 'upstream/master' into vectorize_earlies… (somu-imply, Jul 7, 2023)
f78ca05  Addressing review comments by adding the following: (somu-imply, Jul 7, 2023)
6cae490  Addressing issues for dictionary encoded single string columns where … (somu-imply, Jul 13, 2023)
ef87989  Adding a flag for multi value dimension selector (somu-imply, Jul 13, 2023)
4c5813d  Addressing comments (somu-imply, Jul 19, 2023)
67fce5f  1 more change (somu-imply, Jul 19, 2023)
ccfd600  Merge remote-tracking branch 'upstream/master' into vectorize_earlies… (somu-imply, Aug 7, 2023)
1c372af  Handling review comments part 1 (somu-imply, Aug 7, 2023)
aa97181  Merge remote-tracking branch 'upstream/master' into vectorize_earlies… (somu-imply, Aug 15, 2023)
f585412  Handling review comments and correctness fix for latest_by when the t… (somu-imply, Aug 16, 2023)
4291709  Updating numeric first vector agg (somu-imply, Aug 17, 2023)
44c3a4c  Merge remote-tracking branch 'upstream/master' into vectorize_earlies… (somu-imply, Aug 17, 2023)
890c865  Revert "Updating numeric first vector agg" (somu-imply, Aug 21, 2023)
6e75540  Updating code for correctness issues (somu-imply, Aug 21, 2023)
3664b87  fixing an issue with latest agg (somu-imply, Aug 22, 2023)
83a784a  Adding more comments and removing an unnecessary check (somu-imply, Aug 22, 2023)
47762c4  Merge remote-tracking branch 'upstream/master' into vectorize_earlies… (somu-imply, Aug 25, 2023)
f4ddb7c  Addressing null checks for tie selector and only vectorize false for … (somu-imply, Aug 25, 2023)
@@ -197,7 +197,7 @@ public String getFormatString()
"SELECT TIME_SHIFT(MILLIS_TO_TIMESTAMP(long4), 'PT1H', 1), string2, SUM(long1 * double4) FROM foo GROUP BY 1,2 ORDER BY 3",
// 37: time shift + expr agg (group by), uniform distribution high cardinality
"SELECT TIME_SHIFT(MILLIS_TO_TIMESTAMP(long5), 'PT1H', 1), string2, SUM(long1 * double4) FROM foo GROUP BY 1,2 ORDER BY 3",
// 38: LATEST aggregator long
"SELECT LATEST(long1) FROM foo",
Review comment (Member): nit: FWIW, these benchmarks were primarily meant for testing vectorized expression virtual columns; SqlBenchmark is the general-purpose place for measuring stuff. That said, these don't hurt being here, and they have a bit less baggage than SqlBenchmark.

// 39: LATEST aggregator double
"SELECT LATEST(double4) FROM foo",
@@ -207,7 +207,13 @@ public String getFormatString()
"SELECT LATEST(float3), LATEST(long1), LATEST(double4) FROM foo",
// 42,43: filter numeric nulls
"SELECT SUM(long5) FROM foo WHERE long5 IS NOT NULL",
"SELECT string2, SUM(long5) FROM foo WHERE long5 IS NOT NULL GROUP BY 1",
// 44: EARLIEST aggregator long
"SELECT EARLIEST(long1) FROM foo",
// 45: EARLIEST aggregator double
"SELECT EARLIEST(double4) FROM foo",
// 46: EARLIEST aggregator float
"SELECT EARLIEST(float3) FROM foo"
);

@Param({"5000000"})
@@ -265,7 +271,11 @@ public String getFormatString()
"40",
"41",
"42",
"43",
"44",
"45",
"46",
"47"
})
private String query;
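
For orientation: the numeric @Param values above are indices into the benchmark's list of SQL queries, which is why adding the EARLIEST queries also means listing their indices here. A minimal standalone JMH sketch of that pattern (hypothetical class name and query list, not the actual Druid benchmark class) looks like this:

```java
import java.util.List;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class EarliestQueryBenchmarkSketch
{
  // Hypothetical stand-in for the benchmark's query list; the real class
  // holds the SQL strings shown in the diff above.
  private static final List<String> QUERIES = List.of(
      "SELECT EARLIEST(long1) FROM foo",
      "SELECT EARLIEST(double4) FROM foo",
      "SELECT EARLIEST(float3) FROM foo"
  );

  // JMH runs the benchmark once per listed value; each value is an index
  // into QUERIES, so new queries also need new @Param entries.
  @Param({"0", "1", "2"})
  private String query;

  @Benchmark
  public String querySql()
  {
    // The real benchmark plans and executes the selected SQL against
    // generated segments; here we only resolve the index to show the mapping.
    return QUERIES.get(Integer.parseInt(query));
  }
}
```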

@@ -13,7 +13,8 @@
}
],
"context": {
"useCache": "true",
"vectorize": "false",
Review comment (Member): It should only be necessary to disable vectorization for queries that use the quantiles doubles sketch aggregator, which seems possibly broken; please don't mark all of them like this.

"populateCache": "true",
"timeout": 360000
}
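
To make the review comment above concrete: only the query that exercises the quantiles doubles sketch aggregator needs the flag. A hedged Java sketch (hypothetical helper, not part of this PR) of building the context that way:

```java
import java.util.HashMap;
import java.util.Map;

public class TestQueryContexts
{
  // Baseline context shared by every query in the spec file above.
  private static Map<String, Object> baseContext()
  {
    Map<String, Object> context = new HashMap<>();
    context.put("useCache", "true");
    context.put("populateCache", "true");
    context.put("timeout", 360000);
    return context;
  }

  // Per the review comment: only the query that uses the quantiles doubles
  // sketch aggregator opts out of vectorization; all others keep the default.
  public static Map<String, Object> contextFor(boolean usesQuantilesDoublesSketch)
  {
    Map<String, Object> context = baseContext();
    if (usesQuantilesDoublesSketch) {
      context.put("vectorize", "false");
    }
    return context;
  }
}
```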
@@ -119,7 +120,8 @@
}
],
"context": {
"useCache": "true",
"vectorize": "false",
"populateCache": "true",
"timeout": 360000
}
@@ -270,7 +272,8 @@
}
],
"context": {
"useCache": "true",
"vectorize": "false",
"populateCache": "true",
"timeout": 360000
}
@@ -364,7 +367,8 @@
}
],
"context": {
"useCache": "true",
"vectorize": "false",
"populateCache": "true",
"timeout": 360000
}
@@ -399,7 +403,8 @@
"metric": "rows",
"threshold": 3,
"context": {
"useCache": "true",
"vectorize": "false",
"populateCache": "true",
"timeout": 360000
}
@@ -514,7 +519,8 @@
"metric": "unique_users",
"threshold": 3,
"context": {
"useCache": "true",
"vectorize": "false",
"populateCache": "true",
"timeout": 360000
}
@@ -693,7 +699,8 @@
"metric": "count",
"threshold": 3,
"context": {
"useCache": "true",
"vectorize": "false",
"populateCache": "true",
"timeout": 360000
}
@@ -878,7 +885,8 @@
"metric": "count",
"threshold": 3,
"context": {
"useCache": "true",
"vectorize": "false",
"populateCache": "true",
"timeout": 360000
}
@@ -989,7 +997,8 @@
},
"threshold": 3,
"context": {
"useCache": "true",
"vectorize": "false",
"populateCache": "true",
"timeout": 360000
}
@@ -1064,7 +1073,8 @@
},
"threshold": 3,
"context": {
"useCache": "true",
"vectorize": "false",
"populateCache": "true",
"timeout": 360000
}
@@ -1128,7 +1138,8 @@
],
"dimensions": ["namespace"],
"context": {
"useCache": "true",
"vectorize": "false",
"populateCache": "true",
"timeout": 360000
}
@@ -1243,7 +1254,8 @@
"orderBy": ["robot", "namespace"]
},
"context": {
"useCache": "true",
"vectorize": "false",
"populateCache": "true",
"timeout": 360000
}
@@ -1342,7 +1354,8 @@
"value": "league_of_legends"
},
"context": {
"useCache": "true",
"vectorize": "false",
"populateCache": "true",
"timeout": 360000
}
@@ -1370,7 +1383,8 @@
"queryType": "timeBoundary",
"dataSource": "wikipedia_editstream",
"context": {
"useCache": "true",
"vectorize": "false",
"populateCache": "true",
"timeout": 360000
}
@@ -1459,7 +1473,8 @@
"metric": "rows",
"threshold": 3,
"context": {
"useCache": "true",
"vectorize": "false",
"populateCache": "true",
"timeout": 360000
}
@@ -1519,7 +1534,8 @@
"metric": "rows",
"threshold": 3,
"context": {
"useCache": "true",
"vectorize": "false",
"populateCache": "true",
"timeout": 360000
}
@@ -1583,7 +1599,8 @@
"metric": "rows",
"threshold": 3,
"context": {
"useCache": "true",
"vectorize": "false",
"populateCache": "true",
"timeout": 360000
}
@@ -1672,7 +1689,8 @@
"limit": 3
},
"context": {
"useCache": "true",
"vectorize": "false",
"populateCache": "true",
"timeout": 360000
}
@@ -1724,7 +1742,8 @@
}
],
"context": {
"useCache": "true",
"vectorize": "false",
"populateCache": "true",
"timeout": 360000
}
@@ -1766,7 +1785,8 @@
}
],
"context": {
"useCache": "true",
"vectorize": "false",
"populateCache": "true",
"timeout": 360000
}
@@ -1827,7 +1847,8 @@
"metric": "rows",
"threshold": 3,
"context": {
"useCache": "true",
"vectorize": "false",
"populateCache": "true",
"timeout": 360000
}
@@ -1847,7 +1868,8 @@
}
],
"context": {
"useCache": "true",
"vectorize": "false",
"populateCache": "true",
"timeout": 360000
}
@@ -1884,7 +1906,8 @@
"metric": "rows",
"threshold": 3,
"context": {
"useCache": "true",
"vectorize": "false",
"populateCache": "true",
"timeout": 360000
}
@@ -1904,7 +1927,8 @@
}
],
"context": {
"useCache": "true",
"vectorize": "false",
"populateCache": "true",
"timeout": 360000
}
@@ -2003,7 +2027,8 @@
"limit": 3
},
"context": {
"useCache": "true",
"vectorize": "false",
"populateCache": "true",
"timeout": 360000
}
@@ -29,14 +29,21 @@
import org.apache.druid.query.aggregation.AggregatorFactory;
import org.apache.druid.query.aggregation.AggregatorUtil;
import org.apache.druid.query.aggregation.BufferAggregator;
import org.apache.druid.query.aggregation.VectorAggregator;
import org.apache.druid.query.aggregation.any.NumericNilVectorAggregator;
import org.apache.druid.query.cache.CacheKeyBuilder;
import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
import org.apache.druid.segment.BaseDoubleColumnValueSelector;
import org.apache.druid.segment.ColumnInspector;
import org.apache.druid.segment.ColumnSelectorFactory;
import org.apache.druid.segment.ColumnValueSelector;
import org.apache.druid.segment.NilColumnValueSelector;
import org.apache.druid.segment.column.ColumnCapabilities;
import org.apache.druid.segment.column.ColumnHolder;
import org.apache.druid.segment.column.ColumnType;
import org.apache.druid.segment.column.Types;
import org.apache.druid.segment.vector.VectorColumnSelectorFactory;
import org.apache.druid.segment.vector.VectorValueSelector;

import javax.annotation.Nullable;
import java.nio.ByteBuffer;
@@ -97,6 +104,12 @@ public DoubleFirstAggregatorFactory(
this.storeDoubleAsFloat = ColumnHolder.storeDoubleAsFloat();
}

@Override
public boolean canVectorize(ColumnInspector columnInspector)
{
return true;
}

@Override
public Aggregator factorize(ColumnSelectorFactory metricFactory)
{
@@ -125,6 +138,21 @@ public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
}
}

@Override
public VectorAggregator factorizeVector(
VectorColumnSelectorFactory columnSelectorFactory
)
{
ColumnCapabilities capabilities = columnSelectorFactory.getColumnCapabilities(fieldName);
if (Types.isNumeric(capabilities)) {
VectorValueSelector valueSelector = columnSelectorFactory.makeValueSelector(fieldName);
VectorValueSelector timeSelector = columnSelectorFactory.makeValueSelector(
timeColumn);
return new DoubleFirstVectorAggregator(timeSelector, valueSelector);
}
return NumericNilVectorAggregator.doubleNilVectorAggregator();
}
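
For intuition about what factorizeVector wires up: an EARLIEST/"first" vector aggregator keeps, per aggregation slot in the ByteBuffer, the smallest timestamp seen so far together with the value observed at that timestamp, and folds each vector batch in by comparing timestamps. Below is a minimal standalone sketch of that fold (illustrative only: plain arrays instead of Druid's VectorValueSelector, simple method shapes instead of the real VectorAggregator interface, and no null handling):

```java
import java.nio.ByteBuffer;

/**
 * Illustrative sketch of an earliest-by-time fold; not the actual
 * DoubleFirstVectorAggregator, and the method shapes here are hypothetical.
 */
public class EarliestDoubleSketch
{
  private static final int TIME_OFFSET = 0;            // 8 bytes: earliest timestamp so far
  private static final int VALUE_OFFSET = Long.BYTES;  // 8 bytes: value at that timestamp

  // Initialize a slot so that any real row will replace it.
  public void init(ByteBuffer buf, int position)
  {
    buf.putLong(position + TIME_OFFSET, Long.MAX_VALUE);
    buf.putDouble(position + VALUE_OFFSET, 0.0d);
  }

  // Fold rows [startRow, endRow) of one vector batch into the slot at `position`.
  public void aggregate(ByteBuffer buf, int position, long[] times, double[] values, int startRow, int endRow)
  {
    long earliestTime = buf.getLong(position + TIME_OFFSET);
    double earliestValue = buf.getDouble(position + VALUE_OFFSET);
    for (int i = startRow; i < endRow; i++) {
      if (times[i] < earliestTime) {
        earliestTime = times[i];
        earliestValue = values[i];
      }
    }
    buf.putLong(position + TIME_OFFSET, earliestTime);
    buf.putDouble(position + VALUE_OFFSET, earliestValue);
  }

  public double get(ByteBuffer buf, int position)
  {
    return buf.getDouble(position + VALUE_OFFSET);
  }
}
```

The real implementation also has to cover null value vectors and the grouping-engine entry point that aggregates into many buffer positions per call; the sketch keeps only the core comparison.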

@Override
public Comparator getComparator()
{