
Vectorize earliest aggregator for both numeric and string types #14408

Merged: 26 commits into apache:master, Sep 5, 2023

Conversation

@somu-imply (Contributor) commented Jun 12, 2023:

The latest aggregator has already been vectorized. This PR vectorizes the earliest aggregator so that both aggregators are vectorized. Benchmarks were run for the following cases:

44 -> long
45 -> double
46 -> float 

Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt   Score   Error  Units
SqlExpressionBenchmark.querySql       44           5000000        false  avgt    5  38.656 ± 0.695  ms/op
SqlExpressionBenchmark.querySql       44           5000000        force  avgt    5  28.519 ± 1.110  ms/op
SqlExpressionBenchmark.querySql       45           5000000        false  avgt    5  38.667 ± 1.259  ms/op
SqlExpressionBenchmark.querySql       45           5000000        force  avgt    5  17.051 ± 0.523  ms/op
SqlExpressionBenchmark.querySql       46           5000000        false  avgt    5  38.579 ± 0.484  ms/op
SqlExpressionBenchmark.querySql       46           5000000        force  avgt    5  15.587 ± 0.766  ms/op

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@somu-imply marked this pull request as ready for review June 19, 2023 04:59
@@ -205,7 +205,7 @@ public String getFormatString()
"SELECT TIME_SHIFT(MILLIS_TO_TIMESTAMP(long4), 'PT1H', 1), string2, SUM(long1 * double4) FROM foo GROUP BY 1,2 ORDER BY 3",
// 37: time shift + expr agg (group by), uniform distribution high cardinality
"SELECT TIME_SHIFT(MILLIS_TO_TIMESTAMP(long5), 'PT1H', 1), string2, SUM(long1 * double4) FROM foo GROUP BY 1,2 ORDER BY 3",
- // 38: LATEST aggregator
+ // 38: LATEST aggregator long
"SELECT LATEST(long1) FROM foo",
A Member commented:

nit: FWIW these benchmarks were primarily meant for testing vectorized expression virtual columns; SqlBenchmark is the general-purpose place for measuring things. That said, these don't hurt being here, and they have a bit less baggage than SqlBenchmark.

//time is always long
BaseLongVectorValueSelector timeSelector = (BaseLongVectorValueSelector) columnSelectorFactory.makeValueSelector(
timeColumn);
if (capabilities == null || capabilities.isNumeric()) {
A Member commented:

In the vectorized engine, capabilities being null means the column doesn't exist, so you can use the nil aggregator, I think?
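
A minimal sketch of the suggested shape, reusing the class names that appear elsewhere in this review (FloatFirstVectorAggregator, NumericNilVectorAggregator); the exact signatures here are assumptions:

public VectorAggregator factorizeVector(VectorColumnSelectorFactory columnSelectorFactory)
{
  final ColumnCapabilities capabilities = columnSelectorFactory.getColumnCapabilities(fieldName);
  // null capabilities means the column doesn't exist in the vectorized engine,
  // so fall back to the nil aggregator instead of building selectors
  if (capabilities == null || !capabilities.isNumeric()) {
    return NumericNilVectorAggregator.floatNilVectorAggregator();
  }
  final VectorValueSelector valueSelector = columnSelectorFactory.makeValueSelector(fieldName);
  // time is always long
  final BaseLongVectorValueSelector timeSelector =
      (BaseLongVectorValueSelector) columnSelectorFactory.makeValueSelector(timeColumn);
  return new FloatFirstVectorAggregator(timeSelector, valueSelector);
}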

Comment on lines +122 to +127
if (timeVector[row] < firstTime) {
if (useDefault || nulls == null || !nulls[row]) {
updateTimeWithValue(buf, position, timeVector[row], row);
} else {
updateTimeWithNull(buf, position, timeVector[row]);
}
A Member commented:

The docs seem to indicate that we pick the first non-null value; however, looking at the non-vectorized aggregator, it looks like we just pick the first value, which is also what we are doing here.

I guess allowing the native aggregator to pick the first value even if it is null is a bit more expressive than always ignoring null values, since we could always wrap this in a filtered aggregator (I vaguely remember having this exact discussion years ago for #9161), but OTOH it doesn't seem like very typical behavior for SQL, which usually ignores null values for most aggregation functions. (The 'any' aggregator also behaves consistently with this and will return any value, including null.)

I wonder if we should either change the SQL conversion stuff to always wrap with a filtered agg to remove nulls, or modify the documentation to indicate that this function will return null values if the earliest row is null.

boolean[] nulls = useDefault ? null : valueSelector.getNullVector();
long[] timeVector = timeSelector.getLongVector();

for (int i = 0; i < numRows; i++) {
A Member commented:

since this is a hot loop, it might be worth splitting it into 'has a null vector' and 'doesn't have a null vector' cases, though that's worth measuring to see if it makes a difference (see the sketch below)
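
A sketch of that split, following the helper names in the snippet above (updateTimeWithValue/updateTimeWithNull are assumed to write the buffer, as in the quoted loop):

@Override
public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
{
  final boolean[] nulls = useDefault ? null : valueSelector.getNullVector();
  final long[] timeVector = timeSelector.getLongVector();
  long firstTime = buf.getLong(position);

  if (nulls == null) {
    // no null vector: the per-row null check disappears from the hot loop
    for (int i = startRow; i < endRow; i++) {
      if (timeVector[i] < firstTime) {
        firstTime = timeVector[i];
        updateTimeWithValue(buf, position, timeVector[i], i);
      }
    }
  } else {
    for (int i = startRow; i < endRow; i++) {
      if (timeVector[i] < firstTime) {
        firstTime = timeVector[i];
        if (nulls[i]) {
          updateTimeWithNull(buf, position, timeVector[i]);
        } else {
          updateTimeWithValue(buf, position, timeVector[i], i);
        }
      }
    }
  }
}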

@@ -1374,12 +1367,46 @@ public void testStringAnyInSubquery()
);
}

@Test
public void testOffHeapEarliestGroupBy()
A Member commented:

this seems already covered by other tests that removed 'skipVectorize' statements?

@@ -14721,4 +14743,39 @@ public void testFilterWithNVLAndNotIn()
)
);
}

@Test
public void testEarliestVectorAggregators()
A Member commented:

same comment about maybe redundant test


public class LongFirstVectorAggregator extends NumericFirstVectorAggregator
{
long firstValue;
A Member commented:

any reason these are fields instead of just local variables?

}

/**
*Updates the time only to the appropriate position in buffer as the value is null
A Member commented:

nit formatting (space after *)

@imply-cheddar (Contributor) left a review comment:

Please remove Mockito from your tests. Just as a rule, if you ever run into Mockito when making a change, take the time to remove it.

Also, we can do better optimizations by taking advantage of the dictionaries for the string versions; please implement those.



/**
* @return The primitive object stored at the position in the buffer.
A Contributor commented:

This comment says that it's returning a primitive, but the method is returning a SerializablePair. Which one is supposed to be correct?

Comment on lines 147 to 150
VectorValueSelector valueSelector = columnSelectorFactory.makeValueSelector(fieldName);
//time is always long
BaseLongVectorValueSelector timeSelector = (BaseLongVectorValueSelector) columnSelectorFactory.makeValueSelector(
timeColumn);
A Contributor commented:

Two things:

  1. You don't need either of these until after you've checked capabilities. Don't bother creating them if you don't need them.
  2. This is casting to BaseLongVectorValueSelector, but the arguments on DoubleFirstVectorAggregator don't seem to care about the cast at all. Either it's important that we cast and we enforce the cast, OR it's not important and we shouldn't force the cast. The current code makes me think that it's not important.

Comment on lines 136 to 145
ColumnCapabilities capabilities = columnSelectorFactory.getColumnCapabilities(fieldName);
VectorValueSelector valueSelector = columnSelectorFactory.makeValueSelector(fieldName);
//time is always long
BaseLongVectorValueSelector timeSelector = (BaseLongVectorValueSelector) columnSelectorFactory.makeValueSelector(
timeColumn);
if (capabilities == null || capabilities.isNumeric()) {
return new FloatFirstVectorAggregator(timeSelector, valueSelector);
} else {
return NumericNilVectorAggregator.floatNilVectorAggregator();
}
A Contributor commented:

This looks like the Double one, which I had comments on; please apply them here too.

Comment on lines 166 to 174
ColumnCapabilities capabilities = selectorFactory.getColumnCapabilities(fieldName);
VectorObjectSelector vSelector = selectorFactory.makeObjectSelector(fieldName);
BaseLongVectorValueSelector timeSelector = (BaseLongVectorValueSelector) selectorFactory.makeValueSelector(
timeColumn);
if (capabilities != null) {
return new StringFirstVectorAggregator(timeSelector, vSelector, maxStringBytes);
} else {
return new StringFirstVectorAggregator(null, vSelector, maxStringBytes);
}
A Contributor commented:

We can/should do this a bit more intelligently. Specifically, there are 3 different types of vector selectors that could be needed here and you will need to check column capabilities ahead of time to tell the difference:

  1. If it is a STRING and multi-valued, use the multivalue-dimension version
  2. If it is a STRING and single-valued, use the single value dimension version
  3. Otherwise use a VectorObjectSelector

Your implementation for (3) is in this PR already. For (1) and (2), you can read only the dictionary ids and just keep track of the earliest dictionaryId (not the string, the dictionary id). Then, when get() is called, convert the dictionary id into the String and truncate the size if necessary.
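
A rough sketch of that dictionary-id approach for the single-valued case (getRowVector/lookupName are the methods on Druid's single-value dimension selector; the buffer offsets follow the NumericFirstVectorAggregator constants quoted later in this thread, and null handling is omitted):

@Override
public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
{
  final int[] rowIds = dimensionSelector.getRowVector(); // one dictionary id per row
  final long[] times = timeSelector.getLongVector();
  long firstTime = buf.getLong(position);
  for (int i = startRow; i < endRow; i++) {
    if (times[i] < firstTime) {
      firstTime = times[i];
      buf.putLong(position, firstTime);
      // store only the dictionary id, not the String
      buf.putInt(position + NumericFirstVectorAggregator.VALUE_OFFSET, rowIds[i]);
    }
  }
}

@Override
public Object get(ByteBuffer buf, int position)
{
  // resolve the id to a (possibly truncated) String only when asked
  final long earliest = buf.getLong(position);
  final String value = dimensionSelector.lookupName(buf.getInt(position + NumericFirstVectorAggregator.VALUE_OFFSET));
  return new SerializablePairLongString(earliest, StringUtils.chop(value, maxStringBytes));
}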

private final BaseLongVectorValueSelector timeSelector;
private final VectorObjectSelector valueSelector;
private final int maxStringBytes;
//protected long firstTime;
A Contributor commented:

commented code alert

import java.nio.ByteBuffer;
import java.util.concurrent.ThreadLocalRandom;

@RunWith(MockitoJUnitRunner.class)
A Contributor commented:

Please re-write this to not use Mockito.

Comment on lines 53 to 56
@Mock
private VectorValueSelector selector;
@Mock
private BaseLongVectorValueSelector timeSelector;
A Contributor commented:

These are both interfaces; if test-oriented implementations of these interfaces don't already exist, please create them instead of mocking things (see the sketch below):

  1. Mockito needs to be killed from the codebase; it should not be used.
  2. The tests will always be easier to understand and debug with a test-class implementation of the interface instead of mocks.
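
For example, a hand-rolled test selector could look like this sketch (VectorValueSelector also extends VectorSizeInspector, which is where the two size methods come from):

// A fixed-data VectorValueSelector for tests, replacing a Mockito mock.
class TestVectorValueSelector implements VectorValueSelector
{
  private final long[] longs;
  @Nullable
  private final boolean[] nulls;

  TestVectorValueSelector(long[] longs, @Nullable boolean[] nulls)
  {
    this.longs = longs;
    this.nulls = nulls;
  }

  @Override
  public long[] getLongVector()
  {
    return longs;
  }

  @Override
  public float[] getFloatVector()
  {
    final float[] floats = new float[longs.length];
    for (int i = 0; i < longs.length; i++) {
      floats[i] = longs[i];
    }
    return floats;
  }

  @Override
  public double[] getDoubleVector()
  {
    final double[] doubles = new double[longs.length];
    for (int i = 0; i < longs.length; i++) {
      doubles[i] = longs[i];
    }
    return doubles;
  }

  @Nullable
  @Override
  public boolean[] getNullVector()
  {
    return nulls;
  }

  @Override
  public int getMaxVectorSize()
  {
    return longs.length;
  }

  @Override
  public int getCurrentVectorSize()
  {
    return longs.length;
  }
}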

@somu-imply (Contributor, Author) commented:

Thanks for the comments; I'm working to remove Mockito and to address the optimizations as suggested:

1. Checking capabilities first before creating selectors
2. Removing mockito in tests for numeric first aggs
3. Removing unnecessary tests
…we can use the dictionary ids instead of the entire string
Comment on lines 64 to 67
// select * from UNNEST(ARRAY[1,2,3]) as somu(d3) where somu.d3 IN ('a','b')
this.base = dataSource; // table
this.virtualColumn = virtualColumn; // MV_TO_ARRAY
this.unnestFilter = unnestFilter; // d3 in (a,b)
A Member commented:

nit: these comments seem strange, did you mean to leave them here? Also unrelated to this PR?

)
{
ColumnCapabilities capabilities = columnSelectorFactory.getColumnCapabilities(fieldName);
if (capabilities.isNumeric()) {
A Member commented:

I think you need to check for capabilities being null too; you should be able to confirm this by adding a test for a column that doesn't exist (null capabilities is what the vector engine returns when a column is missing)

public VectorAggregator factorizeVector(VectorColumnSelectorFactory columnSelectorFactory)
{
ColumnCapabilities capabilities = columnSelectorFactory.getColumnCapabilities(fieldName);
if (capabilities.isNumeric()) {
A Member commented:

same comment re null check

public VectorAggregator factorizeVector(VectorColumnSelectorFactory columnSelectorFactory)
{
ColumnCapabilities capabilities = columnSelectorFactory.getColumnCapabilities(fieldName);
if (capabilities.isNumeric()) {
A Member commented:

ditto null check

firstTime = buf.getLong(position);
int index = startRow;
for (int i = startRow; i < endRow; i++) {
if (valueVector[i].get(0) != 0) {
A Member commented:

is this trying to check for null? If so, you need to actually check that value 0 is null, not just assume the first value of the dictionary is null. If not, could you leave a comment about what is going on here?

if (timeVector[row] < firstTime) {
firstTime = timeVector[row];
buf.putLong(position, firstTime);
buf.put(position + NumericFirstVectorAggregator.NULL_OFFSET, NullHandling.IS_NOT_NULL_BYTE);
A Member commented:

shouldn't this be checking whether the value is null? Or is the assumption that we never set the null bit here and instead translate it in the get method? If that is the case, why do we need a null byte at all instead of just storing a long and an int in the buffer? Or is it to distinguish 'aggregate' never having been called from actually aggregating something? (e.g. an empty group should probably always spit out a null value...)

Comment on lines 112 to 115
int index = buf.getInt(position + NumericFirstVectorAggregator.VALUE_OFFSET);
long earliest = buf.getLong(position);
String strValue = valueDimensionVectorSelector.lookupName(index);
return new SerializablePairLongString(earliest, StringUtils.chop(strValue, maxStringBytes));
A Member commented:

can this be wrong in the case where nothing was aggregated and id 0 in the dictionary is not null? It seems like we need to check the null byte here and return null if it is set (since otherwise the result appears to always be treated as not null).
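
A sketch of the suggested guard, reusing the constants from the snippet above (IS_NULL_BYTE mirrors the IS_NOT_NULL_BYTE already used in this PR):

@Override
public Object get(ByteBuffer buf, int position)
{
  // if nothing was ever aggregated, the null byte is still set; don't let
  // dictionary id 0 masquerade as a real value
  if (buf.get(position + NumericFirstVectorAggregator.NULL_OFFSET) == NullHandling.IS_NULL_BYTE) {
    return new SerializablePairLongString(buf.getLong(position), null);
  }
  final int index = buf.getInt(position + NumericFirstVectorAggregator.VALUE_OFFSET);
  final long earliest = buf.getLong(position);
  final String strValue = valueDimensionVectorSelector.lookupName(index);
  return new SerializablePairLongString(earliest, StringUtils.chop(strValue, maxStringBytes));
}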

Comment on lines +85 to +89
if (useDefault || nullValueVector == null || !nullValueVector[index]) {
updateTimeWithValue(buf, position, firstTime, index);
} else {
updateTimeWithNull(buf, position, firstTime);
}
A Member commented:

this is somewhat confusing: the other loop breaks if it finds a non-null value, but here we can still write a null, I guess, if we made it through the whole vector without breaking? That seems odd, since it means it finds the first non-null value in a vector, or else the last timestamp in the first vector it reads?

Comment on lines +142 to +145
buf.putLong(position, time);
buf.put(position + NULL_OFFSET, NullHandling.IS_NOT_NULL_BYTE);
putValue(buf, position + VALUE_OFFSET, index);
}
A Member commented:

some thoughts: since the value portion of this is basically the same behavior as NullableTypeStrategy, where there is a byte to track nulls and then the actual value bytes, I can't help but wonder if we could share more code between all of the first/last aggregators by letting them use a NullableTypeStrategy for whatever the underlying selector type is. This definitely doesn't need to be done in this PR; just thinking ahead for if we support additional types like arrays.

import java.nio.ByteBuffer;
import java.util.concurrent.ThreadLocalRandom;

public class DoubleFirstVectorAggregationTest extends InitializedNullHandlingTest
A Member commented:

these tests seem to have a flaw in that they only test one vector? I don't have an example handy, but it seems like it would be nicer if the tests used a cursor/offset and advanced through all of the rows to provide a more realistic test case.
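
As a sketch, a multi-vector test could drive the aggregator roughly like this (VECTOR_SIZE, totalRows, and the cursor advancement are hypothetical placeholders):

final ByteBuffer buf = ByteBuffer.allocate(aggregatorFactory.getMaxIntermediateSizeWithNulls());
aggregator.init(buf, 0);
for (int start = 0; start < totalRows; start += VECTOR_SIZE) {
  // advance the underlying cursor/offset to the next vector here, then
  // aggregate it; startRow/endRow are relative to the current vector
  aggregator.aggregate(buf, 0, 0, Math.min(VECTOR_SIZE, totalRows - start));
}
// finally, compare aggregator.get(buf, 0) against the expected earliest pair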

@somu-imply (Contributor, Author) commented:

@clintropolis I had some confusion regarding the use of the flag; it will be removed and the PR updated. Working on it and the other comments too.

@clintropolis (Member) left a review comment:

the query integration test failures look possibly related:

Expected: [{timestamp=2013-01-01T00:00:00.000Z, result={added=9.11526338E8, count=2815650, firstAdded=39.0, lastAdded=210.0, firstCount=1, lastCount=1, quantilesDoublesSketch=2390950, approxCountTheta=219483.4076460526, approxCountHLL=216700, delta=5.48967603E8, variation=1.274085073E9, delta_hist={breaks=[-2634692.25, -2048505.0, -1462317.75, -876130.4375, -289943.125, 296244.1875, 882431.5, 1468619.0], counts=[1.0, 2.0, 1.0, 56.0, 2815544.0, 41.0, 5.0]}, unique_users=229361.39005604674, deleted=-3.62558735E8, rows=2390950}}],
Actual: [{timestamp=2013-01-01T00:00:00.000Z, result={firstCount=1, added=9.11526338E8, count=2815650, delta=5.48967603E8, lastCount=1, rows=2390950, firstAdded=39.0, variation=1.274085073E9, unique_users=229361.39005604674, deleted=-3.62558735E8, quantilesDoublesSketch=0, approxCountTheta=219483.4076460526, approxCountHLL=216700, lastAdded=210.0, delta_hist={breaks=[-2634692.25, -2048505.0, -1462317.75, -876130.4375, -289943.125, 296244.1875, 882431.5, 1468619.0], counts=[1.0, 2.0, 1.0, 56.0, 2815544.0, 41.0, 5.0]}}}]

specifically: expected quantilesDoublesSketch=2390950 but actual is quantilesDoublesSketch=0, which might be a bug exposed by the query becoming vectorizable? We should look into what is happening here. We don't necessarily need to fix it in this PR, but it either needs to be fixed, or these integration tests need to set the query context to not vectorize (assuming it is related to vectorization) so that the results don't change.

)
{
ColumnCapabilities capabilities = columnSelectorFactory.getColumnCapabilities(fieldName);
if (capabilities != null && capabilities.isNumeric()) {

public VectorAggregator factorizeVector(VectorColumnSelectorFactory columnSelectorFactory)
{
ColumnCapabilities capabilities = columnSelectorFactory.getColumnCapabilities(fieldName);
if (capabilities != null && capabilities.isNumeric()) {
A Member commented:

nit: can use Types.isNumeric

public VectorAggregator factorizeVector(VectorColumnSelectorFactory columnSelectorFactory)
{
ColumnCapabilities capabilities = columnSelectorFactory.getColumnCapabilities(fieldName);
if (capabilities != null && capabilities.isNumeric()) {
A Member commented:

nit: can use Types.isNumeric

// iterate once over the object vector to find first non null element and
// determine if the type is Pair or not
boolean foldNeeded = false;
for (Object obj : objectsWhichMightBeStrings) {
A Member commented:

I forget: are selectors that spit out SerializablePairLongString always non-null values? If not, do we need to check that we actually found something inside the loop that wasn't null? I'm thinking of the case where the column is sparse and has lots of nulls, and the whole vector for this aggregate call is all nulls. I guess it ends up in the not-folding case, which is OK if the serializable pairs are never null.

@Override
public boolean[] getNullVector()
{
return NULLS;
@clintropolis (Member) commented Aug 3, 2023:

nit: the real time column will never have null values afaik

)
{
ColumnCapabilities capabilities = columnSelectorFactory.getColumnCapabilities(fieldName);
if (capabilities != null && Types.isNumeric(capabilities)) {
A Member commented:

oops sorry for the confusion, Types.isNumeric includes the null check on capabilities so it isn't needed here

Comment on lines 70 to 71
// the time vector is already sorted so the first element would be the earliest
// traverse accordingly
A Member commented:

Sorry for not realizing this ... earlier(!), but this is only true when the timeSelector is for the __time column. If this aggregator is being used for LATEST_BY with some expression virtual column, this assumption is not correct, since the time values could be produced from any column, which may or may not be sorted (and which might also have nulls, so we probably also need to check the null vector of the time selector).

It might be worth splitting out the implementation for earliest and earliest_by since the sorted __time column is probably a decent optimization in that specific case.
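
A sketch of what that split might look like inside aggregate() (timeSorted is a hypothetical flag distinguishing the __time-backed EARLIEST case from EARLIEST_BY; value-null handling via updateTimeWithNull is omitted for brevity):

final long[] times = timeSelector.getLongVector();
final boolean[] timeNulls = timeSelector.getNullVector(); // always null for __time

if (timeSorted) {
  // __time is sorted ascending, so only the first row of this vector can improve the minimum
  if (times[startRow] < buf.getLong(position)) {
    updateTimeWithValue(buf, position, times[startRow], startRow);
  }
} else {
  // EARLIEST_BY over an arbitrary expression: scan every row and skip null timestamps
  for (int i = startRow; i < endRow; i++) {
    if ((timeNulls == null || !timeNulls[i]) && times[i] < buf.getLong(position)) {
      updateTimeWithValue(buf, position, times[i], i);
    }
  }
}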

Comment on lines 16 to 17
"useCache": "true",
"vectorize": "false",
A Member commented:

it should only be necessary to skip vectorization for queries that use the quantiles doubles sketch aggregator that seems possibly broken; please don't mark all of them like this

for (int i = startRow; i < endRow; i++) {
index = i;
if (nullTimeVector != null && nullTimeVector[index]) {
continue;
@clintropolis (Member) commented Aug 24, 2023:

how does this work if all of the time values are null? The docs seem to indicate that if the time column is null we just take the first value, but I'm not completely sure what that means (maybe the docs are wrong?). I wonder if we should treat rows with a null timestamp as if the timestamp were Long.MAX_VALUE and update the value? Though in that case we would have trouble distinguishing a null time and null row from the initialized state. I suppose we could be consistent if we made a more general behavior: if there are multiple values with the same timestamp, take the first non-null value we encounter, though I suspect that would require changes to the non-vector aggs too.

@somu-imply (Contributor, Author) replied Aug 25, 2023:

So currently, in the case of non-null timestamps, we just take the first value: we iterate over the values and only update the first time if the current time is less than the earliest time. The issue arises when all timestamps are null. For earliest, the time selector is __time; the chances of this happening are higher when a secondary timestamp is used through earliest_by. In such a case, should we even return any results? The docs say:

If expr comes from a relation with a timestamp column (like __time in a Druid datasource), the "earliest" is taken from the row with the overall earliest non-null value of the timestamp column.

@Override
public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
{
final long[] timeVector = timeSelector.getLongVector();
A Member commented:

i think we need to check the null vector here for the timeSelector

@somu-imply (Contributor, Author) replied:

Addressed these

@Override
public void aggregate(ByteBuffer buf, int numRows, int[] positions, @Nullable int[] rows, int positionOffset)
{
long[] timeVector = timeSelector.getLongVector();
A Member commented:

same comment about null vector for time selector

if (timeSelector == null) {
return;
}
long[] times = timeSelector.getLongVector();
A Member commented:

same comment about checking null vector of timeSelector

@Override
public void aggregate(ByteBuffer buf, int numRows, int[] positions, @Nullable int[] rows, int positionOffset)
{
long[] timeVector = timeSelector.getLongVector();
A Member commented:

same comment about checking null vector of timeSelector

@clintropolis (Member) left a review comment:

i think we should consider splitting the string aggregator from the pair aggregator in the future since it should make the handling of both a lot cleaner

@soumyava merged commit 8088a76 into apache:master Sep 5, 2023
74 checks passed
@soumyava (Contributor) commented Sep 5, 2023:

Will open a separate PR to go down two different paths for pair and non-pair

@LakshSingla added this to the 28.0 milestone Oct 12, 2023