Early terminate orderby when columns already sorted #8228

jt-kloudfuse · 2022-02-18T17:26:02Z

Description

For a query like, Order by col1, col2 limit k will take all the n matching rows and add it to a priority queue of size k.
This is nlogk operation which can be quite expensive for a large n.
In the above example, if the docs are already sorted by col1, col2 then there is no need for any
sorting at all (only limit is needed).

This PR basically tracks if left most columns are sorted and if they all are, then short circuits the nlogk loop.

Upgrade Notes

Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)

[ No]

Does this PR fix a zero-downtime upgrade introduced earlier?

[ No]

Does this PR otherwise need attention when creating release notes? Things to consider:

New configuration options
Deprecation of configurations
Signature changes to public methods/interfaces
New plugins added or old plugins removed

[ No]

Release Notes

Documentation

pinot-core/src/main/java/org/apache/pinot/core/operator/query/SelectionOrderByOperator.java

codecov-commenter · 2022-02-19T01:39:14Z

Codecov Report

Merging #8228 (294b07b) into master (582d621) will increase coverage by 6.48%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##             master    #8228      +/-   ##
============================================
+ Coverage     64.28%   70.77%   +6.48%     
+ Complexity     4312     4240      -72     
============================================
  Files          1581     1629      +48     
  Lines         83122    85200    +2078     
  Branches      12587    12827     +240     
============================================
+ Hits          53436    60297    +6861     
+ Misses        25852    20743    -5109     
- Partials       3834     4160     +326

Flag	Coverage Δ
integration1	`28.90% <100.00%> (?)`
integration2	`27.51% <100.00%> (?)`
unittests1	`66.96% <100.00%> (-0.38%)`	⬇️
unittests2	`14.09% <0.00%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
.../core/operator/query/SelectionOrderByOperator.java	`94.30% <100.00%> (+3.73%)`	⬆️
.../org/apache/pinot/core/plan/SelectionPlanNode.java	`100.00% <100.00%> (ø)`
...ls/nativefst/automaton/MinimizationOperations.java	`0.00% <0.00%> (-36.20%)`	⬇️
...tils/nativefst/automaton/TransitionComparator.java	`9.37% <0.00%> (-25.00%)`	⬇️
...ent/local/utils/nativefst/automaton/Automaton.java	`57.08% <0.00%> (-16.86%)`	⬇️
...gment/spi/partition/HashCodePartitionFunction.java	`66.66% <0.00%> (-11.12%)`	⬇️
...cal/utils/nativefst/automaton/BasicOperations.java	`28.57% <0.00%> (-10.85%)`	⬇️
...segment/local/utils/nativefst/automaton/State.java	`36.58% <0.00%> (-9.76%)`	⬇️
.../plugin/inputformat/thrift/ThriftRecordReader.java	`86.95% <0.00%> (-3.75%)`	⬇️
.../java/org/apache/pinot/spi/data/TimeFieldSpec.java	`88.63% <0.00%> (-2.28%)`	⬇️
... and 404 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 582d621...294b07b. Read the comment docs.

Jackie-Jiang

Since we are only optimizing the case of all order-by columns are sorted, we can have the check within getNextBlock() and add a new method computeAllSorted() to only add the first limit records into the priority queue. Mixing the logic with existing computation will introduce extra overhead, and hard to read

jt-kloudfuse · 2022-02-19T07:16:13Z

Since we are only optimizing the case of all order-by columns are sorted, we can have the check within getNextBlock() and add a new method computeAllSorted() to only add the first limit records into the priority queue. Mixing the logic with existing computation will introduce extra overhead, and hard to read

I did consider doing that.

At the moment, getNexBlock has a conditional split for all/partially ordered columns (where computeAllOrdered is where all columns in select are also in the order by if I understand correctly )

If we introduce a case for when all orderBy columns are pre-sorted (early termination case) then wouldn't we again have a allOrdered/partialOrdered case within that as well?

Jackie-Jiang · 2022-02-22T01:28:43Z

At the moment, getNexBlock has a conditional split for all/partially ordered columns (where computeAllOrdered is where all columns in select are also in the order by if I understand correctly )

If we introduce a case for when all orderBy columns are pre-sorted (early termination case) then wouldn't we again have a allOrdered/partialOrdered case within that as well?

No. If all the order-by columns are sorted, we can simply pick the first limit docs without worrying about the optimization of partialOrdered case.

jt-kloudfuse · 2022-02-22T01:46:31Z

At the moment, getNexBlock has a conditional split for all/partially ordered columns (where computeAllOrdered is where all columns in select are also in the order by if I understand correctly )
If we introduce a case for when all orderBy columns are pre-sorted (early termination case) then wouldn't we again have a allOrdered/partialOrdered case within that as well?

No. If all the order-by columns are sorted, we can simply pick the first limit docs without worrying about the optimization of partialOrdered case.

May be we can get a slack call to resolve this quickly.

The code "without this PR" looks like below

    if (_expressions.size() == _orderByExpressions.size()) {
      return computeAllOrdered();
    } else {
      return computePartiallyOrdered();
    }

Here computeAllOrdered is where all columns in projection are in order by clause. And, computePartiallyOrdered is where projection has more columns that in order by clause.

The two cases will also apply to the early terminate case. So with this optimization we will have 4 cases.

computeAllOrderedWithEarlyTermination
computePartiallyOrderedWithEarlyTermination
computeAllOrdered
computePartiallyOrdered

I am not sure this is any more readable that what I have.

Jackie-Jiang · 2022-02-22T02:18:52Z

At the moment, getNexBlock has a conditional split for all/partially ordered columns (where computeAllOrdered is where all columns in select are also in the order by if I understand correctly )
If we introduce a case for when all orderBy columns are pre-sorted (early termination case) then wouldn't we again have a allOrdered/partialOrdered case within that as well?

No. If all the order-by columns are sorted, we can simply pick the first limit docs without worrying about the optimization of partialOrdered case.

May be we can get a slack call to resolve this quickly.

The code "without this PR" looks like below
    if (_expressions.size() == _orderByExpressions.size()) {
      return computeAllOrdered();
    } else {
      return computePartiallyOrdered();
    }
Here computeAllOrdered is where all columns in projection are in order by clause. And, computePartiallyOrdered is where projection has more columns that in order by clause.

The two cases will also apply to the early terminate case. So with this optimization we will have 4 cases.

computeAllOrderedWithEarlyTermination

computePartiallyOrderedWithEarlyTermination

computeAllOrdered

computePartiallyOrdered

I am not sure this is any more readable that what I have.

What I am suggesting is something like:

    if (allOrderByColumnsSorted()) {
      return computeAllSorted();
    } else if (_expressions.size() == _orderByExpressions.size()) {
      return computeAllOrdered();
    } else {
      return computePartiallyOrdered();
    }

If all order-by columns are sorted, we don't need to do the partially ordered optimization, and the logic is all separated which is more clear and can avoid unnecessary overhead