-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Kernel] Change comparator expression to lazy evaluation #2853
Conversation
public int getSize() { return left.getSize(); } | ||
@Override | ||
public boolean isNullAt(int rowId) { | ||
return left.isNullAt(rowId) || right.isNullAt(rowId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In ticket #2541 , it is mentioned that we want to check against the following truth table
Left | Right | Result |
---|---|---|
null | null | True |
not null (A) | null | False |
null | not null (B) | False |
not null (A) | not null (B) | False |
not null (A) | not null (A) | True |
But the current behavior is
Left | Right | Result |
---|---|---|
null | null | False |
not null (A) | null | False |
null | not null (B) | False |
not null (A) | not null (B) | False |
not null (A) | not null (A) | True |
Do we want to change the behavior? I am guessing we want to follow the truth table here https://spark.apache.org/docs/latest/sql-ref-null-semantics.html#:~:text=In%20order%20to%20compare%20the,both%20the%20operands%20are%20NULL%20. ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow the existing method (i.e the second table). The example I gave in #2541 is for the null safe equals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification, the existing code should follow the second table, once this PR is merged, I will add support for null safe equals.
If code reviewer is ok with the approach, I can patch the PR with cleanup of old code and add more documentation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zzl-7 This looks great. Thank you for working on this. Few comments.
* Comparator interface for lazy evaluating comparator | ||
*/ | ||
@Evolving | ||
public interface VectorComparator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is applicable only for the default table client module. Please move this to kernel-defaults
module in package io.delta.kernel.defaults.internal.expressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for pointing that out, fixed.
v = new VectorComparator() { | ||
@Override | ||
public boolean compare(int compareResult) { | ||
return compareResult == 0; | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be defined as a lambda (compareResult) -> (compareResult == 0)?
comparatorVector(leftResult, rightResult, (compareResult) -> (compareResult == 0))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call, fixed.
public int getSize() { return left.getSize(); } | ||
@Override | ||
public boolean isNullAt(int rowId) { | ||
return left.isNullAt(rowId) || right.isNullAt(rowId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow the existing method (i.e the second table). The example I gave in #2541 is for the null safe equals.
@vkorukanti thanks for code reviewing this PR, I have addressed everything, can you take a look at it again? |
Hi @vkorukanti , is there any additional change that you would like me to do? If not can we merge the code? |
@zzl-7, sorry this is getting delayed. Will take a look at this week for sure. |
@vkorukanti , No worries, please let me know if there is anything I can do on my side to help :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zzl-7 This is very close, thanks for working on this. Let me know if you have any qns.
if (isNullAt(rowId)) { | ||
return false; | ||
} | ||
DataType dataType = left.getDataType(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The VectorComparator
interface looks good.
Only this is that we now evaluate the this long if
statement for every row. Instead of we create a comparator operator for each type (e.g. boolean or short) we can avoid the if
statement cost.
class ValueComparator {
ValueComparator(ColumnVector v1, ColumnVector v2);
boolean getCompareResult(int rowId);
}
class ByteValueComparator {
ColumnVector v1;
ColumnVector v2;
ByteValueComparator(ColumnVector v1, ColumnVector v2) {
this.v1 = v1;
this.v2 = v2;
}
boolean getCompareResult(int rowId) {
// null check
// and then compare value for non-nulls
Byte.compare(left.getByte(rowId), right.getByte(rowId)))
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @vkorukanti good catch, I didn't thought about that and should move the long if statement outside of anonymous class. I have fixed the suggestion, please let me know if this is what you are looking for
thanks
Hi @vkorukanti , just want to follow up on if there any additional change you would like me to do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zzl-7 Apologies for the delay. This PR is close to merging. Just a few comments to use the JDK provided interfaces/classes, which will simplify and remove many of the defined classes in the PR. Once those are addressed, then this PR is ready to merge.
ValueComparator valueComparator; | ||
if (dataType instanceof BooleanType) { | ||
compareBoolean(left, right, result); | ||
valueComparator = new BooleanValueComparator(left, right, vectorComparator); | ||
} else if (dataType instanceof ByteType) { | ||
compareByte(left, right, result); | ||
valueComparator = new ByteValueComparator(left, right, vectorComparator); | ||
} else if (dataType instanceof ShortType) { | ||
compareShort(left, right, result); | ||
valueComparator = new ShortValueComparator(left, right, vectorComparator); | ||
} else if (dataType instanceof IntegerType || dataType instanceof DateType) { | ||
compareInt(left, right, result); | ||
valueComparator = new IntegerValueComparator(left, right, vectorComparator); | ||
} else if (dataType instanceof LongType || | ||
dataType instanceof TimestampType || | ||
dataType instanceof TimestampNTZType) { | ||
compareLong(left, right, result); | ||
valueComparator = new LongValueComparator(left, right, vectorComparator); | ||
} else if (dataType instanceof FloatType) { | ||
compareFloat(left, right, result); | ||
valueComparator = new FloatValueComparator(left, right, vectorComparator); | ||
} else if (dataType instanceof DoubleType) { | ||
compareDouble(left, right, result); | ||
valueComparator = new DoubleValueComparator(left, right, vectorComparator); | ||
} else if (dataType instanceof DecimalType) { | ||
compareDecimal(left, right, result); | ||
valueComparator = new DecimalValueComparator(left, right, vectorComparator); | ||
} else if (dataType instanceof StringType) { | ||
compareString(left, right, result); | ||
valueComparator = new StringValueComparator(left, right, vectorComparator); | ||
} else if (dataType instanceof BinaryType) { | ||
compareBinary(left, right, result); | ||
valueComparator = new BinaryValueComparator(left, right, vectorComparator); | ||
} else { | ||
throw new UnsupportedOperationException(dataType + " can not be compared."); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, we can actually define the classes as lambdas. That way we don't need to create a new file for each type.
Here is an example:
if (dataType instanceof DoubleType) {
valueComparator = rowId -> vectorComparator.compare(
Double.compare(left.getDouble(rowId), right.getDouble(rowId)));
} else if (dataType instanceof DecimalType) {
valueComparator = rowId -> vectorComparator.compare(
BIGDECIMAL_COMPARATOR.compare(left.getDecimal(rowId), right.getDecimal(rowId)));
}
You can delete all the ValueComparator
classes and the interface. Use IntPredicate
in place of the ValueComparator
interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call, thanks for pointing IntPredicate out, it made the code much cleaner
/** | ||
* Comparator interface for lazy evaluating comparator | ||
*/ | ||
@Evolving |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can actually remove this and use the Java JDK provided IntPredicate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call, fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is no longer used. Please remove it.
} | ||
return valueComparator.getCompareResult(rowId); | ||
} | ||
}; | ||
} | ||
|
||
static void compareBoolean(ColumnVector left, ColumnVector right, int[] result) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove all these methods right as they are no longer used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
import io.delta.kernel.defaults.internal.expressions.VectorComparator; | ||
|
||
public class StringValueComparator implements ValueComparator { | ||
static final Comparator<String> STRING_COMPARATOR = Comparator.naturalOrder(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move constants like this into DefaultExpressionUtils
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
static ColumnVector comparatorVector( | ||
ColumnVector left, | ||
ColumnVector right, | ||
VectorComparator vectorComparator) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static ColumnVector comparatorVector( | |
ColumnVector left, | |
ColumnVector right, | |
VectorComparator vectorComparator) { | |
static ColumnVector comparatorVector( | |
ColumnVector left, | |
ColumnVector right, | |
IntPredicate vectorComparator) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
@vkorukanti |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zzl-7 Just one last comment. The VectorComparator
is no longer used. Can you remove it? Let me know, I can also do it before merging it.
/** | ||
* Comparator interface for lazy evaluating comparator | ||
*/ | ||
@Evolving |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is no longer used. Please remove it.
(forgot to comment while waiting for github action) |
Which Delta project/connector is this regarding?
Description
Resolves #2541
How was this patch tested?
Unit test in
delta/kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/internal/expressions/DefaultExpressionEvaluatorSuite.scala
Does this PR introduce any user-facing changes?