Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Kernel] Change comparator expression to lazy evaluation #2853

Merged
merged 24 commits into from
May 30, 2024

Conversation

zzl-7
Copy link
Contributor

@zzl-7 zzl-7 commented Apr 5, 2024

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Resolves #2541

How was this patch tested?

Unit test in delta/kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/internal/expressions/DefaultExpressionEvaluatorSuite.scala

Does this PR introduce any user-facing changes?

public int getSize() { return left.getSize(); }
@Override
public boolean isNullAt(int rowId) {
return left.isNullAt(rowId) || right.isNullAt(rowId);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ticket #2541 , it is mentioned that we want to check against the following truth table

Left Right Result
null null True
not null (A) null False
null not null (B) False
not null (A) not null (B) False
not null (A) not null (A) True

But the current behavior is

Left Right Result
null null False
not null (A) null False
null not null (B) False
not null (A) not null (B) False
not null (A) not null (A) True

Do we want to change the behavior? I am guessing we want to follow the truth table here https://spark.apache.org/docs/latest/sql-ref-null-semantics.html#:~:text=In%20order%20to%20compare%20the,both%20the%20operands%20are%20NULL%20. ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow the existing method (i.e the second table). The example I gave in #2541 is for the null safe equals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification, the existing code should follow the second table, once this PR is merged, I will add support for null safe equals.

@zzl-7
Copy link
Contributor Author

zzl-7 commented Apr 5, 2024

If code reviewer is ok with the approach, I can patch the PR with cleanup of old code and add more documentation
Thank you

Copy link
Collaborator

@vkorukanti vkorukanti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zzl-7 This looks great. Thank you for working on this. Few comments.

* Comparator interface for lazy evaluating comparator
*/
@Evolving
public interface VectorComparator {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is applicable only for the default table client module. Please move this to kernel-defaults module in package io.delta.kernel.defaults.internal.expressions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for pointing that out, fixed.

Comment on lines 409 to 414
v = new VectorComparator() {
@Override
public boolean compare(int compareResult) {
return compareResult == 0;
}
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be defined as a lambda (compareResult) -> (compareResult == 0)?

comparatorVector(leftResult, rightResult,  (compareResult) -> (compareResult == 0))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call, fixed.

public int getSize() { return left.getSize(); }
@Override
public boolean isNullAt(int rowId) {
return left.isNullAt(rowId) || right.isNullAt(rowId);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow the existing method (i.e the second table). The example I gave in #2541 is for the null safe equals.

@vkorukanti vkorukanti added this to the 3.2.0 milestone Apr 8, 2024
@zzl-7
Copy link
Contributor Author

zzl-7 commented Apr 9, 2024

@vkorukanti thanks for code reviewing this PR, I have addressed everything, can you take a look at it again?
Thank you

@vkorukanti vkorukanti changed the title Change comparator expression to lazy evaluation [Kernel] Change comparator expression to lazy evaluation Apr 10, 2024
@zzl-7
Copy link
Contributor Author

zzl-7 commented Apr 23, 2024

Hi @vkorukanti , is there any additional change that you would like me to do? If not can we merge the code?
Thanks

@vkorukanti
Copy link
Collaborator

@zzl-7, sorry this is getting delayed. Will take a look at this week for sure.

@zzl-7
Copy link
Contributor Author

zzl-7 commented Apr 24, 2024

@zzl-7, sorry this is getting delayed. Will take a look at this week for sure.

@vkorukanti , No worries, please let me know if there is anything I can do on my side to help :)
Thanks

Copy link
Collaborator

@vkorukanti vkorukanti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zzl-7 This is very close, thanks for working on this. Let me know if you have any qns.

if (isNullAt(rowId)) {
return false;
}
DataType dataType = left.getDataType();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The VectorComparator interface looks good.

Only this is that we now evaluate the this long if statement for every row. Instead of we create a comparator operator for each type (e.g. boolean or short) we can avoid the if statement cost.

class ValueComparator {
   ValueComparator(ColumnVector v1, ColumnVector v2); 

    boolean getCompareResult(int rowId);
}

class ByteValueComparator {
   ColumnVector v1;
   ColumnVector v2;

   ByteValueComparator(ColumnVector v1, ColumnVector v2) {
      this.v1 = v1;
      this.v2 = v2;
   }
   boolean getCompareResult(int rowId) {
      // null check 
     // and then compare value for non-nulls
       Byte.compare(left.getByte(rowId), right.getByte(rowId)))
   }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @vkorukanti good catch, I didn't thought about that and should move the long if statement outside of anonymous class. I have fixed the suggestion, please let me know if this is what you are looking for
thanks

@zzl-7
Copy link
Contributor Author

zzl-7 commented May 16, 2024

Hi @vkorukanti , just want to follow up on if there any additional change you would like me to do
Thanks :)

Copy link
Collaborator

@vkorukanti vkorukanti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zzl-7 Apologies for the delay. This PR is close to merging. Just a few comments to use the JDK provided interfaces/classes, which will simplify and remove many of the defined classes in the PR. Once those are addressed, then this PR is ready to merge.

Comment on lines 108 to 133
ValueComparator valueComparator;
if (dataType instanceof BooleanType) {
compareBoolean(left, right, result);
valueComparator = new BooleanValueComparator(left, right, vectorComparator);
} else if (dataType instanceof ByteType) {
compareByte(left, right, result);
valueComparator = new ByteValueComparator(left, right, vectorComparator);
} else if (dataType instanceof ShortType) {
compareShort(left, right, result);
valueComparator = new ShortValueComparator(left, right, vectorComparator);
} else if (dataType instanceof IntegerType || dataType instanceof DateType) {
compareInt(left, right, result);
valueComparator = new IntegerValueComparator(left, right, vectorComparator);
} else if (dataType instanceof LongType ||
dataType instanceof TimestampType ||
dataType instanceof TimestampNTZType) {
compareLong(left, right, result);
valueComparator = new LongValueComparator(left, right, vectorComparator);
} else if (dataType instanceof FloatType) {
compareFloat(left, right, result);
valueComparator = new FloatValueComparator(left, right, vectorComparator);
} else if (dataType instanceof DoubleType) {
compareDouble(left, right, result);
valueComparator = new DoubleValueComparator(left, right, vectorComparator);
} else if (dataType instanceof DecimalType) {
compareDecimal(left, right, result);
valueComparator = new DecimalValueComparator(left, right, vectorComparator);
} else if (dataType instanceof StringType) {
compareString(left, right, result);
valueComparator = new StringValueComparator(left, right, vectorComparator);
} else if (dataType instanceof BinaryType) {
compareBinary(left, right, result);
valueComparator = new BinaryValueComparator(left, right, vectorComparator);
} else {
throw new UnsupportedOperationException(dataType + " can not be compared.");
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, we can actually define the classes as lambdas. That way we don't need to create a new file for each type.

Here is an example:

if (dataType instanceof DoubleType) {
            valueComparator = rowId -> vectorComparator.compare(
                    Double.compare(left.getDouble(rowId), right.getDouble(rowId)));
        } else if (dataType instanceof DecimalType) {
            valueComparator = rowId -> vectorComparator.compare(
                    BIGDECIMAL_COMPARATOR.compare(left.getDecimal(rowId), right.getDecimal(rowId)));
        }

You can delete all the ValueComparator classes and the interface. Use IntPredicate in place of the ValueComparator interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call, thanks for pointing IntPredicate out, it made the code much cleaner

/**
* Comparator interface for lazy evaluating comparator
*/
@Evolving
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can actually remove this and use the Java JDK provided IntPredicate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call, fixed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is no longer used. Please remove it.

}
return valueComparator.getCompareResult(rowId);
}
};
}

static void compareBoolean(ColumnVector left, ColumnVector right, int[] result) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove all these methods right as they are no longer used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

import io.delta.kernel.defaults.internal.expressions.VectorComparator;

public class StringValueComparator implements ValueComparator {
static final Comparator<String> STRING_COMPARATOR = Comparator.naturalOrder();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move constants like this into DefaultExpressionUtils.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

Comment on lines 99 to 102
static ColumnVector comparatorVector(
ColumnVector left,
ColumnVector right,
VectorComparator vectorComparator) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static ColumnVector comparatorVector(
ColumnVector left,
ColumnVector right,
VectorComparator vectorComparator) {
static ColumnVector comparatorVector(
ColumnVector left,
ColumnVector right,
IntPredicate vectorComparator) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@vkorukanti vkorukanti modified the milestones: 3.2.0, 3.3.0 May 23, 2024
@zzl-7
Copy link
Contributor Author

zzl-7 commented May 28, 2024

@vkorukanti
Thanks for the code review, I have fixed all suggestions, can you take a look at it again when you are free?
Thank you

Copy link
Collaborator

@vkorukanti vkorukanti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zzl-7 Just one last comment. The VectorComparator is no longer used. Can you remove it? Let me know, I can also do it before merging it.

/**
* Comparator interface for lazy evaluating comparator
*/
@Evolving
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is no longer used. Please remove it.

@vkorukanti vkorukanti merged commit 085f117 into delta-io:master May 30, 2024
10 checks passed
@zzl-7
Copy link
Contributor Author

zzl-7 commented May 30, 2024

@zzl-7 Just one last comment. The VectorComparator is no longer used. Can you remove it? Let me know, I can also do it before merging it.

(forgot to comment while waiting for github action)
good catch, just removed it , thanks :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Kernel][Expressions] Make the default expression handler evaluation lazy
2 participants