Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SEDONA-180] Make ST_* expressions foldable and declares input types for type checking #704

Merged
merged 1 commit into from
Oct 28, 2022

Conversation

Kontinuation
Copy link
Member

@Kontinuation Kontinuation commented Oct 26, 2022

Did you read the Contributor Guide?

Is this PR related to a JIRA ticket?

What changes were proposed in this PR?

This patch makes ST_* functions foldable, and declared input types for these functions to make the SQL analyzer perform type checking for these ST_* function calls and emit more friendly error messages. Raster functions and viz functions are still left unchanged.

I've deliberately left ST_Binary and ST_AsEWKB unfoldable since unit tests would fail after making these expressions foldable.
The reason is that folding binary expressions as constants triggers a function call to encodeHexString(byte[], boolean) in spark-sql, while the bundled Apache commons-codec of sernetcdf jar do not have this function defined. We may have to release a new version of sernetcdf with a newer version of / without commons-codec bundled.

I've also moved some implementations of ST functions to sedona-common, so that they can be reused by other distributed computation engines.

I also spotted a bug in ST_MinimumBoundingCircle dataframe API, it does not honor the second parameter. This patch also includes a fix for that bug.

How was this patch tested?

Hopefully existing unit tests are sufficient to guarantee that I'm not breaking anything.

Did this PR include necessary documentation updates?

  • No, this PR does not affect any public API so no need to change the docs.

@Kontinuation Kontinuation changed the title WIP: [SEDONA-180] Make ST_* expressions foldable and declares input types for type checking [SEDONA-180] Make ST_* expressions foldable and declares input types for type checking Oct 26, 2022
Copy link
Member

@jiayuasu jiayuasu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for creating this PR. This PR is huge and it can benefit lots of users!

I only have a few minor comments. Please address them as you see fit.

// Expression for vectors
ST_PointFromText,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific reason to change the way we define functions in the Catalog?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for supporting expressions with optional parameters while declaring the types of all parameters in def inputTypes. Such as ST_MakeValid(geom: Geometry, keepCollapsed: Boolean = false), where the second parameter is optional.

Before this patch expressions with optional parameters need to check the number of passed in arguments and resolve the values of optional parameters manually. This patch handles optional parameters in function builder registered to the catalog, so that default argument values will be filled in. Now expressions can declare all its parameters in def inputTypes, all the arguments will be type-checked, and they don't need to handle default arguments themselves.

*/
case class ST_Point(inputExpressions: Seq[Expression])
extends Expression with CodegenFallback with UserDataGeneratator {
inputExpressions.betweenLength(2, 3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why you remove the inputExpression (2, 3) requirement?

I saw similar inputExpression.length assertions were kept in some places and removed in some other places. Is there a reason?

Copy link
Member Author

@Kontinuation Kontinuation Oct 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assertions for expressions with optional parameters looks strange since all parameters will receive their arguments before evaluation. This assertion became unnecessary since the function builder registered to the catalog will emit error messages for function calls with invalid number of parameters:

SELECT ST_Point(1)
>>> function ST_Point takes at least 2 argument(s), 1 argument(s) specified

For expressions with optional parameters like ST_Point we cannot simply assert inputExpressions.size == 3 since there's one weird behavior in the implementation of the function builder: when 2 arguments were passed to ST_Point, it will construct a ST_Point object using these 2 arguments first, then call its inputTypes method to acknowledge that it actually has 3 parameters, so that it will construct the final ST_Point object with 2 user specified arguments and one default argument.

There're better ways to support optional parameters without constructing intermediate invalid expressions, however that may introduce huge changes, so I'd like to live with it for now.

case class ST_Contains(inputExpressions: Seq[Expression])
extends ST_Predicate with CodegenFallback {

override def toString: String = s" **${ST_Contains.getClass.getName}** "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the toString() function is only provided in this predicate. Can you put the toString() function to ST_Predicate (e.g., toString(): getClass.getName) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved toString to ST_Predicate.

…inputTypes for most of the expressions.

Raster functions and viz functions are still left unchanged.
@jiayuasu jiayuasu merged commit cfaed6a into apache:master Oct 28, 2022
@Kontinuation Kontinuation deleted the better-catalyst-expression branch August 23, 2023 15:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants