-
Notifications
You must be signed in to change notification settings - Fork 654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SEDONA-180] Make ST_* expressions foldable and declares input types for type checking #704
[SEDONA-180] Make ST_* expressions foldable and declares input types for type checking #704
Conversation
e34cb9a
to
067e879
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for creating this PR. This PR is huge and it can benefit lots of users!
I only have a few minor comments. Please address them as you see fit.
// Expression for vectors | ||
ST_PointFromText, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a specific reason to change the way we define functions in the Catalog?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for supporting expressions with optional parameters while declaring the types of all parameters in def inputTypes
. Such as ST_MakeValid(geom: Geometry, keepCollapsed: Boolean = false)
, where the second parameter is optional.
Before this patch expressions with optional parameters need to check the number of passed in arguments and resolve the values of optional parameters manually. This patch handles optional parameters in function builder registered to the catalog, so that default argument values will be filled in. Now expressions can declare all its parameters in def inputTypes
, all the arguments will be type-checked, and they don't need to handle default arguments themselves.
*/ | ||
case class ST_Point(inputExpressions: Seq[Expression]) | ||
extends Expression with CodegenFallback with UserDataGeneratator { | ||
inputExpressions.betweenLength(2, 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why you remove the inputExpression (2, 3) requirement?
I saw similar inputExpression.length assertions were kept in some places and removed in some other places. Is there a reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assertions for expressions with optional parameters looks strange since all parameters will receive their arguments before evaluation. This assertion became unnecessary since the function builder registered to the catalog will emit error messages for function calls with invalid number of parameters:
SELECT ST_Point(1)
>>> function ST_Point takes at least 2 argument(s), 1 argument(s) specified
For expressions with optional parameters like ST_Point
we cannot simply assert inputExpressions.size == 3
since there's one weird behavior in the implementation of the function builder: when 2 arguments were passed to ST_Point
, it will construct a ST_Point
object using these 2 arguments first, then call its inputTypes
method to acknowledge that it actually has 3 parameters, so that it will construct the final ST_Point
object with 2 user specified arguments and one default argument.
There're better ways to support optional parameters without constructing intermediate invalid expressions, however that may introduce huge changes, so I'd like to live with it for now.
case class ST_Contains(inputExpressions: Seq[Expression]) | ||
extends ST_Predicate with CodegenFallback { | ||
|
||
override def toString: String = s" **${ST_Contains.getClass.getName}** " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the toString() function is only provided in this predicate. Can you put the toString() function to ST_Predicate (e.g., toString(): getClass.getName) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved toString
to ST_Predicate
.
…inputTypes for most of the expressions. Raster functions and viz functions are still left unchanged.
067e879
to
edee2c2
Compare
Did you read the Contributor Guide?
Is this PR related to a JIRA ticket?
What changes were proposed in this PR?
This patch makes ST_* functions foldable, and declared input types for these functions to make the SQL analyzer perform type checking for these ST_* function calls and emit more friendly error messages. Raster functions and viz functions are still left unchanged.
I've deliberately left
ST_Binary
andST_AsEWKB
unfoldable since unit tests would fail after making these expressions foldable.The reason is that folding binary expressions as constants triggers a function call to
encodeHexString(byte[], boolean)
in spark-sql, while the bundled Apache commons-codec of sernetcdf jar do not have this function defined. We may have to release a new version of sernetcdf with a newer version of / without commons-codec bundled.I've also moved some implementations of ST functions to
sedona-common
, so that they can be reused by other distributed computation engines.I also spotted a bug in
ST_MinimumBoundingCircle
dataframe API, it does not honor the second parameter. This patch also includes a fix for that bug.How was this patch tested?
Hopefully existing unit tests are sufficient to guarantee that I'm not breaking anything.
Did this PR include necessary documentation updates?