[SEDONA-314] Support optimized join on ST_HausdorffDistance #878

Merged: 53 commits, Jun 30, 2023

Conversation

iGN5117
Contributor

@iGN5117 iGN5117 commented Jun 28, 2023

Did you read the Contributor Guide?

Is this PR related to a JIRA ticket?

What changes were proposed in this PR?

  • Support optimized join on ST_HausdorffDistance by adding case matching for all overloads of the function
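
For readers unfamiliar with the approach, here is a minimal sketch of what "case matching on all overloads" can look like. It assumes a simplified, hypothetical expression tree; `Expr`, `Col`, `Lit`, `HausdorffDistance`, `DistanceJoinSpec`, and `JoinDetectorSketch` are illustrative names and are not Sedona's or Spark Catalyst's actual classes. The idea is that both the two-argument and the three-argument (densityFrac) forms, under either `<=` or `<`, should be recognized as distance-join predicates.

```scala
// Hypothetical, simplified expression tree (NOT Sedona's or Catalyst's real classes).
sealed trait Expr
final case class Col(name: String) extends Expr
final case class Lit(value: Double) extends Expr
final case class HausdorffDistance(args: Seq[Expr]) extends Expr
final case class LessThan(left: Expr, right: Expr) extends Expr
final case class LessThanOrEqual(left: Expr, right: Expr) extends Expr

// What a planner would need in order to choose a distance join (illustrative only).
final case class DistanceJoinSpec(left: Expr, right: Expr, radius: Expr, inclusive: Boolean)

object JoinDetectorSketch {
  def detect(condition: Expr): Option[DistanceJoinSpec] = condition match {
    // Two-argument overload: distance(a, b) <= r and distance(a, b) < r
    case LessThanOrEqual(HausdorffDistance(Seq(a, b)), r) =>
      Some(DistanceJoinSpec(a, b, r, inclusive = true))
    case LessThan(HausdorffDistance(Seq(a, b)), r) =>
      Some(DistanceJoinSpec(a, b, r, inclusive = false))
    // Three-argument overload: distance(a, b, densityFrac) <= r and < r
    case LessThanOrEqual(HausdorffDistance(Seq(a, b, _)), r) =>
      Some(DistanceJoinSpec(a, b, r, inclusive = true))
    case LessThan(HausdorffDistance(Seq(a, b, _)), r) =>
      Some(DistanceJoinSpec(a, b, r, inclusive = false))
    // Anything else is not a Hausdorff distance predicate; fall back to a regular join.
    case _ => None
  }
}
```

Without a case for each overload, a predicate using the densityFrac form would fall through to the default branch and miss the optimized join.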

How was this patch tested?

  • Added new unit tests

Did this PR include necessary documentation updates?

  • Yes, I have updated the documentation.

iGN5117 added 30 commits June 6, 2023 13:29
Removed note/tip block elements that caused numbering to be reset
Changed generic Exception to IllegalArgumentException in ST_NumPoints implementation and its corresponding test
# Conflicts:
#	common/src/main/java/org/apache/sedona/common/Functions.java
#	common/src/test/java/org/apache/sedona/common/FunctionsTest.java
#	flink/src/main/java/org/apache/sedona/flink/Catalog.java
#	flink/src/main/java/org/apache/sedona/flink/expressions/Functions.java
#	flink/src/test/java/org/apache/sedona/flink/FunctionTest.java
#	python/sedona/sql/st_functions.py
#	python/tests/sql/test_function.py
#	sql/common/src/main/scala/org/apache/sedona/sql/UDF/Catalog.scala
#	sql/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/Functions.scala
#	sql/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/st_functions.scala
#	sql/common/src/test/scala/org/apache/sedona/sql/dataFrameAPITestScala.scala
#	sql/common/src/test/scala/org/apache/sedona/sql/functionTestScala.scala
Refactored function name
Made java tests more comprehensive by checking both nDims and WKT of returned geometry
Added more test cases in scala test cases
Updated documentation with empty geometry case and more examples
…GN5117/sedona into develop_Nilesh_1.4.1_Translate

# Conflicts:
#	common/src/main/java/org/apache/sedona/common/Functions.java
#	common/src/main/java/org/apache/sedona/common/utils/GeomUtils.java
#	common/src/test/java/org/apache/sedona/common/FunctionsTest.java
#	flink/src/main/java/org/apache/sedona/flink/Catalog.java
#	flink/src/main/java/org/apache/sedona/flink/expressions/Functions.java
#	flink/src/test/java/org/apache/sedona/flink/FunctionTest.java
#	python/sedona/sql/st_functions.py
#	python/tests/sql/test_function.py
#	sql/common/src/main/scala/org/apache/sedona/sql/UDF/Catalog.scala
#	sql/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/Functions.scala
#	sql/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/st_functions.scala
#	sql/common/src/test/scala/org/apache/sedona/sql/dataFrameAPITestScala.scala
#	sql/common/src/test/scala/org/apache/sedona/sql/functionTestScala.scala
Added geom collection test cases for force3D
This reverts commit 19016ae.
# Conflicts:
#	common/src/test/java/org/apache/sedona/common/FunctionsTest.java
#	flink/src/main/java/org/apache/sedona/flink/Catalog.java
#	flink/src/main/java/org/apache/sedona/flink/expressions/Functions.java
#	python/sedona/sql/st_functions.py
#	python/tests/sql/test_function.py
#	sql/common/src/main/scala/org/apache/sedona/sql/UDF/Catalog.scala
#	sql/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/Functions.scala
#	sql/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/st_functions.scala
#	sql/common/src/test/scala/org/apache/sedona/sql/dataFrameAPITestScala.scala
#	sql/common/src/test/scala/org/apache/sedona/sql/functionTestScala.scala
Revert accidental incorrect removals from Catalog while merging with master
# Conflicts:
#	common/src/main/java/org/apache/sedona/common/utils/GeomUtils.java
#	common/src/test/java/org/apache/sedona/common/FunctionsTest.java
#	flink/src/main/java/org/apache/sedona/flink/expressions/Functions.java
#	python/tests/sql/test_dataframe_api.py
#	sql/common/src/test/scala/org/apache/sedona/sql/dataFrameAPITestScala.scala
#	sql/common/src/test/scala/org/apache/sedona/sql/functionTestScala.scala
val expectedDensityIntersects = bruteForceDistanceJoinHausdorff(sampleCount, distance, densityFrac, true)
val distanceDensityIntersectsDF = inputPoint.alias("pointDF").join(inputPolygon.alias("polygonDF"), expr(s"ST_HausdorffDistance(pointDF.pointshape, polygonDF.polygonshape, $densityFrac) <= $distance"))
assert(distanceDensityIntersectsDF.queryExecution.sparkPlan.collect { case p: DistanceJoinExec => p }.size === 1)
assert(distanceDensityIntersectsDF.count() == expectedDensityIntersects)
Member

I wonder what the number of expected intersects is in these test cases? Can you report the numbers here?

Contributor Author

Sure, both the intersects (<=) and non-intersects (<) joins return a count of 100 for this dataset. The polygon-point pairs are also exactly the same, since there is no pair with frechetDistance == 1.

Member

Can you add more distanceCandidates? Most importantly, add some candidates that lead to results > 100.

Contributor Author

Sure

Contributor Author

I have added candidates 1, 2, 5, and 10. These yield results of 100, 298, 688, and 1258, respectively, for a sample size of 100.
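
To illustrate why the `<=` and `<` counts in this thread can coincide, and why larger distance candidates return more pairs, here is a rough sketch of a brute-force Hausdorff join count. It is an assumption-laden toy: geometries are plain vertex lists, the distance is the discrete (vertex-only) Hausdorff distance with no densification, and `BruteForceHausdorffSketch`, `hausdorff`, and `joinCount` are hypothetical names that do not reproduce the test helper `bruteForceDistanceJoinHausdorff`.

```scala
// Illustrative only: vertex-based discrete Hausdorff distance, no densityFrac handling.
object BruteForceHausdorffSketch {
  type Point = (Double, Double)

  private def dist(p: Point, q: Point): Double =
    math.hypot(p._1 - q._1, p._2 - q._2)

  // Directed distance: the farthest any vertex of `a` is from its nearest vertex in `b`.
  private def directed(a: Seq[Point], b: Seq[Point]): Double =
    a.map(p => b.map(q => dist(p, q)).min).max

  // Symmetric discrete Hausdorff distance between two non-empty vertex lists.
  def hausdorff(a: Seq[Point], b: Seq[Point]): Double =
    math.max(directed(a, b), directed(b, a))

  // Count all (left, right) pairs whose distance satisfies the threshold predicate.
  def joinCount(left: Seq[Seq[Point]], right: Seq[Seq[Point]],
                threshold: Double, inclusive: Boolean): Int =
    left.flatMap(l => right.map(r => hausdorff(l, r)))
      .count(d => if (inclusive) d <= threshold else d < threshold)
}
```

The inclusive and exclusive counts differ only when some pair sits exactly at the threshold, and the count can only stay the same or grow as the threshold increases, which is why adding larger candidates exercises the join with more matching pairs.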

@jiayuasu jiayuasu merged commit 8b2afd6 into apache:master Jun 30, 2023
39 checks passed