try 1-N-N performance tuning with LATERAL subquery #1280

MyqueWooMiddo · 2024-03-21T02:26:23Z

Expected behavior

Reference: https://postgis.net/workshops/postgis-intro/knn.html

https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-lateral-subquery.html

I upgraded Spark to 3.5.1 and tried a LATERAL subquery to compute the 1-N-N (1-nearest-neighbour).

I want to find each point's 1-N-N within the same table, data_points(id, longitude, latitude), using Sedona.

Actual behavior

Spark does not support this kind of LATERAL subquery.

Steps to reproduce the problem

with t_data as (
    select id, st_point(longitude, latitude) as point
    from data_points
    order by 1
    limit 1000
)
select *
from t_data t1, lateral (
    select t2.id, ST_DistanceSpheroid(t1.point, t2.point) as distance
    from t_data t2
    where t1.id != t2.id
    order by 2
    limit 1
)

Spark throws:
"org.apache.spark.sql.catalyst.ExtendedAnalysisException: [UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.ACCESSING_OUTER_QUERY_COLUMN_IS_NOT_ALLOWED] Unsupported subquery expression: Accessing outer query column is not allowed in this locationProject"

I just want to know how to optimize 1-N-N on a large dataset, rather than ranking every pair with row_number() ordered by distance and keeping the rows where it equals 1.
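For reference, the row_number baseline mentioned above would presumably look something like the sketch below. It reuses the table, columns, and functions from the reproduction query; the `ranked` CTE and the `neighbour_id` and `rn` aliases are names made up for illustration. It compares every pair of points, which is why it scales poorly on a large dataset.

```sql
-- Baseline sketch: rank all candidate neighbours per point, keep the closest.
-- `ranked`, `neighbour_id` and `rn` are illustrative names, not from the original query.
WITH t_data AS (
    SELECT id, ST_Point(longitude, latitude) AS point
    FROM data_points
    ORDER BY id
    LIMIT 1000
),
ranked AS (
    SELECT t1.id AS id,
           t2.id AS neighbour_id,
           ST_DistanceSpheroid(t1.point, t2.point) AS distance,
           ROW_NUMBER() OVER (
               PARTITION BY t1.id
               ORDER BY ST_DistanceSpheroid(t1.point, t2.point)
           ) AS rn
    FROM t_data t1
    JOIN t_data t2
      ON t1.id != t2.id   -- non-equi join: every point is paired with every other point
)
SELECT id, neighbour_id, distance
FROM ranked
WHERE rn = 1
```

Because the join condition is only an inequality, Spark has no key to hash or sort on, so the work grows with the square of the number of points.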

Settings

Sedona version = 1.5.1

Apache Spark version = 3.5.1

API type = Scala

Scala version = 2.12

JRE version = 1.8

Environment = Standalone

jiayuasu · 2024-03-24T07:49:23Z

Neither all-NN join nor KNN join is currently supported in Apache Sedona. We will add support in one or two months.

MyqueWooMiddo · 2024-03-28T11:49:52Z

> Neither all-NN join nor KNN join is currently supported in Apache Sedona. We will add support in one or two months.

I think the iterative H3 solution from Databricks Mosaic is a good idea.
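The general idea behind a cell-based approach can be sketched without any H3 dependency. The following is only a rough illustration and is not Mosaic's implementation: it assumes a plain square grid in degrees instead of H3 cells, and the `0.01` cell size, the `spread` CTE, and the `cx`, `cy`, `dx`, `dy` names are all hypothetical. Each point is copied into its own cell and the eight surrounding cells, so candidate pairs can be found with an equi-join instead of comparing every pair:

```sql
-- Sketch under assumptions: square grid in degrees stands in for H3 cells.
WITH t_data AS (
    SELECT id,
           ST_Point(longitude, latitude) AS point,
           CAST(FLOOR(longitude / 0.01) AS BIGINT) AS cx,   -- grid column (assumed cell size)
           CAST(FLOOR(latitude  / 0.01) AS BIGINT) AS cy    -- grid row (assumed cell size)
    FROM data_points
),
spread AS (
    -- Copy every point into its own cell and the 8 neighbouring cells.
    SELECT id, point, cx + dx AS cx, cy + dy AS cy
    FROM t_data
    LATERAL VIEW explode(array(-1, 0, 1)) ox AS dx
    LATERAL VIEW explode(array(-1, 0, 1)) oy AS dy
)
SELECT t1.id,
       min_by(t2.id, ST_DistanceSpheroid(t1.point, t2.point)) AS neighbour_id,
       min(ST_DistanceSpheroid(t1.point, t2.point))           AS distance
FROM t_data t1
JOIN spread t2
  ON t1.cx = t2.cx AND t1.cy = t2.cy   -- equi-join on the cell key
 AND t1.id != t2.id
GROUP BY t1.id
```

Two caveats apply to this sketch. Points with no other point in their 3x3 block of cells drop out of the result, so they would need another pass with a larger cell size, which is where the iterative part comes in. A neighbour found this way is also only guaranteed to be the true nearest one when it lies closer than one cell width; otherwise a nearer point could sit just outside the block.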
