Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

st_union - understanding difference with PostGIS #1514

Closed
JimShady opened this issue Jul 8, 2024 · 6 comments
Closed

st_union - understanding difference with PostGIS #1514

JimShady opened this issue Jul 8, 2024 · 6 comments

Comments

@JimShady
Copy link
Contributor

JimShady commented Jul 8, 2024

I am trying to move a process which I currently run in PostGIS, to within the cloud and use Apache Sedona instead. In my PostgreSQL + PostGIS database this runs:

WITH a_table AS (
SELECT st_boundary(geom) geom
FROM   emea_eu_seu.albania_prefectures_2021
UNION ALL
SELECT st_boundary(geom) geom
FROM   emea_eu_seu.albania_municipalities_2021
UNION ALL
SELECT st_boundary(geom) geom
FROM   emea_eu_seu.albania_2_digit_postcode_areas_2021
),
b_table AS (
SELECT   geom
FROM     a_table
GROUP BY geom
)
SELECT   ST_Union(geom) geom
FROM     b_table

The result is one row:

image

I have tried to run the same process using Sedona (code below) but get an error:

WITH a_table AS (
SELECT st_boundary(st_geomfromwkb(geom_wkb)) geom
FROM   hive_metastore.gfk_bronze.albania_prefectures_2021
UNION ALL
SELECT st_boundary(st_geomfromwkb(geom_wkb)) geom
FROM   hive_metastore.gfk_bronze.albania_municipalities_2021
UNION ALL
SELECT st_boundary(st_geomfromwkb(geom_wkb)) geom
FROM   hive_metastore.gfk_bronze.albania_2_digit_postcode_areas_2021
),
b_table AS (
SELECT   geom
FROM     a_table
GROUP BY geom
)
SELECT   ST_Union(geom) geom
FROM     b_table

AnalysisException: [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "st_union(geom)" due to data type mismatch: Parameter 1 requires the "ARRAY<BINARY>" type, however "geom" has the type "BINARY".; line 16 pos 9;
'WithCTE

I am using Sedona 1.6.0 by the way. I read "Variant 2 : As of version 1.6.0, this function accepts an array of Geometry objects and returns the geometric union of all geometries in the input array." so was expecting a similar result to that which PostGIS returns. Am I misunderstanding something?

Thanks!

@furqaankhan
Copy link
Contributor

I think you would want to use ST_Union_Aggr. As the one parameter option of ST_Union expected an Array of geometries. It can be confusing as PostGIS has ST_Union and ST_Union_Aggr as one function.

@JimShady
Copy link
Contributor Author

JimShady commented Jul 9, 2024

Isn't the idea that from 1.6.0 onwards, the behavior is the same?

image

@furqaankhan
Copy link
Contributor

furqaankhan commented Jul 9, 2024

The variant 2 of ST_Union accepts an array of geometries ( [geom1, geom2] ) in each row. What you want to achieve from my understanding is to do ST_Union of all the rows in the specified column, for that we have ST_Union_Aggr. In PostGIS when you are invoking ST_Union after group by, you are using the fourth variant not the third.

@JimShady
Copy link
Contributor Author

JimShady commented Jul 9, 2024

Ok, I'll give that a go and report back. :-)

@JimShady
Copy link
Contributor Author

JimShady commented Jul 9, 2024

Any ideas on how to work around this? The same query finishes in 3 seconds on my fairly modest PostgreSQL server!

%sql

WITH a_table AS (
SELECT st_boundary(st_geomfromwkb(geom_wkb)) geom
FROM   hive_metastore.gfk_bronze.albania_prefectures_2021
UNION ALL
SELECT st_boundary(st_geomfromwkb(geom_wkb)) geom
FROM   hive_metastore.gfk_bronze.albania_municipalities_2021
UNION ALL
SELECT st_boundary(st_geomfromwkb(geom_wkb)) geom
FROM   hive_metastore.gfk_bronze.albania_2_digit_postcode_areas_2021
),
b_table AS (
SELECT   geom
FROM     a_table
GROUP BY geom
)
SELECT   ST_Union_Aggr(geom) geom
FROM     b_table
java.lang.Exception: Results too large
	at com.databricks.backend.daemon.driver.OutputAggregator$.maybeApplyOutputAggregation(OutputAggregator.scala:458)
	at com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation0(OutputAggregator.scala:279)

@JimShady
Copy link
Contributor Author

This was because the browser could not display the results. Writing to file it was fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants