[SEDONA-455] geoparquet.metadata data source for inspecting GeoParquet metadata #1180

Merged
merged 8 commits into from
Jan 5, 2024

Conversation

Kontinuation
Member

Did you read the Contributor Guide?

Is this PR related to a JIRA ticket?

What changes were proposed in this PR?

This patch adds a new data source named geoparquet.metadata, implemented using the DataSourceV2 API. It produces a dataframe containing the GeoParquet metadata of each data file, one row per file. Here is an example dataframe loaded from geoparquet.metadata:

+-----------------+------------+--------------+------------------------------------------------------------------+
|path             |version     |primary_column|columns                                                           |
+-----------------+------------+--------------+------------------------------------------------------------------+
|datafile1.parquet|1.0.0-beta.1|geom          |{geom -> {WKB, [Polygon], [1000.0, 1000.0, 2000.0, 2000.0], NULL}}|
|datafile2.parquet|1.0.0-beta.1|geom          |{geom -> {WKB, [Polygon], [5000.0, 5000.0, 6000.0, 6000.0], NULL}}|
|datafile3.parquet|1.0.0-beta.1|geom          |{geom -> {WKB, [Polygon], [0.0, 0.0, 1000.0, 1000.0], NULL}}      |
+-----------------+------------+--------------+------------------------------------------------------------------+
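To make the shape of each row concrete, here is a minimal Python sketch of how one file's GeoParquet "geo" metadata (the JSON stored in the Parquet footer) could be flattened into the four fields shown above. It is an illustration only, not the code in this patch: the helper name geo_metadata_row is hypothetical, the nested field names follow the GeoParquet specification, and returning NULL-like values for files without GeoParquet metadata is an assumption. In Spark itself, the data source would typically be read with the standard reader call, for example spark.read.format("geoparquet.metadata").load(path), assuming Sedona is available in the session.

```python
import json


def geo_metadata_row(path, geo_json):
    """Flatten one file's "geo" metadata JSON into a row-like dict.

    Hypothetical helper for illustration; not part of Sedona.
    """
    if geo_json is None:
        # Assumption: a file without GeoParquet metadata yields null fields.
        return {"path": path, "version": None,
                "primary_column": None, "columns": None}

    geo = json.loads(geo_json)
    # One entry per geometry column, keyed by column name.
    columns = {
        name: {
            "encoding": meta.get("encoding"),
            "geometry_types": meta.get("geometry_types", []),
            "bbox": meta.get("bbox"),
            "crs": meta.get("crs"),
        }
        for name, meta in geo.get("columns", {}).items()
    }
    return {
        "path": path,
        "version": geo.get("version"),
        "primary_column": geo.get("primary_column"),
        "columns": columns,
    }


# Sample metadata mirroring the first row of the table above.
sample_geo = json.dumps({
    "version": "1.0.0-beta.1",
    "primary_column": "geom",
    "columns": {
        "geom": {
            "encoding": "WKB",
            "geometry_types": ["Polygon"],
            "bbox": [1000.0, 1000.0, 2000.0, 2000.0],
        }
    },
})

row = geo_metadata_row("datafile1.parquet", sample_geo)
print(row)
```

The columns value is a map keyed by geometry column name, which is why the table renders it as {geom -> {...}}.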

How was this patch tested?

Added new tests for the newly added data source.

Did this PR include necessary documentation updates?

  • Yes, I have updated the documentation.

@jiayuasu
Member

jiayuasu commented Jan 4, 2024

What will happen if the geoparquet file has more than 1 geometry column?

@Kontinuation
Member Author

What will happen if the geoparquet file has more than 1 geometry column?

The columns field is a map from column name to column metadata. If a GeoParquet file has more than one geometry column, the columns field will contain one key-value pair per geometry column.
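As a rough illustration of that answer, the sketch below models the columns value of a single file with two geometry columns as a plain Python dict and lists one summary per entry. The column name centroid, its values, and the helper summarize_columns are made up for the example; only the map-of-column-metadata shape comes from the description above.

```python
# Hypothetical "columns" value for a file with two geometry columns.
columns = {
    "geom": {
        "encoding": "WKB",
        "geometry_types": ["Polygon"],
        "bbox": [0.0, 0.0, 1000.0, 1000.0],
        "crs": None,
    },
    "centroid": {
        "encoding": "WKB",
        "geometry_types": ["Point"],
        "bbox": [500.0, 500.0, 500.0, 500.0],
        "crs": None,
    },
}


def summarize_columns(columns):
    """Return (name, geometry_types, bbox) per geometry column, sorted by name."""
    return [
        (name, meta["geometry_types"], meta["bbox"])
        for name, meta in sorted(columns.items())
    ]


summary = summarize_columns(columns)
for name, geometry_types, bbox in summary:
    print(name, geometry_types, bbox)
```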

@Kontinuation Kontinuation marked this pull request as ready for review January 5, 2024 02:14
@jiayuasu
Member

jiayuasu commented Jan 5, 2024

@Kontinuation does this work if the CRS is a projjson string?

@Kontinuation
Member Author

Kontinuation commented Jan 5, 2024

@Kontinuation does this work if the CRS is a projjson string?

Sure. This is an example of the columns field read from geoparquet files with CRS metadata:

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|columns                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{geometry -> {WKB, [], [913175.109, 120121.8813, 1067382.5084, 272844.2936], "PROJCRS[\"NAD83 [/](https://file+.vscode-resource.vscode-cdn.net/) New York Long Island (ftUS)\",BASEGEOGCRS[\"NAD83\",DATUM[\"North American Datum 1983\",ELLIPSOID[\"GRS 1980\",6378137,298.257222101]],UNIT[\"degree\",0.0174532925199433]],CONVERSION[\"SPCS83 New York Long Island zone (US Survey feet)\",METHOD[\"Lambert Conic Conformal (2SP)\"],PARAMETER[\"Latitude of false origin\",40.1666666666667],PARAMETER[\"Longitude of false origin\",-74],PARAMETER[\"Latitude of 1st standard parallel\",41.0333333333333],PARAMETER[\"Latitude of 2nd standard parallel\",40.6666666666667],PARAMETER[\"Easting at false origin\",984250],PARAMETER[\"Northing at false origin\",0]],CS[Cartesian,2],AXIS[\"easting (X)\",east],AXIS[\"northing (Y)\",north],UNIT[\"US survey foot\",0.304800609601219],USAGE[SCOPE[\"Engineering survey, topographic mapping.\"],AREA[\"United States (USA) - New York - counties of Bronx; Kings; Nassau; New York; Queens; Richmond; Suffolk.\"],BBOX[40.47,-74.26,41.3,-71.8]],ID[\"EPSG\",2263]]"}}                                                                                                                                                                                                                                                                                                                                                               |
|{geometry -> {WKB, [Polygon, MultiPolygon], [-180.0, -90.0, 180.0, 83.6451], {"$schema":"https://proj.org/schemas/v0.5/projjson.schema.json","type":"GeographicCRS","name":"WGS 84 (CRS84)","datumEnsemble":{"name":"World Geodetic System 1984 ensemble","members":[{"name":"World Geodetic System 1984 (Transit)","id":{"authority":"EPSG","code":1166}},{"name":"World Geodetic System 1984 (G730)","id":{"authority":"EPSG","code":1152}},{"name":"World Geodetic System 1984 (G873)","id":{"authority":"EPSG","code":1153}},{"name":"World Geodetic System 1984 (G1150)","id":{"authority":"EPSG","code":1154}},{"name":"World Geodetic System 1984 (G1674)","id":{"authority":"EPSG","code":1155}},{"name":"World Geodetic System 1984 (G1762)","id":{"authority":"EPSG","code":1156}},{"name":"World Geodetic System 1984 (G2139)","id":{"authority":"EPSG","code":1309}}],"ellipsoid":{"name":"WGS 84","semiMajorAxis":6378137,"inverseFlattening":298.257223563},"accuracy":"2.0","id":{"authority":"EPSG","code":6326}},"coordinateSystem":{"subtype":"ellipsoidal","axis":[{"name":"Geodetic longitude","abbreviation":"Lon","direction":"east","unit":"degree"},{"name":"Geodetic latitude","abbreviation":"Lat","direction":"north","unit":"degree"}]},"scope":"Not known.","area":"World.","bbox":{"southLatitude":-90,"westLongitude":-180,"northLatitude":90,"eastLongitude":180},"id":{"authority":"OGC","code":"CRS84"}}}}|
|{geometry -> {WKB, [], [-180.0, -90.0, 180.0, 83.6451], {"$schema":"https://proj.org/schemas/v0.4/projjson.schema.json","type":"GeographicCRS","name":"WGS 84 (CRS84)","datumEnsemble":{"name":"World Geodetic System 1984 ensemble","members":[{"name":"World Geodetic System 1984 (Transit)","id":{"authority":"EPSG","code":1166}},{"name":"World Geodetic System 1984 (G730)","id":{"authority":"EPSG","code":1152}},{"name":"World Geodetic System 1984 (G873)","id":{"authority":"EPSG","code":1153}},{"name":"World Geodetic System 1984 (G1150)","id":{"authority":"EPSG","code":1154}},{"name":"World Geodetic System 1984 (G1674)","id":{"authority":"EPSG","code":1155}},{"name":"World Geodetic System 1984 (G1762)","id":{"authority":"EPSG","code":1156}},{"name":"World Geodetic System 1984 (G2139)","id":{"authority":"EPSG","code":1309}}],"ellipsoid":{"name":"WGS 84","semiMajorAxis":6378137,"inverseFlattening":298.257223563},"accuracy":"2.0","id":{"authority":"EPSG","code":6326}},"coordinateSystem":{"subtype":"ellipsoidal","axis":[{"name":"Geodetic longitude","abbreviation":"Lon","direction":"east","unit":"degree"},{"name":"Geodetic latitude","abbreviation":"Lat","direction":"north","unit":"degree"}]},"scope":"Not known.","area":"World.","bbox":{"southLatitude":-90,"westLongitude":-180,"northLatitude":90,"eastLongitude":180},"id":{"authority":"OGC","code":"CRS84"}}}}                     |
|{geometry -> {WKB, [], [-175.2206, -41.3, 179.2166, 64.15], "GEOGCRS[\"WGS 84\",ENSEMBLE[\"World Geodetic System 1984 ensemble\",MEMBER[\"World Geodetic System 1984 (Transit)\"],MEMBER[\"World Geodetic System 1984 (G730)\"],MEMBER[\"World Geodetic System 1984 (G873)\"],MEMBER[\"World Geodetic System 1984 (G1150)\"],MEMBER[\"World Geodetic System 1984 (G1674)\"],MEMBER[\"World Geodetic System 1984 (G1762)\"],MEMBER[\"World Geodetic System 1984 (G2139)\"],ELLIPSOID[\"WGS 84\",6378137,298.257223563],ENSEMBLEACCURACY[2.0]],CS[ellipsoidal,2],AXIS[\"geodetic latitude (Lat)\",north],AXIS[\"geodetic longitude (Lon)\",east],UNIT[\"degree\",0.0174532925199433],USAGE[SCOPE[\"Horizontal component of 3D system.\"],AREA[\"World.\"],BBOX[-90,-180,90,180]],ID[\"EPSG\",4326]]"}}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

In older versions of the example GeoParquet files, the CRS may not be a PROJJSON object; in the output above, two of the files carry it as a quoted WKT string instead. Such files can still be loaded properly by the geoparquet.metadata data source.
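A consumer of the columns field may want to tell these cases apart. The following Python sketch assumes the CRS arrives as JSON text, which is what the output above suggests (PROJJSON appears as a JSON object, WKT as a quoted and escaped JSON string). The function describe_crs and its return labels are hypothetical and are not part of the data source.

```python
import json


def describe_crs(crs_text):
    """Classify a CRS value that is assumed to be rendered as JSON text."""
    if crs_text is None:
        return "missing"
    try:
        value = json.loads(crs_text)
    except json.JSONDecodeError:
        # Not valid JSON at all, for example an unquoted WKT string.
        return "plain string"
    if value is None:
        return "missing"
    if isinstance(value, dict):
        return "PROJJSON object"
    if isinstance(value, str):
        return "JSON string (for example WKT)"
    return "other"


# A trimmed PROJJSON object and a WKT string encoded as a JSON string.
projjson_text = json.dumps({"type": "GeographicCRS", "name": "WGS 84 (CRS84)"})
wkt_text = json.dumps('GEOGCRS["WGS 84"]')

print(describe_crs(projjson_text))
print(describe_crs(wkt_text))
print(describe_crs(None))
```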

@jiayuasu jiayuasu merged commit 1e64741 into apache:master Jan 5, 2024
47 checks passed