-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Attempt to align with geoarrow spec #4
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
92be42b updated the example to match the new spec (with my other two comments).
I need to verify two things:
- The WKT CRS is different
GEOGCRS["WGS 84",ENSEMBLE["World Geodetic System 1984 ensemble",MEMBER["World Geodetic System 1984 (Transit)"],MEMBER["World Geodetic System 1984 (G730)"],MEMBER["World Geodetic System 1984 (G873)"],MEMBER["World Geodetic System 1984 (G1150)"],MEMBER["World Geodetic System 1984 (G1674)"],MEMBER["World Geodetic System 1984 (G1762)"],ELLIPSOID["WGS 84",6378137,298.257223563,LENGTHUNIT["metre",1]],ENSEMBLEACCURACY[2.0]],PRIMEM["Greenwich",0,ANGLEUNIT["degree",0.0174532925199433]],CS[ellipsoidal,2],AXIS["geodetic latitude (Lat)",north,ORDER[1],ANGLEUNIT["degree",0.0174532925199433]],AXIS["geodetic longitude (Lon)",east,ORDER[2],ANGLEUNIT["degree",0.0174532925199433]],USAGE[SCOPE["Horizontal component of 3D system."],AREA["World."],BBOX[-90,-180,90,180]],ID["EPSG",4326]]
- I need to confirm that this is actually written to the parquet metadata, and not just the pyarrow metadata (doesn't have to hold up this PR).
On the bounding boxes, there's a bit of motivation at https://github.com/geopandas/geo-arrow-spec/pull/2/files#diff-6cf20cd11635a4f1fc656313a8d9e2a14f62dc7b68d778deac688b4dd051ffe2R52. @jorisvandenbossche, do you know if dask-geopandas is actually using |
…l/cdw-geo into geoarrow-alignment
I made a mess of the git history, so we might want to squash this before merging, but the CRS example should be fixed now by using "simplified" WKT2. |
Squashing is still beyond my git skills, and I don't really care about the history, so I'm fine to just merge. If you want to squash then go for it, just let me know, as I'd like to merge in the next couple hours.
Oh yeah, I read that, and think it's a cool feature. It just seems like instead of saying ' For example, Parquet supports partitions as individual files' we should actually explain how Parquet can make use of this feature. I'm way out of my depth here, so not sure how it works, but I'd see us putting up a partition example, and explaining how to actually set it up? Or is it all automatic? I assume there needs to be some guidance for how to use bbox's for the file partitions. So I just left it out since I can't explain it. If you want to take a crack at it that'd be awesome. |
Nice! I suppose I'll be able to do that in the future ;) |
Yes, we are using that (it's quite essential for having the efficient spatial partitioning with a partitioned parquet dataset). See here With dask-geopandas, we can sort the data spatially, so each partition of the dask.dataframe is a part of the whole dataset that is spatially somewhat close to each other. When writing that to a partitioned parquet dataset, each partition of the dask.dataframe becomes a file in the parquet dataset, and we set the bbox information of the partition in that file. On reading this dataset again, this allows to only read those files needed when doing a spatial query. |
@@ -5,6 +5,10 @@ | |||
The [Apache Parquet][parquet] provides a standardized open-source columnar storage format. This specification defines how geospatial data | |||
should be stored in parquet format, including the representation of geometries and the required additional metadata. | |||
|
|||
## Version | |||
|
|||
This is version 0.1.0 of the geoparquest specification. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
geoparquest -> geoparquet
schema = ( | ||
table.schema.remove(idx) | ||
.insert(idx, field) | ||
table.schema | ||
.with_metadata({"geoparquet": json.dumps(metadata)}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"geoparquet" -> "geo"? (the text below seems to been updated to change this)
|
||
#### crs | ||
|
||
It is strongly recommended to use [EPSG:4326 (lat, long)](https://spatialreference.org/ref/epsg/4326/) for all data, so in most cases the value of the crs should be: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's explicitly mentioned here that EPSG:4326 is "lat, long", which is correct. But in practice most GIS systems use "lon, lat" to store geometries, and the same for common file formats like GeoJSON and GeoPackage (AFAIK, see eg opengeospatial/geopackage#540).
So I am assuming that the WKB representation will actually use "lon, lat" as well in this spec. In which case the explicit call out of "lat, long" is a bit confusing. But maybe my assumption is wrong, and this is something we need to discuss?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not 100% sure of the default CRS to use. If WKB is long, lat then I agree it makes sense to use crs84. @ghobona - what's the wkt2 for crs84? And is there an entry on spatialreference.org? It'd be nice to link to a web page and not just some big xml document.
@cholmes If you are looking for an HTML rendering of the definition of the EPSG:4326 coordinate reference system, then please use this page website from the official IOGP website. For any questions about the IOGP EPSG registry, please ask @RogerLott |
@ghobona - well what I'm really looking for (I think) is a html page for CRS84. Is there an epsg code / web page for that? |
@cholmes I have not come across a registration of CRS84 on the IOGP website. The OGC Definitions Server does not currently offer HTML rendering of CRS definitions. |
This PR aims to align the geoparquet with the GeoArrow proposal. We hope to fully align, with this spec as the parquet instantiation of all the same geoarrow principles.
For now I tried to align all the names / structure. @TomAugspurger - would be great if you could check if I got it right. They use a JSON description, which is nice, but wouldn't actually be used in parquet right? So I think we should describe it without that, though might be nice to have some sort of 'example' that is just the tables as they'd be in parquet?
I did slim a couple things they have:
I also changed geoparquet_version to be just 'version'. They had a 'schema_version' in the spec, but then just 'version' in the example. I liked just 'version'. Though maybe we should have a specific geoparquet version?
I also left out the bounding box stuff, though it does seem interesting. If we want to explain how it'd work for parquet partitioning use cases then I'd be for having it in.