Add known libraries that support GeoParquet to README #29
Do we also want to add known examples of datasets provided in the GeoParquet format? I think that, for example, https://planetarycomputer.microsoft.com/dataset/us-census uses Parquet with this metadata (@TomAugspurger?). Although given that this is not easily publicly accessible (?), that's maybe not an ideal example.
You can access it anonymously, but you have to jump through a couple of hoops to get a URL to download. So probably not the best example.
I agree some data examples would be good. If anyone wants to make one, you can use the examples I link to at https://github.com/cholmes/static-ogc-examples/blob/68cdb0428f5ba08aecd99bb181601d9df8cee3ba/linz/buildings/nz-building-collection.json#L53 and convert to GeoParquet, and then I can put it in the same Google Cloud bucket as the others.
Hopefully soon I'll be able to add JS to the list 🙂 https://github.com/kylebarron/parquet-wasm. It's currently blocked by bugs in the JS Arrow library that make it unable to read the Arrow IPC messages decoded from Parquet.
At CARTO we have the Data Observatory, with a huge number of public datasets available in BigQuery. I think we could convert the US boundaries to GeoParquet and send them to @cholmes to publish. Does that make sense? It's a pretty easy ETL process.
In fact, I can create a small script, a few lines of code, that takes a dataset from the Data Observatory, converts it to geopandas, and then to GeoParquet. So we'd have more than 10K geospatial datasets directly available in GeoParquet. The Data Observatory is published in BigQuery and we have a project publicly available ( Totally understand if you don't want to include it for commercial reasons.
I've done this quick example. More generically, it converts any geospatial data in BigQuery into a GeoParquet file (if the data fits in memory). I think we can create similar examples for the most used data warehouses and databases: Google BigQuery, Snowflake, Redshift, and PostgreSQL. How does that sound?
Sounds great.
If that data is all open and no account is required I don't see a reason not to include it.
Awesome. I think it makes sense to link to these examples from the readme. At least the BigQuery one for 0.1, and the others if you manage to get them in before the release.
I downloaded the gpkg version of that dataset this morning and converted it to GeoParquet. You can download it from https://www.dropbox.com/s/iuuxuc8z6jw7eu2/nz-buildings-outines.parquet?dl=0 (or create it yourself with the code below). I created it with an environment created from
That's basically the same as what @alasarr did, but using geopandas'
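For readers following along, here is a minimal sketch of this kind of conversion. It is not necessarily the code from the comment above. It assumes geopandas is installed together with pyarrow; the function name and file names are hypothetical placeholders.

```python
def gpkg_to_geoparquet(src_path, dst_path):
    """Read a vector file (e.g. a GeoPackage) and write it as a Parquet file.

    Returns the number of features written.
    """
    # Third-party dependency, imported lazily so the module loads without it.
    import geopandas as gpd

    # read_file loads the layer into an in-memory GeoDataFrame.
    gdf = gpd.read_file(src_path)

    # to_parquet writes the table along with geopandas' "geo" file metadata
    # (requires pyarrow).
    gdf.to_parquet(dst_path)
    return len(gdf)


# Hypothetical usage, with placeholder file names:
# gpkg_to_geoparquet("nz-building-outlines.gpkg", "nz-building-outlines.parquet")
```

Like the BigQuery example, this only works if the data fits in memory.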
Awesome, it's up at https://storage.googleapis.com/open-geodata/linz-examples/nz-buildings-outines.parquet. Feel free to add text to include it as an example, or I should be able to get to that tomorrow.
Not sure if it matters, but there's a typo in the filename:
Didn't know about this method. Thanks for sharing, @jorisvandenbossche. When do you expect to upgrade that in geopandas to support GeoParquet 0.1? 😄
It already does! (That's the reason I am listing it in this PR :)) To be strictly correct, it doesn't support the full 0.1 spec when writing, e.g. it doesn't include the "edges" field, but since that is an optional field, the output is still fully 0.1 compliant.
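To make the point about optional fields concrete, here is a small sketch that inspects a `geo` metadata document and reports which keys are absent. The lists of required and optional keys are my assumption about the 0.1 draft, not quoted from it, and the helper name and sample metadata are hypothetical.

```python
import json

# Assumed key sets for the 0.1 draft; check them against the spec before relying on them.
REQUIRED_FILE_KEYS = {"version", "primary_column", "columns"}
REQUIRED_COLUMN_KEYS = {"encoding", "crs"}
OPTIONAL_COLUMN_KEYS = {"edges", "bbox"}


def missing_geo_keys(geo_json):
    """Report absent required and optional keys in a "geo" metadata JSON string."""
    meta = json.loads(geo_json)
    report = {"file": sorted(REQUIRED_FILE_KEYS - set(meta)), "columns": {}}
    for name, col in meta.get("columns", {}).items():
        report["columns"][name] = {
            "missing_required": sorted(REQUIRED_COLUMN_KEYS - set(col)),
            "absent_optional": sorted(OPTIONAL_COLUMN_KEYS - set(col)),
        }
    return report


# Hypothetical metadata resembling a file written without "edges" or "bbox".
sample = json.dumps({
    "version": "0.1.0",
    "primary_column": "geometry",
    "columns": {"geometry": {"encoding": "WKB", "crs": "<WKT string>"}},
})
report = missing_geo_keys(sample)
```

Under these assumptions, a file that omits only optional keys yields empty `missing_required` lists, which is the sense in which such output can still be compliant.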
Will merge this and do a follow-up listing some example files.
Ah, @cholmes already added that here :)
These are the two packages that I am aware of that already implement this spec.