Add known libraries that support GeoParquet to README #29

jorisvandenbossche
Collaborator

These are the two packages that I am aware of that already implement this spec.

@jorisvandenbossche
Collaborator Author

Do we also want to add known examples of datasets provided in the GeoParquet format?

I think that for example https://planetarycomputer.microsoft.com/dataset/us-census uses Parquet with this metadata (@TomAugspurger?). Although given that this is not easily publicly accessible (?), that's maybe not an ideal example.

@TomAugspurger
Collaborator

You can access it anonymously, but you have to jump through a couple hoops to get a URL to download. So probably not the best example.

@cholmes
Member

cholmes commented Mar 4, 2022

I agree some data examples would be good. If anyone wants to make one you can use the examples I link to at https://github.com/cholmes/static-ogc-examples/blob/68cdb0428f5ba08aecd99bb181601d9df8cee3ba/linz/buildings/nz-building-collection.json#L53 and convert to geoparquet, and then I can put it in the same Google cloud bucket as the others.

@kylebarron
Collaborator

Hopefully soon I'll be able to add JS to the list 🙂 https://github.com/kylebarron/parquet-wasm. It's currently blocked by bugs in the JS Arrow library making it unable to read the IPC Arrow messages decoded from Parquet.

@alasarr
Collaborator

alasarr commented Mar 6, 2022

At CARTO we have the Data Observatory, with a huge number of public datasets available in BigQuery. I think we could convert the US boundaries to GeoParquet and send them to @cholmes to publish. Does it make sense? It's a pretty easy ETL process.

@alasarr
Collaborator

alasarr commented Mar 6, 2022

In fact, I can write a small script, just a few lines of code, that takes a dataset from the Data Observatory, converts it to geopandas, and then writes it to GeoParquet. That way we'd have more than 10K geospatial datasets directly available in GeoParquet.

The Data Observatory is published in BigQuery in a publicly available project (carto-do-public-data), so anyone can query that project directly without a CARTO account.

Totally understand if you don't want to include it because of commercial reasons.

@alasarr
Collaborator

alasarr commented Mar 6, 2022

I've done this quick example. In a more generic way, it converts any geospatial data in BigQuery into a GeoParquet file (if the data fits in memory).
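
For readers who want a feel for that kind of conversion, here is a minimal sketch. It is not the script referred to above. It assumes the `google-cloud-bigquery` and `geopandas` packages are installed, that default Google Cloud credentials are configured, and that the table has a single GEOGRAPHY column. The function names, the `geom` column name, and the table identifier are all hypothetical.

```python
def build_query(table: str, geo_column: str = "geom") -> str:
    """Select every column, with the GEOGRAPHY column serialized as WKT."""
    return (
        f"SELECT * EXCEPT({geo_column}), ST_ASTEXT({geo_column}) AS wkt "
        f"FROM `{table}`"
    )


def bigquery_to_geoparquet(table: str, out_path: str, geo_column: str = "geom") -> None:
    """Load a whole BigQuery table into memory and write it as a Parquet file."""
    # Imported lazily so build_query stays usable without these packages.
    import geopandas as gpd
    from google.cloud import bigquery

    client = bigquery.Client()  # picks up your default credentials and project
    df = client.query(build_query(table, geo_column)).to_dataframe()

    # BigQuery GEOGRAPHY values are WGS84 longitude/latitude, hence EPSG:4326.
    geometry = gpd.GeoSeries.from_wkt(df["wkt"], crs="EPSG:4326")
    gdf = gpd.GeoDataFrame(df.drop(columns="wkt"), geometry=geometry)

    # geopandas writes the geometry as WKB plus the "geo" file metadata.
    gdf.to_parquet(out_path)
```

A call would look like `bigquery_to_geoparquet("carto-do-public-data.<dataset>.<table>", "out.parquet")`, with the dataset and table placeholders filled in. Everything passes through a pandas DataFrame, so this only works when the table fits in memory.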

I think we can create similar examples for the most used data warehouses (databases): Google BigQuery, Snowflake, Redshift, and PostgreSQL.

How does it sound?

@cholmes
Member

cholmes commented Mar 6, 2022

> I think we could convert the US boundaries to GeoParquet and send them to @cholmes to publish. Does it make sense?

Sounds great.

> Totally understand if you don't want to include it because of commercial reasons.

If that data is all open and no account is required I don't see a reason not to include it.

> I've done this quick example. In a more generic way, it converts any geospatial data in BigQuery into a GeoParquet file (if the data fits in memory).
>
> I think we can create similar examples for the most used data warehouses (databases): Google BigQuery, Snowflake, Redshift, and PostgreSQL.

Awesome. I think it makes sense to link to these examples from the readme. At least the BigQuery one for 0.1, and the others if you manage to get them in before the release.

@jorisvandenbossche
Collaborator Author

> If anyone wants to make one you can use the examples I link to at https://github.com/cholmes/static-ogc-examples/blob/68cdb0428f5ba08aecd99bb181601d9df8cee3ba/linz/buildings/nz-building-collection.json#L53 and convert to geoparquet, and then I can put it in the same Google cloud bucket as the others.

I downloaded the gpkg version of that dataset this morning, and converted it to geoparquet. You can download it from https://www.dropbox.com/s/iuuxuc8z6jw7eu2/nz-buildings-outines.parquet?dl=0 (or create it yourself with the below code)

I used an environment created with mamba create -n geoparquet python=3.9 pyarrow=7 geopandas=0.10 pyogrio=0.3, and then ran:

>>> import pyogrio
>>> df = pyogrio.read_dataframe("Downloads/nz-building-outlines.gpkg")
>>> df.to_parquet("nz-buildings-outines.parquet")

That's basically the same as what @alasarr did, but using geopandas' to_parquet instead of manually using geopandas/pyarrow/pyproj.

@cholmes
Member

cholmes commented Mar 6, 2022

Awesome, it's up at https://storage.googleapis.com/open-geodata/linz-examples/nz-buildings-outines.parquet

Feel free to add text to include it as an example, or I should be able to get to that tomorrow.

@kylebarron
Collaborator

Not sure if it matters but there's a typo in the filename: outines instead of outlines.

@cholmes
Member

cholmes commented Mar 6, 2022

@alasarr
Collaborator

alasarr commented Mar 6, 2022

> That's basically the same as what @alasarr did, but using geopandas' to_parquet instead of manually using geopandas/pyarrow/pyproj.

Didn't know about this method. Thanks for sharing @jorisvandenbossche. When do you expect to upgrade that in geopandas to support geoparquet 0.1 😄

@jorisvandenbossche
Collaborator Author

> When do you expect to upgrade that in geopandas to support geoparquet 0.1 😄

It already does! (that's the reason I am listing it in this PR :))

To be strictly correct, it doesn't support the full 0.1 spec when writing, e.g. it doesn't include the "edges" field, but since that is an optional field, the output is still fully 0.1 compliant.
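
To illustrate why a missing optional field is harmless, here is a rough sketch of how a reader might handle it. It assumes the `geo` file metadata is a JSON object with a `columns` mapping, and that an absent "edges" should be read as "planar". That default is my reading of the draft spec, so please check it against the spec text. The function name and the example metadata are hypothetical, and the metadata is trimmed to the fields the function reads.

```python
import json


def edges_by_column(geo_metadata: str) -> dict:
    """Map each geometry column to its edge interpretation."""
    meta = json.loads(geo_metadata)
    return {
        # Assumed default: a column without "edges" is treated as planar.
        name: column.get("edges", "planar")
        for name, column in meta["columns"].items()
    }


# Hypothetical metadata, shaped like what a writer that omits "edges" might emit.
example = json.dumps({
    "primary_column": "geometry",
    "columns": {"geometry": {"encoding": "WKB", "crs": "<WKT string elided>"}},
})

print(edges_by_column(example))  # {'geometry': 'planar'}
```

A file written without "edges" and one that spells out "planar" explicitly come out the same on the reading side.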

@jorisvandenbossche
Collaborator Author

Will merge this, will do a follow-up listing some example files.

@jorisvandenbossche
Collaborator Author

Ah, @cholmes already added that here :)

@jorisvandenbossche jorisvandenbossche merged commit 1a652f2 into opengeospatial:main Mar 8, 2022
@jorisvandenbossche jorisvandenbossche deleted the add-existing-implementations branch March 8, 2022 13:25