Readme enhancements #19
Conversation
Some of these sections can break out to their own pages in the future, but we don't have tons of content so just putting it all in here.
Thanks! Added a bunch of questions/comments
> There are a few core goals driving the initial development.
>
> * **Enable interoperability among cloud data warehouses** - BigQuery, Snowflake, Redshift and others all support spatial operations, but importing and exporting data with existing formats can be problematic. All support and often recommend Parquet, so defining a solid GeoParquet can help enable interoperability.
I certainly understand the importance of those cloud data warehouses, but personally I think interoperability goes beyond that. For example, we are already using GeoParquet as a fast and interoperable format for Python and R users
Or, put differently, we are maybe missing a goal here. One of the main reasons we started using this in GeoPandas, is to have fast/efficient, columnar file format to store geospatial vector data (in addition to the traditional shapefile / geopackage / geojson). Or maybe that doesn't need to be listed explicitly? (something like "Enable the Parquet file format to store geospatial data" is already covered by the very first paragraph of this README?)
I think it'd be a good goal to lay that out explicitly, like as the first goal, and make the cloud data warehouse point build on it.
Thinking something like:
- Build a fast/efficient, columnar file format to store geospatial vector data to enable XXXX
I can use help on what it enables...
Ok, made an attempt at this. Ended up splitting out 'goals' from 'features', and made a stab at explaining it. I've not been deep in the new columnar workflows, so any tweaks to better explain the potential are more than welcome.
> Build a fast/efficient, columnar file format to store geospatial vector data to enable XXXX
>
> I can use help on what it enables...
I think in the first place, it enables "fast/efficient data access", I would say
(and then also other features from Parquet, like very good compression (so small file sizes), cheap reading of a subset of columns (the columnar nature), the type system (e.g. nested types), filtering chunks based on column statistics, ...)
But taking a look at your updates now!
Ah, that's cool. Some geospatial people love really complex data structures, GML went nuts with that stuff. I think we should definitely focus on just the simple features use case and do that really well, but good to know that more complex data structures can be supported.
https://databricks.com/glossary/what-is-parquet is a great overview - I think we should include a link to it somewhere in the readme. It includes a decent overview of columnar formats. I'll add it in.
I think the nested types are mostly relevant when talking about an arrow-native geometry encoding. For example a nested list of values would be able to describe a column of Points, while still having a favorable in-memory encoding (i.e. flat arrays underneath).
Very cool. So we'll leave them off here for now, but maybe mention in the future.
We can indeed use those nested data types to store geometries (the arrow-native encoding proposal), but just a note that there are ecosystems where they make heavy use of the nested schemas that Parquet enables (eg I think Spark supports this quite well; I am not super familiar with it, since nested columns are not really supported in python/pandas). For example, it can map nicely to structured data you might encounter in json files, logs, etc.
(this is just a clarification, agreed it is not that important to have it included in this PR)
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
* Added some 'features' from @jorisvandenbossche suggestions about parquet
* Made clear that it's not so good at dealing with lots of transactions on the data
* Added vis.gl to the list of where people are coming from
Thanks for all the updates, this is looking great!
About parquet and columnar data format advantages
I am going to merge this, so we have some more content on the repo landing page. More comments here are of course still welcome, we can always do follow-up PRs.
Fleshed out more in the readme.
Closes #10, closes #12, closes #15