Update validator with the latest spec changes #70

Merged 8 commits on Apr 19, 2022
22 changes: 22 additions & 0 deletions .github/workflows/scripts.yml
@@ -7,6 +7,28 @@ on:
   pull_request:

 jobs:
+  validate-examples:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+
+      - name: Set up Python 3.8
+        uses: actions/setup-python@v2
+        with:
+          python-version: 3.8
+
+      - name: Install validator
+        run: |
+          cd validator/python
+          python -m pip install --no-binary geoparquet_validator .
+
+      - name: Run validator
+        run: |
+          for example in $(ls examples/*.parquet); do
+            echo $example;
+            geoparquet_validator $example || exit 1;
+          done
+
   test-json-metadata:
     runs-on: ubuntu-latest
     steps:
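The `Run validator` step stops at the first example file that fails validation. A minimal Python sketch of the same fail-fast loop follows. The helper name `first_failure` is hypothetical, and the sketch assumes only that the validator is a command that takes one file path and exits non-zero on failure:

```python
import subprocess
from pathlib import Path


def first_failure(directory, command):
    """Run `command <file>` on every *.parquet file in `directory`.

    Returns the first path whose command exits non-zero, or None if all pass.
    `command` is a list such as ["geoparquet_validator"] (assumed invocation).
    """
    for path in sorted(Path(directory).glob("*.parquet")):
        print(path)
        completed = subprocess.run([*command, str(path)])
        if completed.returncode != 0:
            # Mirrors `|| exit 1` in the shell loop: stop at the first failure.
            return path
    return None
```

Globbing with `pathlib` instead of parsing `ls` output also keeps file names containing spaces intact.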
Binary file modified examples/example.parquet
Binary file not shown.
32 changes: 3 additions & 29 deletions examples/example.py
@@ -1,32 +1,5 @@
 """
 Generates `example.parquet` using pyarrow by running `python example.py`.
-
-You can print the metadata with:
-
-.. code-block:: python
-
-    >>> import json, pprint, pyarrow.parquet as pq
-    >>> pprint.pprint(json.loads(pq.read_schema("example.parquet").metadata[b"geo"]))
-    {'columns': {'geometry': {'bbox': [-180.0, -90.0, 180.0, 83.6451],
-                              'crs': 'GEOGCRS["WGS 84 (CRS84)",ENSEMBLE["World '
-                                     'Geodetic System 1984 ensemble",MEMBER["World '
-                                     'Geodetic System 1984 '
-                                     '(Transit)"],MEMBER["World Geodetic System '
-                                     '1984 (G730)"],MEMBER["World Geodetic System '
-                                     '1984 (G873)"],MEMBER["World Geodetic System '
-                                     '1984 (G1150)"],MEMBER["World Geodetic System '
-                                     '1984 (G1674)"],MEMBER["World Geodetic System '
-                                     '1984 (G1762)"],MEMBER["World Geodetic System '
-                                     '1984 (G2139)"],ELLIPSOID["WGS '
-                                     '84",6378137,298.257223563],ENSEMBLEACCURACY[2.0]],CS[ellipsoidal,2],AXIS["geodetic '
-                                     'longitude (Lon)",east],AXIS["geodetic '
-                                     'latitude '
-                                     '(Lat)",north],UNIT["degree",0.0174532925199433],USAGE[SCOPE["Not '
-                                     'known."],AREA["World."],BBOX[-90,-180,90,180]],ID["OGC","CRS84"]]',
-                              'edges': 'planar',
-                              'encoding': 'WKB'}},
-     'primary_column': 'geometry',
-     'version': '0.1.0'}
 """
 import json
 import pathlib
@@ -39,7 +12,7 @@
 HERE = pathlib.Path(__file__).parent

 df = geopandas.read_file(geopandas.datasets.get_path("naturalearth_lowres"))
-df = df.to_crs('crs84')
+df = df.to_crs("crs84")
 table = pa.Table.from_pandas(df.head().to_wkb())


@@ -48,8 +21,9 @@
     "primary_column": "geometry",
     "columns": {
         "geometry": {
-            "crs": df.crs.to_wkt(pyproj.enums.WktVersion.WKT2_2019_SIMPLIFIED),
             "encoding": "WKB",
+            "geometry_type": ["Polygon", "MultiPolygon"],
+            "crs": df.crs.to_wkt(pyproj.enums.WktVersion.WKT2_2019_SIMPLIFIED),
             "edges": "planar",
             "bbox": [round(x, 4) for x in df.geometry.unary_union.bounds],
         },
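The metadata dictionary in `example.py` ends up as a JSON document stored under the `geo` key of the Parquet schema metadata. The following is a rough, standard-library-only sketch of that shape. The function names `build_geo_metadata` and `encode_geo_metadata` are hypothetical, the `b"geo"` byte key and the `"0.1.0"` version are taken from the docstring in this file, and the actual pyarrow call that attaches the metadata is not shown in this hunk, so it is left out here:

```python
import json


def build_geo_metadata(bounds, crs_wkt=None):
    """Build a metadata dict shaped like the one in example.py (illustrative only)."""
    column = {
        "encoding": "WKB",
        "geometry_type": ["Polygon", "MultiPolygon"],
        "edges": "planar",
        # Same rounding as example.py: four decimal places per bound.
        "bbox": [round(x, 4) for x in bounds],
    }
    if crs_wkt is not None:
        # `crs` is optional in the spec; omit it when no WKT2 string is given.
        column["crs"] = crs_wkt
    return {
        "version": "0.1.0",
        "primary_column": "geometry",
        "columns": {"geometry": column},
    }


def encode_geo_metadata(metadata):
    """Serialize the metadata as UTF-8 JSON under the b"geo" key (assumed layout)."""
    return {b"geo": json.dumps(metadata).encode("utf-8")}
```

Decoding is the reverse of what the removed docstring did: `json.loads(encoded[b"geo"])`.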
8 changes: 6 additions & 2 deletions examples/example_metadata.json
@@ -8,9 +8,13 @@
                     180.0,
                     83.6451
                 ],
-                "crs": "GEOGCRS[\"WGS 84\",ENSEMBLE[\"World Geodetic System 1984 ensemble\",MEMBER[\"World Geodetic System 1984 (Transit)\"],MEMBER[\"World Geodetic System 1984 (G730)\"],MEMBER[\"World Geodetic System 1984 (G873)\"],MEMBER[\"World Geodetic System 1984 (G1150)\"],MEMBER[\"World Geodetic System 1984 (G1674)\"],MEMBER[\"World Geodetic System 1984 (G1762)\"],MEMBER[\"World Geodetic System 1984 (G2139)\"],ELLIPSOID[\"WGS 84\",6378137,298.257223563],ENSEMBLEACCURACY[2.0]],CS[ellipsoidal,2],AXIS[\"geodetic latitude (Lat)\",north],AXIS[\"geodetic longitude (Lon)\",east],UNIT[\"degree\",0.0174532925199433],USAGE[SCOPE[\"Horizontal component of 3D system.\"],AREA[\"World.\"],BBOX[-90,-180,90,180]],ID[\"EPSG\",4326]]",
+                "crs": "GEOGCRS[\"WGS 84 (CRS84)\",ENSEMBLE[\"World Geodetic System 1984 ensemble\",MEMBER[\"World Geodetic System 1984 (Transit)\"],MEMBER[\"World Geodetic System 1984 (G730)\"],MEMBER[\"World Geodetic System 1984 (G873)\"],MEMBER[\"World Geodetic System 1984 (G1150)\"],MEMBER[\"World Geodetic System 1984 (G1674)\"],MEMBER[\"World Geodetic System 1984 (G1762)\"],MEMBER[\"World Geodetic System 1984 (G2139)\"],ELLIPSOID[\"WGS 84\",6378137,298.257223563],ENSEMBLEACCURACY[2.0]],CS[ellipsoidal,2],AXIS[\"geodetic longitude (Lon)\",east],AXIS[\"geodetic latitude (Lat)\",north],UNIT[\"degree\",0.0174532925199433],USAGE[SCOPE[\"Not known.\"],AREA[\"World.\"],BBOX[-90,-180,90,180]],ID[\"OGC\",\"CRS84\"]]",
                 "edges": "planar",
-                "encoding": "WKB"
+                "encoding": "WKB",
+                "geometry_type": [
+                    "Polygon",
+                    "MultiPolygon"
+                ]
             }
         },
         "primary_column": "geometry",
12 changes: 6 additions & 6 deletions format-specs/geoparquet.md
@@ -53,7 +53,7 @@ Each geometry column in the dataset must be included in the columns field above
 | Field Name | Type | Description |
 | ---------- | ----------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
 | encoding | string | **REQUIRED** Name of the geometry encoding format. Currently only 'WKB' is supported. |
-| geometry_type | string or \[string] | **REQUIRED** The geometry type(s) of all geometries, or "Unknown" if they are not known. |
+| geometry_type | string or \[string] | **REQUIRED** The geometry type(s) of all geometries, or 'Unknown' if they are not known. |
 | crs | string | **OPTIONAL** [WKT2](https://docs.opengeospatial.org/is/18-010r7/18-010r7.html) string representing the Coordinate Reference System (CRS) of the geometry. If the crs field is not included then the data in this column must be stored in longitude, latitude. In the case where a crs is not provided, CRS-aware implementations should assume a default value of [OGC:CRS84](https://www.opengis.net/def/crs/OGC/1.3/CRS84) (longitude-latitude coordinates) |
 | edges | string | **OPTIONAL** Name of the coordinate system for the edges. Must be one of 'planar' or 'spherical'. The default value is 'planar'. |
 | bbox | \[number] | **OPTIONAL** Bounding Box of the geometries in the file, formatted according to [RFC 7946, section 5](https://tools.ietf.org/html/rfc7946#section-5) |
@@ -62,7 +62,7 @@ Each geometry column in the dataset must be included in the columns field above

 #### crs

-The Coordinate Reference System (CRS) is an optional parameter for each geometry column defined in geoparquet format.
+The Coordinate Reference System (CRS) is an optional parameter for each geometry column defined in geoparquet format.

 The CRS must be provided in [WKT](https://en.wikipedia.org/wiki/Well-known_text_representation_of_coordinate_reference_systems) version 2, also known as **WKT2**. WKT2 has several revisions, this specification only supports [WKT2_2019](https://docs.opengeospatial.org/is/18-010r7/18-010r7.html).

@@ -98,7 +98,7 @@ GEOGCRS["WGS 84 (CRS84)",
     ID["OGC","CRS84"]]
 ```

-Due to the large number of CRSes available and the difficulty of implementing all of them, we expect that a number of implementations will start without support for the optional `crs` field.
+Due to the large number of CRSes available and the difficulty of implementing all of them, we expect that a number of implementations will start without support for the optional `crs` field.
 Users are recommended to store their data in longitude, latitude (OGC:CRS84 or not including the `crs` field) for it to work with the widest number of tools. But data that is better served in particular projections can choose to use an alternate coordinate reference system. We expect many tools will support alternate CRSes, but encourage users to check to ensure their chosen tool supports their chosen crs.

 #### epoch
@@ -122,9 +122,9 @@ Note that the current version of the spec only allows for a subset of WKB: 2D or

 #### Coordinate axis order

-The axis order of the coordinates in WKB stored in a geoparquet follows the de facto standard for axis order in WKB and is therefore always
-(x, y) where x is easting or longitude and y is northing or latitude. This ordering explicitly overrides the axis order as specified in the CRS.
-This follows the precedent of [GeoPackage](https://geopackage.org), see the [note in their spec](https://www.geopackage.org/spec130/#gpb_spec).
+The axis order of the coordinates in WKB stored in a geoparquet follows the de facto standard for axis order in WKB and is therefore always
+(x, y) where x is easting or longitude and y is northing or latitude. This ordering explicitly overrides the axis order as specified in the CRS.
+This follows the precedent of [GeoPackage](https://geopackage.org), see the [note in their spec](https://www.geopackage.org/spec130/#gpb_spec).

 #### geometry_type

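To make the axis-order rule in the spec text concrete, here is a small sketch that encodes and decodes a 2D WKB point with the standard library. The helper names `point_to_wkb` and `wkb_to_point` are hypothetical and only 2D points are handled; the point is that x (longitude or easting) is always written first, whatever axis order the CRS declares:

```python
import struct

WKB_LITTLE_ENDIAN = 1  # byte-order flag for little-endian WKB
WKB_POINT = 1          # geometry type code for a 2D Point


def point_to_wkb(x, y):
    """Encode a 2D point as WKB with x (lon/easting) first, then y (lat/northing)."""
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, x, y)


def wkb_to_point(data):
    """Decode a 2D WKB point and return (x, y)."""
    fmt = "<" if data[0] == WKB_LITTLE_ENDIAN else ">"
    (geometry_type,) = struct.unpack_from(fmt + "I", data, 1)
    if geometry_type != WKB_POINT:
        raise ValueError(f"not a 2D point: type code {geometry_type}")
    return struct.unpack_from(fmt + "dd", data, 5)
```

For a CRS such as EPSG:4326, whose definition lists latitude first, a writer following this rule would still call `point_to_wkb(longitude, latitude)`.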
39 changes: 32 additions & 7 deletions validator/python/geoparquet_validator/schema.json
@@ -4,7 +4,7 @@
   "type": "object",
   "description": "Parquet metadata included in the geo field.",
   "properties": {
-    "schema_version": {
+    "version": {
       "type": "string",
       "pattern": "^0\\.1\\.[0-9]+$",
       "description": "The version of the geoparquet metadata standard used when writing."
@@ -20,15 +20,30 @@
         ".*": {
           "type": "object",
           "properties": {
-            "crs": {
-              "type": "string",
-              "description": "WKT2 representing the Coordinate Reference System (CRS) of the geometry."
-            },
             "encoding": {
               "type": "string",
               "enum": ["WKB"],
               "description": "Name of the geometry encoding format. Currently only 'WKB' is supported."
             },
+            "geometry_type": {
+              "oneOf": [
+                {
+                  "$ref": "#/$defs/geometry_type"
+                },
+                {
+                  "type": "array",
+                  "items": {
+                    "$ref": "#/$defs/geometry_type"
+                  },
+                  "uniqueItems": true
+                }
+              ],
+              "description": "The geometry type(s) of all geometries, or 'Unknown' if they are not known."
+            },
+            "crs": {
+              "type": "string",
+              "description": "WKT2 representing the Coordinate Reference System (CRS) of the geometry."
+            },
             "edges": {
               "type": "string",
               "enum": ["planar", "spherical"],
@@ -55,14 +70,24 @@
                   "description": "The maximum constant latitude line that bounds the rectangle (ymax)."
                 }
               ]
+            },
+            "epoch": {
+              "type": "number",
+              "description": "Coordinate epoch in case of a dynamic CRS, expressed as a decimal year."
             }
           },
           "additionalProperties": true,
-          "required": ["crs", "encoding"]
+          "required": ["encoding", "geometry_type"]
         }
       }
     }
   },
   "additionalProperties": true,
-  "required": ["schema_version", "primary_column", "columns"]
+  "required": ["version", "primary_column", "columns"],
+  "$defs": {
+    "geometry_type": {
+      "type": "string",
+      "enum": ["Point", "LineString", "Polygon", "MultiPoint", "MultiLineString", "MultiPolygon", "GeometryCollection", "Unknown"]
+    }
+  }
 }
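The updated schema changes which fields are required (`version` instead of `schema_version`, `geometry_type` instead of `crs`) and adds the `geometry_type` and `epoch` rules. As a rough illustration of what those rules check, here is a hand-written sketch using only the standard library. It is not the validator's implementation: `check_geo_metadata` is a hypothetical name, and the sketch covers only a subset of the schema (for example, it does not check `bbox`):

```python
import re

# Values copied from the `$defs/geometry_type` enum in schema.json.
GEOMETRY_TYPES = {
    "Point", "LineString", "Polygon", "MultiPoint",
    "MultiLineString", "MultiPolygon", "GeometryCollection", "Unknown",
}


def _valid_geometry_type(value):
    """Accept one known type name, or a list of unique known type names."""
    if isinstance(value, str):
        return value in GEOMETRY_TYPES
    if isinstance(value, list):
        known = all(isinstance(v, str) and v in GEOMETRY_TYPES for v in value)
        return known and len(set(value)) == len(value)
    return False


def check_geo_metadata(metadata):
    """Return a list of human-readable problems; an empty list means no problem found."""
    errors = []
    for key in ("version", "primary_column", "columns"):
        if key not in metadata:
            errors.append(f"missing required field: {key}")
    version = metadata.get("version")
    if isinstance(version, str) and not re.fullmatch(r"0\.1\.[0-9]+", version):
        errors.append(f"unsupported version: {version}")
    columns = metadata.get("columns")
    for name, column in (columns.items() if isinstance(columns, dict) else []):
        for key in ("encoding", "geometry_type"):
            if key not in column:
                errors.append(f"{name}: missing required field: {key}")
        if "encoding" in column and column["encoding"] != "WKB":
            errors.append(f"{name}: encoding must be 'WKB'")
        if "geometry_type" in column and not _valid_geometry_type(column["geometry_type"]):
            errors.append(f"{name}: invalid geometry_type")
        if "edges" in column and column["edges"] not in ("planar", "spherical"):
            errors.append(f"{name}: edges must be 'planar' or 'spherical'")
        epoch = column.get("epoch")
        if "epoch" in column and (isinstance(epoch, bool) or not isinstance(epoch, (int, float))):
            errors.append(f"{name}: epoch must be a number")
    return errors
```

Metadata written against the old schema, with `schema_version` and no `geometry_type`, would be reported by this sketch as missing `version` and `geometry_type`, which matches the intent of the changed `required` lists.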