-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Advertizing geometry field "type" in Column metadata ? #41
Comments
Seems like a good idea to me. I could even see this being a required field - ensure people at least try to fill it out instead of giving an option to be lazy and leave it blank even though clients would like to know that the data is all one type. So I lean towards options two or three. And option 3 does seem like it'd be useful for the shapefile use case. Can you explain how GeometryCollection is different than 'mixed'? I have a vague idea but would be interested to hear how it works. |
'mixed' would mean you have for example record 0 is a Polygon and record 1 is a MultiPolygon |
+1 on adding such an optional field. We briefly discussed a potential optional |
Following the "type" values from GeoJSON sounds good.
For clarification: would it be strictly required to use the Z suffix if you have 3D data? (for example, specifying |
I'd say yes. For example PostGIS implements very strict dimensionality checks, and if using ogr2ogr to ingest a Parquet file into PostGIS, we could get into issues if the Parquet geometry type is not correctly declared. example:
yes |
For information, the initial OGR Parquet driver writes a provisional "gdal:geometry_type" property in the "geo" metadata for now (on reading it will try both "geometry_type" and "gdal:geometry_type"). For now, it only writes for the 2D case: "Point", "LineString", "Polygon", "MultiPoint", "MultiLineString", "MultiPolygon", "GeometryCollection" or "mixed" (the later matches OGR wkbUnknown status). For the Z / M cases, do we want to append the dimensionality just after, or with a separator space: ie "PointZ" or "Point Z" ? |
Do we want to distinguish "mixed" from "unknown"? (or not set)
I would maybe go with a space. Or are you aware of any projects that we could follow here? |
The wkbUnknown state is typically used by GDAL to report geometry columns that may contain mixed geometry types, e.g a GEOMETRY PostGIS column. The OGR PostGIS driver will not try to check if there's a mix of geometry types in the table content or if it is uniform, because that could be too lengthy.
would be fine for me. With a space, this is consistent with the beginning of a ISO WKT (if ignoring case), e.g "POINT Z (1 2 3)"
No, can't think of other references.
This is a human facing representation (and that predates ISO WKT/WKB support). I wouldn't use that for JSON encoding. |
Sounds good. It's still an optional field, so you also have the option of not including it to signal that it is unknown .. ? |
yes, probably for a file that declares a version of the spec that supports geometry_type, the absence of geometry_type should be understood as the same as a "geometry_type": "unknown", and would not trigger file content analysis. |
'unknown' makes sense to me. I do think it'd be nice to have some nudge on implementors to at least try to add it if possible, since it is clearly useful for a set of tools. Perhaps make it required, but allow 'unknown', so that implementors need to at least think about if they can provide something useful. Versus thinking 'oh, it's an optional field, I don't need to worry about it'. Looks like Flatgeobuf uses 'Unknown' for similar use case: https://github.com/flatgeobuf/flatgeobuf/blob/6884cb05438686947f7bf3776c7ae5c9febfbf5a/src/fbs/header.fbs#L70 And I think with their header stuff the field is required? |
I just opened a draft PR adding some initial text: #51
It's indeed maybe good to make a choice between either keeping it optional (and leaving out "Unknown") or making it required but allow "Unknown". Because otherwise we just give people two ways to say that it is not known. (on the other hand, making it optional is more backwards compatible, so even if we make it "required" I would hope that implementations are a bit flexible) |
Ah right, yeah - two ways to say it's not known seems less good. I continue to lean towards explicitly saying Unknown. And if implementations choose to support 0.1 then they obviously need to have that flexibility. Right now it's important that implementations have that flexibility, but I'm hoping that when we go 1.0 we'll have a lot more implementations, and that for those it'll be reasonable to just say 'I only support version 1.0' (while there will also be lots of great libraries that support all the early versions). Note I will shift at some point relatively soon to caring a lot about backward compatibility and do appreciate your push on it @jorisvandenbossche. But I feel in the early 0.1 / 0.2 / 0.3 times we should give ourselves some flexibility to change as we discuss and get wider feedback. |
IMO, if the field is optional there is no need for the "Unknown" geometry type. Just not including the field should be enough. Otherwise, we need to explain what "Unknown" means, and the users of the spec should parse the "Unknown" type. |
Yes, @Jesus89, I think we're all on the same page. @jorisvandenbossche comment:
And @cholmes commit suggestion: #51 (review) |
A common use case if for a geometry column to hold a single geometry type (Point, LineString, Polygon, ...) for all its records. It could be good to have an optional "type" field name under https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md#column-metadata to capture that when it is known.
This would help for example conversion between GeoParquet and GIS formats (shapefiles, geopackage) that have typically this information in their metadata.
Values for type could be the ones accepted by GeoJSON: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, GeometryCollection (that would be extended to CircularString, CompoundCurve, CurvePolygon, MultiCurve, MultiSurface, PolyhedralSurface, TIN if we support ISO WKB, and with Z, M or ZM suffixes for other dimensionalities)
What to do when there is mixed geometry types ?
The text was updated successfully, but these errors were encountered: