-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define polygon orientation rules #46
Comments
Very good call, thanks @mentin! If you'd like to contribute directly to the spec feel free to open a pull request. If not I'm sure someone will be able to do it - hoping we can get it in before 0.2. It'll be good to have this clear. |
My preference would be that this should only be a requirement for the spherical edges case. For planar, it imposes a constraint on all writers to potentially fix winding order, that isn't required by the underlying standard (OGC Simple Feature Access). |
Nice catch @mentin! I can help to open a PR to include this, just let me know. According to #2. As @rouault mentions, the current format we use for encoding geometries (WKB) doesn't impose that constraint, so my preference would be to not include it unless we have a specific reason like #1. Not a strong opinion here, just want to keep as simpler as possible at the beginning to increase adoption. Does it make sense for you @mentin? |
Yes, it makes sense to only require it for spherical case. Our (BigQuery) experience with GeoJson was that despite the RFC requiring specific orientation, we frequently see opposite orientation too, so we ignore it when reading GeoJson files. |
I think it is instructive to observe what happened with GeoJSON here. The winding order was omitted from the original spec and then later added. As a result, spec-compliant GeoJSON does not require a winding check and potential re-winding, but in practice clients need to deal with old GeoJSON out there so all perform the winding check. My perspective from a frontend visualization point of view is that this is unfortunate, and means clients cannot simply copy across the vertex data to the GPU, but must instead perform a pass through the data. My preference would be to enforce a winding order in all cases. I recognize that this moves the burden from the readers to the writers, but I think it makes for a better user experience and in fact makes for a simpler spec as we don't need distinguish the spherical/planar cases. I'm still open to discussion on this, but to get the ball rolling I've created a PR. |
I certainly understand the importance of a proper orientation (even for planar data) for certain use cases such as visualization (eg spatialpandas also has an Another option could also be to add a new field to the metadata that indicates if the data is "known to have correct winding order" (eg Of course, having it as a field that can be set to false, creates the risk that many people generate datasets that don't have a correct winding order and that become less efficient to read (in case a specific winding order is required for your application). @rouault you mention that the OGC Simple Feature Access standard doesn't require a specific winding order, but from https://www.ogc.org/standards/sfa (https://portal.ogc.org/files/?artifact_id=25355), I see:
|
Interesting... So this mentions exists in SFA 1.2.0 and 1.2.1, but the earlier SFA 1.1.0 (https://portal.ogc.org/files/?artifact_id=13227) from 2005 doesn't mention any winding order. |
Hmm, then the text is not very clear in how it should be interpreted .. (eg in this SO answer it is being interpreted as counter-clockwise for exterior rings: https://gis.stackexchange.com/a/119156) |
Very interesting point @jorisvandenbossche. It definitely makes things harder for writers.
For the use case of GeoParquet, I still feel it's better to have a solid format that removes ambiguity. My feeling is that it will be harder at the beginning, but easier in the long term. I know I'm prioritizing readers over writers, but just because I think it might be better for interoperability. |
Closing this as we've merged @felixpalmer suggestion. Feel free to re-open if you've further suggestions |
I know the PR has been merged, but to repeat my proposal from above: how would people feel about an explicit field for this that can be set (eg Speaking for myself as geopandas developer, I assume we would offer people the option to not correct geometries before writing a parquet file (because it is a costly operation), which with the current wording in the merged PR, that would mean that we would enable users to write invalid files. (now, I would have to do some experimentation myself to assess how costly this is (and to have an estimate how important this option would be, it's also only for writing while read performance is probably more sensitive). But generally in geopandas, because being based on GEOS and any GEOS operation returning "wrong" coordinate orientation, we would almost always have to correct coordinate orientation before writing) |
Maybe we can revisit your proposal @jorisvandenbossche for version 0.3 after your experimentation, and we can move forward with the 0.2. |
I feel that @jorisvandenbossche and @rouault are both worried about this issue and you have a very valuable voice here as you work on two very famous libraries which will become one of the most used in the future. Thus, how about discussing it for 0.3? I'd like to propose a synchronous working session to discuss this and other topics (spatial index) for version 0.3. We did for 0.1 and was very productive. |
(early contributor on the initial development of geoparquet / geofeather support in GeoPandas here) I'm with @jorisvandenbossche on this one, and I think an optional metadata field to state the orientation (defaulting to unoriented / nonstandardized) is the way to go. Without getting on a rant that is not helpful to advancing the goals of this effort (fast I/O, broader compatibility), I'll summarize with saying that this non-optional requirement made it into 0.2.0 spec in a way that does not sufficiently support those goals when major components of the ecosystem (e.g., GEOS) have a different representation. It will be hard to undo that requirement without it being seen as a step backward; in the future I'd urge a lot of caution around adding non-optional requirements to the spec. Much easier to add those and eventually promote those to required as appropriate. Apologies for not engaging sooner (haven't followed this closely since it moved from initial work on geo-arrow-spec repo), and I know everyone means well so this is NOT intended to come across as harsh or personally critical. I appreciate the perceived benefits of not having to re-validate / re-orient winding order on read when inputs are known and trusted, but it incurs non-zero cost to achieve it uniformly. I think this is not a pay-it-once and all benefit proposition; this is a pay-it-once and possibly benefit proposition. Instead, the benefit is in stating the degree to which you can trust the inputs (metadata field value says oriented according to a standard); the same is true for geometric validity, NOT in trying to force nonzero-cost standardization (how high the cost depends on your native geometry representation). Since the underyling geometry storage (WKB) does not rigidly require winding order consistency (and in the wild cannot be trusted to be consistent), the metadata (spec) should not be the mechanism that tries to force it. Instead, it is for the underlying geometry storage representation to take this up (e.g., maybe GeoArrow geometry encoding is responsible for this?), and the role of the metadata is to state the degree to which geometries rigidly conform to defined standards. E.g., if GeoArrow encoding enforces consistent winding order during construction, then it is very easy to add a metadata value to indicate geometries are wound in expected order (or maybe it is simply redundant once you know the storage type); the cost is incurred by the geometry representation not the spec. Not saying that GeoArrow has to enforce winding order, just trying to make the point that if your representation takes a stance on it, it is achieved by that representation rather than the spec. Otherwise, I think you are adding impedence on writes that is counterproductive. Geoparquet / Geofeather grew out of early efforts to have much faster I/O with geo data using existing, widely used representations (WKB) than was available from native GIS formats within GeoPandas (was much easier than our larger efforts to add better vectorized I/O there). It would be a shame for the spec to undo those gains by requiring a standardization that does not yield uniform benefits purely for the sake of making the data easier to reason about. One of my most common use cases of feather files (same metadata spec as GeoParquet for now) is as intermediates within a consistent toolchain: read from GIS format, do stuff (using GEOS under the hood), write feather files ... later read feather files, do more stuff (again using GEOS), etc. No benefit but nonzero cost to enforcing winding order during write for each of those intermediates. I'd like to think that the benefits of having an as-fast-as-possible intermediate storage representation outweigh the benefits of strong consistency of representation (e.g., higher frequency of use between toolkits than within). Otherwise, use one of the other GIS formats already out there for portability, or take up standardization with up-and-coming representations (e.g., GeoArrow). If I needed to post data to long-term storage and distribution for others in other toolkits, being able to opt-in to doing this and stating the degree to which others can trust rather than check winding order, is the right way to go. I.e., incur the cost at the boundary for data sharing, not for internal use. Making this an optional field in the spec directly supports that. All that said, I really think this needs to be moved from required to optional in 0.3.0, and to strongly suggest a greater period of testing between promoting elements of the spec from optional to required. |
Thanks @brendan-ward, I think that perspective is really helpful. I think there are two different use cases here:
What would people think about a metadata field that specifies whether or not the winding order is known? That would let the user choose which of these use cases fits their needs better. If the user has a read-often use-case, they can tell the writer to fix winding order before writing. I suppose the worry here is that most writers don't implement the optional winding order fix?
👍 on this. I worry that moving too fast about the spec could fracture the ecosystem. |
Thanks for reframing the use cases @kylebarron I think instead of the frequency of read / write (because I have a lot of internal cases of your second bullet), I'd instead suggest these fall into two categories:
Standardizing winding order does not guarantee faster reads; that still depends on your toolkit. Instead, it's a benefit to specific users of specific toolkits: if the metadata tells you things are wound this way, you don't need to check and rewind them IF you would normally do so.
And also that it is a non-zero cost depending on your toolkit, and that your toolkit may then need to reverse it on read. Once such capabilities are broadly accessible and shown to have very little overhead, so that it is a drop-in for most writers of GeoParquet, that would be the time to consider promoting it from optional to required. |
I'm pretty convinced on making it optional, thanks for sounding in @brendan-ward and others. I think we were probably trying to move a bit too fast. But it sounds like the more conservative thing is actually to make it optional, and soon. I'd be for cutting a 0.3.0 release asap that just has one change - making it a field as @jorisvandenbossche proposes. If someone can make a PR I'm happy to review & merge (with other's reviews) and cut a release. |
Thanks for being willing to be flexible here! How about resolving that in an 0.2.1 release? That seems more in keeping with a "we weren't quite ready to flip that switch" than pushing an 0.3.0 to fix it. Though either way these are 0.x releases which hopefully have less long-lasting impacts than a major version release, and either way getting one out soon will close the window in which someone would implement this for 0.2.0 (though @rouault beat us to it!). What about instead of a binary |
I have no strong feelings on it. We're out of SemVer land since it's pre-1.0.0, so I don't think it really matters. 0.2.1 feels to me more like a 'typo' than a change in a field. But yeah, I agree these 0.x should not have long lasting impacts, and I think the most important thing is to get it out soon. I'm far from the expert on this stuff, but your proposal makes sense, and I like being more explicit / self-documenting, and probably more verbose, so +1, but curious for others to sound in. |
The explicitness of the winding order sounds good to me rather than a true/false approach. But should we really offer the two orientations ? Shouldn't that just be orientation="counterclockwise" ? It would be the first spec I can think of where, where the 2 orientations are possible. I don't have strong feeling though. I can understand that some writers/readers might really expect one or the other orientation to get maximum throughput. |
sorry, I was being unclear and conflating this being unset (no key present) with what it would equate internally. You're right, and explicit orientation="counterclockwise" or unset is also 👍 by me. You're right that having orientation="clockwise" may go too far in terms of supporting opposing orientations; I was thinking of it more as an opportunity for the writer to state what they know about the geometry orientation. So this could end up being more like
Might need to be expanded a bit to cover correct terms for ellipsoidal / spherical or 3D geometries; I'm unfamiliar with the correct terms there. |
Looks good to me! Feel free to make a PR and continue discussion there, feels like it's getting close. (Or I'm happy to write this up in a PR, but I like the traceability to who came up with it, and seeing lots of different people as explicit contributors to lines of the spec :) |
An explicit A remaining question is how this interacts with |
Thanks for the input everyone. I'm on board with adding My hope is that we can incentivize the usage of CCW orientation so that readers can optimize. We should try and avoid a situation where some organizations/users choose CW and some CCW, as such I would try to avoid allowing Regarding the interaction with |
I've put all these thoughts together in a PR, using the proposed wording by @brendan-ward. CC @rouault @kylebarron @brendan-ward (I couldn't add you as reviewers) |
I'll have to plead ignorance here, having not yet worked much with spherical coordinates and certainly not from the perspective of I/O. If it is true that the only way to represent spherical edges is using counterclockwise ordering, then this stipulation makes sense. If instead, this is based on design decisions of particular toolkits (e.g., S2) and not consistent in practice across all toolkits, then I don't think we can make this stipulation. The fallback would be the same for spherical coordinates as for planar coordinates, right? In that if you don't know for certain the winding order, you have to check and reorder as needed for your toolkit? (I don't know here) Given that, my instinct is to keep those decoupled from a specification standpoint, and instead let them be coupled in practice (e.g., if you output coords from S2 and know their orientation, it is easy to set this as well), otherwise this gets into the same potential issues that we're trying to avoid in enforcing planar coordinate winding order. Dumb question: are we able to use WKB as a storage for spherical coordinates as well? |
not really (but I'm only superficially aware of issues with spherical coordinates). My understanding is that with spherical coordinates is that you can't guess which "side" of the Earth is in the polygon. imagine a polygon whose coordinates are [[-180,60],[-90,60],[0,60],[90,60],[180 60],[-180,60]] : you don't know if it is meant to be be everything above 60 deg north (actually it is not strictly everything above 60 as the geodesics are not a consistent latitude, but whatever...), or everything below 60 deg north, if there's no winding order convention |
So the issue is that you simply cannot interpret the spherical coordinates without knowing the intent of the representation (i.e., based on winding order convention)? Thus - unlike planar coordinates where you can reorder based on the needs of a specific toolkit without losing the meaning of what is the exterior vs interior ring (even if ordered incorrectly) - spherical coordinates cannot be simply reordered to meet the needs of a toolkit because of these implications? Sorry, out of my domain here, likely to suggest bad ideas... I guess then the choice becomes whether or not |
For
So I would say we don't have to make it REQUIRED, but we do need to describe that lacking orientation flag, the polygons must be interpreted using smaller-of-the-two-possible-interpretations rule. |
playing devil's advocate: what about a polygon that divides the ellipsoid in 2 strictly equal parts (north hemisphere vs south hemisphere typically) ? |
That polygon becomes unrepresentable without orientation rule too, and should not be written out without oriented flag. As for the readers - garbage in, garbage out. If the creator of the file wrote garbage, there is not much reader can do, so I don't think the standard should care. |
Personally, I don't really like the "choose the smaller polygon" approach. How does one compare the two polygons, with what error margin? It feels error-prone and as mentioned means we cannot have polygons larger than the hemisphere. How about for |
Great discussion here. Just want to give a step back in my previous comment after reading all the arguments. I'm ok adding the optional orientation field. |
#74 has been merged, the idea is to cut the release on Monday end of the day (pacific) or Tuesday. If you've any comments around this would be great to share before that time. |
I think the standard should define polygon orientation.
1. Spherical edges case
With spherical edges on sphere, there is an ambiguity in polygon definition, if the system allows polygons larger than hemisphere.
A sequence of vertices that define a polygon boundary can define either polygon to the left of that line, or to the right of the line. E.g. global coastal line can define either continents or oceans. Systems that support polygons larger than hemisphere usually use orientation rule to solve this ambiguity. E.g. MS SQL, Google BigQuery interpret the side to the left of the line as the content of the ring.
2. Planar edges case
Planar case does not have such ambiguity, but it is still good idea to have specific rule.
E.g. GeoJson RFC defines a rule consistent with the rule above:
The text was updated successfully, but these errors were encountered: