Filter Extension #128

philvarner · 2021-05-10T21:45:43Z

Related Issue(s): #32

Proposed Changes:

Defines Filter Extension for allowing CQL queries.

Non-changes

@cholmes 's original PR removed intersects. I think that's a separate concern and should be in a different PR (and needs a lot of feedback and discussion)

Open Issues / Questions

The OpenAPI spec for it still has some validation problems, but it gets the correct ideas across.
How do the conformance classes relate to implementing filter on both /collections/{collectionId}/items and /search? Are both required?

PR Checklist:

This PR is made against the dev branch (all proposed changes except releases should be against dev, not master).
This PR has no breaking changes.
I have added my changes to the CHANGELOG or a CHANGELOG entry is not required.

jisantuc · 2021-05-18T19:30:36Z

I've been working on an implementation in Franklin using only CQL JSON and here are my comments so far:

Strange mix of types?

One thing that I keep tripping on is a pretty strange mixture of types. An expression for filtering a field is a single key at the root of an object, followed by one of a map also holding a single key, or an array of maps holding a single key. This singularity strikes me as odd, because objects can hold many keys, so if there's only supposed to be one thing, why does the data structure in principle hold more than one? Similarly, for or and and, the value behind the key is supposed to be, e.g.:

[
  {"eq": ...},
  {"neq": ...}
]

These string keys in a by-default plural structure are much less convenient to work with than an alternative in which they live under a kind key in each json object, like:

[
  {"kind": "eq", ...},
  {"kind": "neq", ...}
]

I think we're getting this from the OGC spec itself, so there might not be much we can do about it, but it's pretty weird to program with. I've been fighting a JSON round trip test for about 90 minutes while tabbing back and forth to the spec to try to get this to cooperate, and a better matching of types to what they represent would have made this part much easier.

`queryables` seem pretty tough to know ahead of time

This is becoming a recurring theme with the open-endedness of the spec, but it's not obvious to me what I'm supposed to do for queryables in a server implementation when the server will serve other people's data. It's also not clear to me whether queryables is required. I can accumulate item properties as someone fills Franklin's database with items (the dynamic case mentioned in the spec), but then it's not obvious to me what "queryable" means -- does it mean it's technically possible to query based on x, or that querying based on x will be fast? If the latter, then I need to expose some way for users to control what's queryable and what isn't, since Postgres JSONB indexing is really effective for equality at a key filters but pretty much worthless otherwise. There's a related issue for this in Franklin from before the CQL work here azavea/franklin#616 -- basically there's a trade-off in what fields we can search efficiently and index size / cost of inserting items, and it's hard to make a good choice about that without knowing what promise "x is queryable" represents.

It's also pretty difficult in the dynamic case to figure out what to do with fields that don't have a JSON schema. Let's say someone posts an item that has a string value in mySpecialField -- if I'm dynamically aggregating queryables, I don't know that all myVerySpecialField values are strings, just that I happen to have seen one.

Overall though pretty solid 👍🏻

This isn't really that different technically from the previous query extension in its implementation for us -- build up the cases that something of the type CQLFilter can have, destructure with a fold or a match to whatever the CQLFilter has to become. A bunch of stuff was copy-pastable from the query extension implementation we had in Franklin.

Random notes

I appreciated the clarification that "POST with a JSON body to the Items resource is not supported, as POST is used by the Transactions Extension for creating Item objects." Since I'm working 100% on the JSON side of things that was going to come up for me eventually, and when I got to thinking about it I saw it was already there. 💯

It wasn't too difficult I think to implement the RHS property name lookup Postgres. One thing that wasn't clear to me was whether the properties in any part of the queries were only top-level properties or could drill into the JSON object. I acted as if they could access nested fields, but only if it's strictly through a hierarchy of objects. Something I think would be a cool capability would be having some kind of JSON / textual representation of object traversal (in the traversal systems / optics sense) that would allow quite precise querying, e.g. "the name of the seventh band in eo:bands is aerosol", but that's way out of scope here 🤷🏻‍♂️

m-mohr · 2021-05-18T19:35:01Z

@jisantuc This is great feedback, but it seems most should be actually be pointed towards OGC API - Features so that they may improve things depending on your feedback. :-)

jisantuc · 2021-05-18T19:37:00Z

I was asked for comments on the PR via email so I put them here first

philvarner · 2021-05-18T19:55:11Z

Queryables

The spec uses the words that the queryables "may" and "can" be used in filter expressions, which I interpret to mean that using a parameter name that's not in queryables should result in an error. Maybe one recommendation would be to allow wildcards, like properties.*, as queryables, so a user could query for any value in Properties? Or maybe there's a wildcard option * for queryables that indicates that the filter is allowed to use property names not defined explicitly as queryables, but there's no guarantee they'll work (e.g., if I query foobar = 1 and foobar doesn't exist, I get a successful 0 results).

philvarner · 2021-05-18T19:59:07Z

One thing that wasn't clear to me was whether the properties in any part of the queries were only top-level properties or could drill into the JSON object.

You could do this with queryables -- a queryable foo that actually maps to properties.bar.baz.foo

I acted as if they could access nested fields, but only if it's strictly through a hierarchy of objects. Something I think would be a cool capability would be having some kind of JSON / textual representation of object traversal (in the traversal systems / optics sense) that would allow quite precise querying, e.g. "the name of the seventh band in eo:bands is aerosol", but that's way out of scope here 🤷🏻‍♂️

Yes, I think we have many use cases like this, particularly eo:bands, as users search multiple products for assets that have the spectral bands they need. Might be something GraphQL could express well.

philvarner · 2021-05-18T20:09:46Z

Strange mix of types?

cql-text is pretty nailed down, since they're just formalizing something that's been used in practice for a while. But CQL JSON is new, and I think they're pretty open to suggestions still, particularly if they're related to implementation difficulties. I didn't look at it too closely because it looked good enough, and no one is going to actually be manually constructing these queries anyway. Also, I feel Scala exposes these type issues a lot more than other common languages, since there's just more correct automatic behavior if you can get the types structured right.

cholmes

Looking good! Just put up a bunch of review comments. I think the biggest ones are:

I think we should have two conformance classes - one for item-search, and one for ogcapi-features. The latter we just use OGC's class directly, and the other we define our own class (which I think you already did). But let people choose one or the other or both. Some people implement 'item-search' without ogcapi-features, so we should give them the option to just do the item-search CQL. I think this approach should simplify a few bits, as the fragment can talk through the common stuff, and then do more specifics in each section (like cross collection search in item search). I noted in comments that that was the intent behind the fragments, to define conformance classes in multiple places, but we never executed on it. fields and sort should do that next - like use them in ogcapi-features under stac conformance namespace (until we manage to fully align)
Clarity on what to do with the fact the spec may change with more conformance classes. This is mentioned a good bit, but I think we should try to make the recommendation of what to do right now clear. I think it's to use their conformance classes but it's ok to not fully implement everything, and in a future version there will be the right conformance classes for more of a subset.
Queryables - it reads now like there's potentially a lot of work to do here. Should focus on what the minimal requirements are, and make it clear for the STAC content. I think the examples helped make it a bit more clear to me, but we should just structure for an easy on ramp for new implementors, where they can just follow some simple recommendations and be 'good enough'.

fragments/filter/README.md

cholmes · 2021-05-18T22:40:53Z

I saw @jisantuc's review come in as I was most of the way through mine, so there's some overlapping stuff, but I didn't go back and tweak them all. It seems like he had some overlapping thoughts/questions on queryables, and Phil answered some of them:

The spec uses the words that the queryables "may" and "can" be used in filter expressions, which I interpret to mean that using a parameter name that's not in queryables should result in an error. Maybe one recommendation would be to allow wildcards, like properties.*, as queryables, so a user could query for any value in Properties? Or maybe there's a wildcard option * for queryables that indicates that the filter is allowed to use property names not defined explicitly as queryables, but there's no guarantee they'll work (e.g., if I query foobar = 1 and foobar doesn't exist, I get a successful 0 results).

Something in those ideas to add flexibility would make sense to me, but not quite sure how either of those would work - may need to see some examples. But thinking about the easy 'on-ramp' for a STAC developer, it'd be useful if we could just provide them with almost a 'static' queryables document that they could just use. At the very least it could define datetime, geometry and collection. It seems like it'd be nice if we could also define common metadata and stable extensions as 'queryables' in some static way. But I'm not sure how that plays with OGC if the data doesn't actually have that field.

philvarner · 2021-05-19T00:47:39Z

@cholmes @jisantuc I just filed this issue with OGC opengeospatial/ogcapi-features#582

On the one hand, I like the tightness of having queryables defined and allowing that type of dynamic discovery, but on the other hand, it severely complicates and/or restricts filtering when the Items have arbitrary values in their properties. I think as OGC has defined it, it's too tight to work in many situations, and there needs to be a way for the client and/or server to determine the semantics of what they allow/deny.

fragments/filter/README.md

extensions.md

…about cross-collection use of 'collection' in filter

lossyrob

Really great stuff - represents a lot of detailed work, really appreciate that you're sorting through this.

One high level comment is that you may want to consider splitting up the README into either separate Markdown documents, pulling some of the justification/point-in-time-specific information into the PR description, and/or ongoing issues or TBDs into GitHub Issues. This might make reading easier for users who want to grok what this extension is quickly, as well as make this document more maintainable as things continue to evolve

item-search/README.md

fragments/filter/README.md

lossyrob · 2021-05-21T17:08:43Z

fragments/filter/README.md

+known as Common Query Language) or the [Cassandra Query Language](https://cassandra.apache.org/doc/latest/cql/) used by the 
+Cassandra database.
+
+## Limitations of Item Search 


This section is more a justification for the extension, and perhaps doesn't need to be so prominent. It was a little confusing to me reading through the spec as I was expecting this section to begin talking about what this extensions supports. I'd recommend making a section lower down dedicated to talking about other parts of the spec in comparison or justifying this extension, and have the beginning of this document focused on specifying what the filter extension is and does.

I'll think about that. I like having the "motivation" section first, to adequately set the stage for where we're starting with this extension.

lossyrob · 2021-05-21T17:15:06Z

fragments/filter/README.md

+
+These projects have or are developing CQL support:
+
+- [GeoPython PyCQL](https://github.com/geopython/pycql/tree/master/pycql), and the 


Should https://github.com/geopython/pygeofilter be used instead, as it seams to be more aligned with the current OAFeatures Filtering spec? This is a bit of a guess, as I haven't grokked both libraries completely, but this seems to be under more active development

Thanks for the pointer to that. I added in pygeofilter, but I also don't know what either provides. Looks like they do have a cql-json impl, whereas I think pyCQL is only doing cql-text right now.

fragments/filter/README.md

philvarner · 2021-05-25T00:14:25Z

Really great stuff - represents a lot of detailed work, really appreciate that you're sorting through this.

One high level comment is that you may want to consider splitting up the README into either separate Markdown documents, pulling some of the justification/point-in-time-specific information into the PR description, and/or ongoing issues or TBDs into GitHub Issues. This might make reading easier for users who want to grok what this extension is quickly, as well as make this document more maintainable as things continue to evolve

I agree, it's so large with all the examples that it's a lot to process at once, and breaking it up would be good.

philvarner · 2021-05-26T14:07:15Z

Filed these for follow-up work, to reorganize the doc (based on @lossyrob feeback) and provide better guidance on queryables.

philvarner · 2021-05-26T14:09:03Z

I'd like to merge this unless anyone has any major objections -- I think any further work would be better done as individual PR changes rather than adding to this already-massive one.

m-mohr · 2021-05-26T14:43:11Z

Unfortunately, I don't have the time right now to review it so leave it up to others. Will just do issues after the merge if something comes up...

initial draft for Filter extension

6d51718

philvarner mentioned this pull request May 11, 2021

Align to OGC CQL #51

Closed

3 tasks

philvarner added 10 commits May 10, 2021 20:43

filter extension: update text with a few missing details

3f51664

add more details around implementation requirements

1d32f24

draft openapi spec

ae9db54

update examples, and implementation section

26d22ce

remove stray comma

facfd04

wordsmith text

e5031e8

wordsmith text

d22cda2

wordsmith text

c810d2c

improve openapi definition

7edc330

add text about synthetic queryables

eba4135

philvarner marked this pull request as ready for review May 14, 2021 17:00

Merge branch 'dev' into filter-extension

7913862

philvarner changed the title ~~Initial draft of Filter extension~~ Filter Extension May 14, 2021

philvarner requested review from m-mohr, cholmes, jisantuc and matthewhanson May 14, 2021 17:13

philvarner added 3 commits May 14, 2021 14:45

add better example, fix linting error in markdown and schemas

e8490d9

fix ref to cql yml

518184f

minor fixes for filter openapi

2bdf845

jisantuc mentioned this pull request May 18, 2021

Comments on Part 3 opengeospatial/ogcapi-features#575

Closed

jisantuc mentioned this pull request May 18, 2021

CQL Support azavea/franklin#750

Draft

1 task

cholmes reviewed May 18, 2021

View reviewed changes

Merge branch 'dev' into filter-extension

0b115eb

clarification on queryables, line breaks, wordsmithing

e372618

m-mohr reviewed May 19, 2021

View reviewed changes

fragments/filter/README.md Outdated Show resolved Hide resolved

philvarner added 2 commits May 19, 2021 12:19

fix formatting

9f9a3dd

redefine conformance classes

7265eb8

philvarner commented May 20, 2021

View reviewed changes

extensions.md Outdated Show resolved Hide resolved

philvarner added 8 commits May 19, 2021 21:24

change conformance class names back to item-search#

820a18e

tweak language around proprietary extensions in queryables, add note …

c201898

…about cross-collection use of 'collection' in filter

update list of conformance class uris in extension page

96696c4

add conformance class field to each extension

90c88da

fix formatting of filter conformance classes

61a1f36

fix formatting typo

1fb71cb

clarify queryables

9acf92d

add a getting started section

b9b7c83

lossyrob reviewed May 21, 2021

View reviewed changes

modify text based on comments from Rob

8e08794

philvarner requested review from cholmes, m-mohr and lossyrob May 26, 2021 14:09

cholmes approved these changes May 26, 2021

View reviewed changes

philvarner merged commit ed7f3f1 into radiantearth:dev May 26, 2021

philvarner deleted the filter-extension branch January 31, 2023 00:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Filter Extension #128

Filter Extension #128

philvarner commented May 10, 2021 •

edited

Loading

jisantuc commented May 18, 2021

m-mohr commented May 18, 2021

jisantuc commented May 18, 2021

philvarner commented May 18, 2021

philvarner commented May 18, 2021

philvarner commented May 18, 2021

cholmes left a comment

cholmes commented May 18, 2021

philvarner commented May 19, 2021

lossyrob left a comment

lossyrob May 21, 2021

philvarner May 25, 2021

lossyrob May 21, 2021

philvarner May 25, 2021

philvarner commented May 25, 2021

philvarner commented May 26, 2021 •

edited

Loading

philvarner commented May 26, 2021

m-mohr commented May 26, 2021


		These projects have or are developing CQL support:

		- [GeoPython PyCQL](https://github.com/geopython/pycql/tree/master/pycql), and the

Filter Extension #128

Filter Extension #128

Conversation

philvarner commented May 10, 2021 • edited Loading

jisantuc commented May 18, 2021

Strange mix of types?

queryables seem pretty tough to know ahead of time

Overall though pretty solid 👍🏻

Random notes

m-mohr commented May 18, 2021

jisantuc commented May 18, 2021

philvarner commented May 18, 2021

philvarner commented May 18, 2021

philvarner commented May 18, 2021

cholmes left a comment

Choose a reason for hiding this comment

cholmes commented May 18, 2021

philvarner commented May 19, 2021

lossyrob left a comment

Choose a reason for hiding this comment

lossyrob May 21, 2021

Choose a reason for hiding this comment

philvarner May 25, 2021

Choose a reason for hiding this comment

lossyrob May 21, 2021

Choose a reason for hiding this comment

philvarner May 25, 2021

Choose a reason for hiding this comment

philvarner commented May 25, 2021

philvarner commented May 26, 2021 • edited Loading

philvarner commented May 26, 2021

m-mohr commented May 26, 2021

philvarner commented May 10, 2021 •

edited

Loading

`queryables` seem pretty tough to know ahead of time

philvarner commented May 26, 2021 •

edited

Loading