Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CQL2 JSON encoding #601

Closed
cportele opened this issue Jul 5, 2021 · 22 comments · Fixed by #655
Closed

CQL2 JSON encoding #601

cportele opened this issue Jul 5, 2021 · 22 comments · Fixed by #655
Assignees

Comments

@cportele
Copy link
Member

cportele commented Jul 5, 2021

CQL2 JSON encoding options

There are two proposals for the CQL2 JSON encoding. They are currently described here and here.

The first option is the current text in the draft standard and uses JSON objects for predicates and JSON arrays for operands.

The second option was developed during one of the OGC Code Sprints and uses JSON arrays only (except for the GeoJSON geometries). It is inspired by the expressions in the Mapbox Style specification. One effect of this is differences in how operators are named (e.g., "==" instead of "=", "all" instead of "and", "get" instead of "property").

The Standards Working Group (SWG) wants to adopt only one JSON encoding for CQL2 and is looking for feedback which one should be adopted. This comment provides a side-by-side comparison of four examples to get a quick overview. The complete set of examples is available here.

The SWG currently does not have good reasons for picking one option over the other. Both options seem similar with the differences listed above (use of objects vs arrays, differences in the names). We are interested in feedback, if there are preferences in terms of the style or good reasons for preferring one option, because it is easier to process or create.

If you prefer the first option, please add a "thumbs up" in this comment.

If you prefer the second option, please add a "thumbs up" in this comment.

Comments on specific advantages or disadvantages of the options are very welcome, too.

Example 1

scene_id = 'LC82030282019133LGN00'
Option 1 Option 2
{
   "=": [
      {
         "property": "scene_id"
      },
      "LC82030282019133LGN00"
   ]
}
[
   "==",
   [
      "get",
      "scene_id"
   ],
   "LC82030282019133LGN00"
]

Example 2

eo:cloud_cover BETWEEN 0.1 AND 0.2
   AND landsat:wrs_row=28
   AND landsat:wrs_path=203
Option 1 Option 2
{
   "and": [
      {
         "between": [
            {
               "property": "eo:cloud_cover"
            },
            0.1,
            0.2
         ]
      },
      {
         "=": [
            {
               "property": "landsat:wrs_row"
            },
            28
         ]
      },
      {
         "=": [
            {
               "property": "landsat:wrs_path"
            },
            203
         ]
      }
   ]
}
[
   "all",
   [
      "between",
      [
         "get",
         "eo:cloud_cover"
      ],
      0.1,
      0.2
   ],
   [
      "==",
      [
         "get",
         "landsat:wrs_row"
      ],
      28
   ],
   [
      "==",
      [
         "get",
         "landsat:wrs_path"
      ],
      203
   ]
]

Example 3

eo:instrument LIKE 'OLI%' AND S_INTERSECTS(footprint, POLYGON((43.5845 -79.5442, 43.6079 -79.4893, 43.5677 -79.4632, 43.6129 -79.3925, 43.6223 -79.3238, 43.6576 -79.3163, 43.7945 -79.1178, 43.8144 -79.1542, 43.8555 -79.1714, 43.7509 -79.6390, 43.5845 -79.5442)))
Option 1 Option 2
{
   "and": [
      {
         "like": [
            {
               "property": "eo:instrument"
            },
            "OLI%"
         ]
      },
      {
         "s_intersects": [
            {
               "property": "footprint"
            },
            {
               "type": "Polygon",
               "coordinates": [
                  [
                     [43.5845,-79.5442],
                     [43.6079,-79.4893],
                     [43.5677,-79.4632],
                     [43.6129,-79.3925],
                     [43.6223,-79.3238],
                     [43.6576,-79.3163],
                     [43.7945,-79.1178],
                     [43.8144,-79.1542],
                     [43.8555,-79.1714],
                     [43.7509,-79.6390],
                     [43.5845,-79.5442]
                 ]
              ]
            }
         ]
      }
   ]
}
[
   "all",
   [
      "like",
      [
         "get",
         "eo:instrument"
      ],
      "OLI%"
   ],
   [
      "s_intersects",
      [
         "get",
         "footprint"
      ],
      {
         "type": "Polygon",
         "coordinates": [
            [
              [43.5845,-79.5442],
              [43.6079,-79.4893],
              [43.5677,-79.4632],
              [43.6129,-79.3925],
              [43.6223,-79.3238],
              [43.6576,-79.3163],
              [43.7945,-79.1178],
              [43.8144,-79.1542],
              [43.8555,-79.1714],
              [43.7509,-79.6390],
              [43.5845,-79.5442]
          ]
        ]
      }
   ]
]

Example 4

T_AFTER(built, DATE("2012-06-05"))
Option 1 Option 2
{
   "t_after": [
      {
         "property": "built"
      },
      {
         "date": "2012-06-05"
      }
   ]
}
[
   "t_after",
   [
      "get",
      "built"
   ],
   [
      "date",
      "2012-06-05"
   ]
]
@pvretano
Copy link
Contributor

Current CQL examples re-encoded using @tschaub proposed JSON encoding created here. Hopefully I encoded them correctly! ;)

@cportele
Copy link
Member Author

Meeting 2021-08-30: How do we get enough feedback / opinions which JSON encoding is easier to parse / generate / validate? @cportele will create a side-by-side comparison on key examples in this issue and then we can ask for comments / votes with thumbs-ups for each of the two options. Promote this at the member meeting and review the results in the meeting in 4 weeks.

@cportele cportele self-assigned this Aug 30, 2021
@cportele cportele changed the title CQL JSON encoding CQL2 JSON encoding Sep 13, 2021
@cportele
Copy link
Member Author

cportele commented Sep 13, 2021

There are two proposals for the CQL2 JSON encoding. See the first comment in this issue.

If you prefer the first option please add a "thumbs up" to this comment.

@cportele
Copy link
Member Author

There are two proposals for the CQL2 JSON encoding. See the first comment in this issue.

If you prefer the second option please add a "thumbs up" to this comment.

@m-mohr
Copy link
Contributor

m-mohr commented Sep 13, 2021

IMHO this issue doesn't give enough context.

  • Are there any known (dis)advantages that the options have?
  • I'm especially wondering what the excessive use of "get" is useful for in Option 2?
  • Why is it "=" option 1 and "==" in option 2? Are there any other of those tiny differences that are not reflected here?
  • Is there any advantage aligning with the Mapbox Style specification? To me option 1 seems easier overall.

So right now voting for option 1 because it's the only one that I understand just by reading it.

PS: Example 1 - Option 2 seems broken.

@cportele
Copy link
Member Author

@m-mohr - Thanks for the feedback.

I have added an explanation to address your items 2 and 3 (this is to align with the Mapbox Style expressions which uses different tokens/names). The advantage for aligning with Mapbox Style could be the reuse of existing code for parsing or generating such expressions or the fact that these expressions use a familiar style. But of course, this applies only to those that have been or are working with those styles.

As for other (dis)advantages of the options, we do not really have a clear view, which is one of the reasons for this issue and the request to look for broader feedback. I have implemented option 1, but parsing/processing option 2 expressions should work more or less the same way. I have also added a bit of text about this.

I have also fixed the property name in example 1 - option 2.

@jerstlouis
Copy link
Member

Big preference for Option 1 on my side.
My understanding is that the whole point of having a JSON encoding for CQL2 (I would otherwise argue one is not needed) is that it does not require any special parsing and is an as-direct-as-possible object encoding of the CQL2 Abstract Syntax Tree.

Converters between MapboxGL style expressions can be written for those who would like this convenience.

In general I find the MapboxGL styles syntax extremely counter-intuitive, and the left-side/right-side expression rules (e.g. what is allowed on which side, and when is GET required or not) for different versions make things even more complicated.

@philvarner
Copy link
Contributor

One thing I notice is that the difference between 1 & 2 conflate two different design decisions. Option 1 reuses the keywords (e.g., "and" is conjunction) from CQL Text and puts them in a JSON structure that feels pretty standard to me, whereas Option 2 changes the keywords (e.g., conjunction is "all" instead) and provides a JSON structure that is more LISP-like than I think is typical for "mainstream" use.

I'm strongly in favor of Option 1 between these two options, but I would be more amenable to Option 2 if it used the same keywords as CQL Text instead of the MapboxGL terms.

@jisantuc
Copy link

Parsing mapbox style expressions into data is really rough -- especially with the nesting rules (e.g. you can't ["get", ["get", "lookupKey"]]). Even with TypeScript support, correct non-trivial mapbox style expressions are (for me at least) pretty hard to author. It's way easier to imagine a schema for Option 1. Moreover, I think heterogeneous lists where the structure is encoded in array indices are a design smell -- there's obviously structure / semantics, but e.g. "an array where the first element is a string and the second is a double" rarely expresses the structure / semantics at all.

@tschaub
Copy link
Contributor

tschaub commented Sep 16, 2021

I'm not really trying to lobby for Option 2 over Option 1 – I can tell that Option 2 is not popular with this crowd. But, I will drop a few comments anyway.

I assume that anyone who regularly works with JSON would see an object like {"or": ...} and wonder what it is supposed to mean of the object has more than one member like {"or": ..., "and": ...}.

Since JSON objects are unordered collections of one or more name/value members, there is no "first" member name. While the first object above might make for a nice (human-)readable "OR expression," the second object is ambiguous (OR?, AND?, maybe an implicit OR with a child OR and AND?).

I know that a validator can detect that the second form is invalid, but to me, using an object to represent a single name/value pair is JSON-smell. And if I see code that switches on the "first" member name in an object (to determine the expression type), that would strike me as code smell.

Disparaging remarks aside, I do see how {"or": ...} in JSON is tantalizingly similar to <Or ...> in XML. So I can understand how it seems like a nice incremental upgrade. And even if people don't have any exposure to OGC Filter Encoding, I also get how the LISP-like syntax of Option 2 is off-putting. All I can say is that after having used it for style expressions in other projects for a while, it has grown to be a syntax that feels nice to me – but I have no idea if others would find the same.

It is probably fair to say that the "type-safeness" of data in the two options is equally mediocre. But having written a parser for the second option in a type safe language, I can say that it wasn't hard.

@jerstlouis
Copy link
Member

jerstlouis commented Sep 16, 2021

@tschaub I share some of your concerns for Option 1, and I was originally wondering where Option 3 was :)
I believe the intent is that there is always a single key inside those Option 1 objects (that should be clearly explained, if that is the case).
So in effect the key is the operator, and the value is the operands.

Personally, I would have preferred something that is even closer to the AST, e.g.:

{
   "operator": "and",
   "operands" : [
      {
         "operator" : "between",
         "operands" : [
            {
               "property": "eo:cloud_cover"
            },
            0.1,
            0.2
         ]
      },
      {
         "operator" : "=",
         "operands": [
            {
               "property": "landsat:wrs_row"
            },
            28
         ]
      },
      {
         "operator" : "=",
         "operands" : [
            {
               "property": "landsat:wrs_path"
            },
            203
         ]
      }
   ]

It could have used "op" instead of "operators", and "values" instead of "operands" as well for a more concise syntax.

@tschaub
Copy link
Contributor

tschaub commented Sep 16, 2021

@jerstlouis - I agree that if you want to use JSON objects to represent an expression, it makes sense to have operator or type as a member to switch on. Though I would choose a different style for the accessor expression (your {"property": ...}) – an operator (or type) value of get makes sense to me there.

Then if you wanted something a bit less verbose, maybe you can imagine using positions in an array instead of named members in an object. So instead of expr.operator you would have expr[0] and instead of expr.operands you would have expr[1:] (or expr.slice(1)). And then you are at Option 2 😃

@jerstlouis
Copy link
Member

jerstlouis commented Sep 17, 2021

Though I would choose a different style for the accessor expression (your {"property": ...}) – an operator (or type) value of get makes sense to me there.

I don't really see properties as an operation expression, they're more like identifier expressions.

But yes having something consistent to switch on like "type" might be good... "type" is already used for GeoJSON geometry, so if there is no overlap with "property" and the operators perhaps it's a good idea? I would prefer property to get though, e.g.

{ "type" : "property", "id" : "eo:cloud_cover" }

but that becomes a bit verbose...

maybe you can imagine using positions in an array instead of named members in an object. So instead of expr.operator you would have expr[0] and instead of expr.operands you would have expr[1:] (or expr.slice(1)).

That is the part I strongly dislike :) It was extremely difficult to parse and then work with that data model in our MapboxGL parser, where elements of the array are all sort of different things (operator type vs. value operands are very different kind of objects):

   Array<MBGLFilterValue> params = filter.type.type == blob ? (Array<MBGLFilterValue>)filter.b : null;
   String op = params && params.count && params[0].type.type == text ? params[0].s : null;
   if(op && params.count >= 2)
   {
      if(!strcmpi(op, "get"))
      {

@cportele
Copy link
Member Author

Meeting 2021-09-27: From the two options there is a preference for the first option. However, there is value in considering the issues raised in the last comments and come back to this issue at the next meeting in two weeks and after we got more input from others that might use CQL2 in the future.

@philvarner - Any view from the STAC perspective on @jerstlouis proposal in #601 (comment), which is a variation of option 1 that is more verbose, but closer to the abstract syntax tree?

In any case we need to ensure that in option 1 { "and": [ ... ] , "or" : [ ... ] } is not valid (add additionalProperties: false and add text in the document).

@philvarner
Copy link
Contributor

I don't have a strong opinion between them, and I think there's clear difference wrt STAC. Personally, I dislike when the "value" is used as an attribute name (as in option 1) and prefer when the value is an attribute value (as in jerstlouis's proposal). It's more verbose, but also has a more consistent structure.

@m-mohr
Copy link
Contributor

m-mohr commented Sep 28, 2021

Yes, I agree that @jerstlouis proposal seems best to me, too. That's most verbose, but can be created and transformed most easily, IMHO. Which would also be nice for openEO use cases.

@cportele
Copy link
Member Author

Meeting 2021-10-11: Based on the feedback we tend to select the modified option 1 (we may pick difference keys than "operator" and "operands"). We will make a decision in the meeting on October 25.

@cportele
Copy link
Member Author

Meeting 2021-11-25: No comments were made, so @pvretano will work on a pull request based on the decision. We also discussed whether we should follow the same approach for the property we will implement the decision. We will use "op" instead of "operator" and "args" instead of "operands". There was a discussion about the encoding of "property", "function", etc. Right now Peter uses the following approach in the draft PR:

  • { "op" : "s_intersects", "args" : [ "myarg1", 123 ] }
  • { "op" : "in", "args" : [ { "property" : "prop-name" }, [ 123, 456 ] ] }
  • { "property" : "prop-name" }
  • { "function" : "my-func", "args" : [ "myarg1", 123 ] }

We will discuss "property" etc. once the PR is ready for review.

@philvarner
Copy link
Contributor

This is great. I'm working on updating the STAC examples now.

I just had one question -- is it correct that CQL2 Text only support symbolic comparison operators (e.g., >=), but that CQL2 JSON supports both symbolic and alphabetic (e.g., >= or gte)?

@pvretano
Copy link
Contributor

@philvarner that is correct BUT I would prefer to align the text and JSON to use the symbolic operators. I'll create a PR for that for the SWG to consider at the next meeting.

@philvarner
Copy link
Contributor

Thanks @pvretano -- I'll reflect that in the STAC spec, and can always add the alphabetic names later if they are retained.

@philvarner
Copy link
Contributor

My updated examples for the STAC API Filter Extension are here: https://github.com/radiantearth/stac-api-spec/blob/f5da775080ff3ff46d454c2888b6e796ee956faf/fragments/filter/README.md
in this PR radiantearth/stac-api-spec#225
if anyone would care to look at them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

7 participants