Browseable recommendations and Children conformance class #229

Merged 34 commits on Jan 5, 2022

@philvarner (Collaborator) commented Nov 8, 2021

Related Issue(s): #17 #159 #138 #137 #230

Proposed Changes:

  1. Better description of what the Core conformance class means
  2. Detailed descriptions of how to use sub-catalogs for browse
  3. STAC API - Children conformance class to get all child Catalog and Collection metadata in one call

PR Checklist:

  • This PR is made against the dev branch (all proposed changes except releases should be against dev, not master).
  • This PR has no breaking changes.
  • This PR does not make any changes to the core spec in the stac-spec directory (these are included as a subtree and should be updated directly in radiantearth/stac-spec)
  • I have added my changes to the CHANGELOG or a CHANGELOG entry is not required.

@philvarner changed the title from "Browseable recommendations" to "Browseable recommendations and Children conformance class" on Nov 8, 2021
core/README.md Outdated
@@ -62,24 +72,66 @@ A `service-doc` endpoint is recommended, but not required.
| ------------- | ----------- | -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `service-doc` | `/api.html` | OAFeat OpenAPI | An HTML service description. Uses the `text/html` media type to refer to a human-consumable description of the service. The path for this endpoint is only recommended to be `/api.html`, but may be another path. |

- Additionally, `child` relations may exist to individual catalogs and collections.
+ Additionally, `child` relations may exist to child Catalogs and Collections and `item` relations to Items. These
+ relations form a directed acyclic graph that supports browseable traversal.
Contributor

I'm throwing this comment here because it's the first mention of the acyclicality of the graph -- I don't think this is technically correct. I think the existence of parent links means that the graph has cycles. Moreover, I think it's good that the graph has cycles -- it means we can get back up a level without using the back button / that we can retrace our steps using only the data. Is acyclicality an important property? I've thought about this a bit and I'm not sure what it gets us. I think this is just a normal directed graph.

Collaborator Author

Ah, so that's a good point and one I should be more clear on. I was primarily thinking about the graph of child links, which must be a DAG. Also to make it clear that these are DAGs and not only trees, since the subcatalogs don't have to be distinct (e.g., you can slice them up by date or grid id, not only one)

@jisantuc (Contributor), Nov 16, 2021

Is that even necessarily the case (that child links are acyclic)? I doubt anyone has mutual child relationships, but:

  • it's not explicitly disallowed
  • I'm not aware of any notes recommending against multi-part cycles like catalog a has item a as a child, item a has catalog b as a child (nothing says items can't have children as far as I know), catalog a is a child of catalog b

I have no idea why anyone would do such a thing, I just read a nice Alloy 6 tutorial earlier today and it might have poisoned my brain forever.

Either way, unless the acyclicality is important for some reason, I'm not sure it's doing much work except encouraging thinking about edge cases where it doesn't hold -- I think a plain directed graph should be sufficient for supporting browseable traversal
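
A minimal sketch of the property being debated, assuming the `child` links have already been collected into an adjacency mapping (the identifiers below are hypothetical). It checks whether following only `child` edges can return to a node already on the current path, which is what would rule out a DAG:

```python
from typing import Dict, List


def has_cycle(child_links: Dict[str, List[str]]) -> bool:
    """Return True if following only `child` edges can revisit a node on the current path."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color = {node: WHITE for node in child_links}

    def visit(node: str) -> bool:
        color[node] = GRAY
        for target in child_links.get(node, []):
            state = color.get(target, WHITE)
            if state == GRAY:
                return True  # back edge: target is already on the current path
            if state == WHITE and visit(target):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(child_links))


# The multi-part cycle described above: catalog a -> item a -> catalog b -> catalog a.
cyclic = {"catalog-a": ["item-a"], "item-a": ["catalog-b"], "catalog-b": ["catalog-a"]}

# A DAG that is not a tree: the same subcatalog is reachable by date and by grid id.
dag = {"root": ["by-date", "by-grid"], "by-date": ["shared"], "by-grid": ["shared"], "shared": []}

print(has_cycle(cyclic), has_cycle(dag))
```

Nothing in this check depends on whether `parent` links exist; it only looks at the `child` edges, which is the narrower graph the author had in mind.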

Collaborator

I agree with @jisantuc here.

Collaborator Author

updated the language to only be directed graph

@m-mohr (Collaborator) commented Nov 27, 2021

Would it make sense to split the Children conformance class and the rest of the PR into two PRs? This PR is so large that it's nearly impossible for me to really do a full review in one go... Also, could it be that this PR also includes some other changes? For example, I've also seen pagination changes in here...

- **Extension [Maturity Classification](../extensions.md#extension-maturity):** Pilot
- **Dependencies**: [STAC API - Core](../core)

A STAC API can return information about all STAC [Catalogs](../stac-spec/catalog-spec/catalog-spec.md) available using a link

I'm confused if the server is expected to return just the first generation of children, the last generation of children, or the entire ancestry. For example, if I had Landsat catalogs organized as /catalogs/landsat_8_c1/{path}_{row}_{date}, the /children endpoint could:

  1. Return all {path} catalogs (immediate children).
  2. Return every combination of {path}_{row}_{date} (last generation of children).
  3. Return every combination of {path}, {path}_{row}, and {path}_{row}_{date} (entire ancestry).
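
The three readings can be sketched over a toy hierarchy. This is only an illustration: the catalog ids and nesting below are hypothetical, and the functions are not part of any STAC library.

```python
from typing import Dict, List

# Hypothetical hierarchy: path -> path_row -> path_row_date.
TREE: Dict[str, List[str]] = {
    "landsat_8_c1": ["010", "011"],
    "010": ["010_020"],
    "011": ["011_020"],
    "010_020": ["010_020_20210101"],
    "011_020": ["011_020_20210101"],
    "010_020_20210101": [],
    "011_020_20210101": [],
}


def immediate_children(tree: Dict[str, List[str]], root: str) -> List[str]:
    """Option 1: only the first generation."""
    return list(tree[root])


def all_descendants(tree: Dict[str, List[str]], root: str) -> List[str]:
    """Option 3: every generation below the root, in depth-first order."""
    found: List[str] = []
    stack = list(reversed(tree[root]))
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(reversed(tree[node]))
    return found


def last_generation(tree: Dict[str, List[str]], root: str) -> List[str]:
    """Option 2: only descendants that have no children of their own."""
    return [node for node in all_descendants(tree, root) if not tree[node]]


print(immediate_children(TREE, "landsat_8_c1"))
print(last_generation(TREE, "landsat_8_c1"))
print(all_descendants(TREE, "landsat_8_c1"))
```

Even in this tiny tree the result sizes differ (2, 2, and 6 entries), and the gap grows quickly with real path/row/date combinations.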

Collaborator

Hmm... is this conformance class a "simple" way for providers to express their "preferred list of catalogs and collections to show on a frontpage"?

Collaborator Author

To answer @geospatial-jeff 's question, it's up to the implementer. I think typically it will return exactly the same set of children that have child relations from the root. The benefit to having it at this endpoint is that the title, desc, etc can all be returned, so a client doesn't have to retrieve every single child link just to find out the title to display -- this was Rob's use case.

Maybe "simple" isn't the right word. I'll think about this.

A STAC API can return information about all STAC [Catalogs](../stac-spec/catalog-spec/catalog-spec.md) available using a link
from the landing page that uses the link relation `children`, which links to an endpoint called
`/children`. The purpose of this endpoint is to present a single resource from which clients can retrieve
all the child objects (Catalogs and Collections) of a Catalog. This eliminates the need for a client to
@geospatial-jeff, Nov 27, 2021

I think that returning both Catalogs and Collections in the /children endpoint is confusing. My understanding after reading this PR is that collections are for searching while catalogs are for browsing. It even says in the Browseable Catalogs best practices that collections should not be included as part of the browseable tree of catalogs. Allowing collections to be used for browsing is poor separation of concerns, and instead of just recommending against their use the spec should forbid using collections in this way.

Collaborator

Yeah, this is a bit confusing. Like do I not need to retrieve /collections anymore if /children is implemented? How should clients handle all the duplication to show a list of unique catalogs/collections?

@lossyrob (Collaborator), Nov 29, 2021

I can see the benefit of having children return both Catalogs and Collections - it more fits with the STAC core specification which uses rel=child for both of those cases. It also allows for all of the catalogs and collections to be retrieved in a single paginated call - useful for rendering the content of the API.

One addition we could make is that a query parameter can determine which type to return - e.g.

https://stac.api/children?type=catalog

Where is the best practices text you referenced located?
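
A rough sketch of how a server might apply such a filter. The `type` query parameter is only a suggestion in this thread, not something the spec defines, and `filter_children` is a hypothetical helper. It relies on the `type` field that STAC Catalog and Collection objects carry:

```python
from typing import Any, Dict, List, Optional


def filter_children(
    children: List[Dict[str, Any]], type_param: Optional[str] = None
) -> List[Dict[str, Any]]:
    """Keep only children whose STAC `type` matches the (hypothetical) query parameter."""
    if type_param is None:
        return list(children)  # no filter requested: return everything
    wanted = type_param.lower()
    return [c for c in children if str(c.get("type", "")).lower() == wanted]


children = [
    {"type": "Catalog", "id": "by-date"},
    {"type": "Collection", "id": "sentinel_2_l2a"},
]

print([c["id"] for c in filter_children(children, "catalog")])
print([c["id"] for c in filter_children(children)])
```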

It's included as part of https://github.com/philvarner/stac-api-spec/blob/catalogs-and-browseable/core/README.md#browseable-catalogs introduced in this PR. The exact language is:

These are the two standard ways of structuring a browseable tree of catalogs, the only difference being whether the Collection is used as part of the tree or not:

  • Catalog (root) -> Catalog* -> Item (recommended)
  • Catalog (root) -> Collection -> Catalog* -> Item

Collaborator Author

I'll see if I explicitly stated it anywhere, but the intention was that /children would return the same list of entities that are linked to via rel=child from the root. The benefit (as Rob has articulated before) is that a client can get all of the entities with one call instead of having to make one HTTP request for each one to, say, only get the title of the collection or catalog.
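
The request-count argument can be sketched with an in-memory stand-in for the server. Everything here is illustrative: the URLs and titles are made up, and the sketch assumes the `/children` response carries the child objects in a `children` array, which this thread does not show.

```python
from typing import Any, Dict, List

# Hypothetical documents keyed by URL.
DOCS: Dict[str, Dict[str, Any]] = {
    "/": {
        "links": [
            {"rel": "child", "href": "/catalogs/a"},
            {"rel": "child", "href": "/collections/b"},
            {"rel": "children", "href": "/children"},
        ]
    },
    "/catalogs/a": {"type": "Catalog", "id": "a", "title": "Catalog A"},
    "/collections/b": {"type": "Collection", "id": "b", "title": "Collection B"},
}
# Assumed response shape for the children endpoint.
DOCS["/children"] = {"children": [DOCS["/catalogs/a"], DOCS["/collections/b"]], "links": []}


class CountingFetch:
    """Stand-in for an HTTP GET that counts how many requests were made."""

    def __init__(self, docs: Dict[str, Dict[str, Any]]) -> None:
        self.docs = docs
        self.calls = 0

    def __call__(self, url: str) -> Dict[str, Any]:
        self.calls += 1
        return self.docs[url]


def titles_via_child_links(fetch: CountingFetch, root_url: str) -> List[str]:
    """Follow every rel=child link individually: one request per child."""
    root = fetch(root_url)
    hrefs = [link["href"] for link in root["links"] if link["rel"] == "child"]
    return [fetch(href).get("title", href) for href in hrefs]


def titles_via_children(fetch: CountingFetch, root_url: str) -> List[str]:
    """Follow the single rel=children link: one request for all children."""
    root = fetch(root_url)
    href = next(link["href"] for link in root["links"] if link["rel"] == "children")
    return [child.get("title", child["id"]) for child in fetch(href)["children"]]


per_link, single_call = CountingFetch(DOCS), CountingFetch(DOCS)
print(titles_via_child_links(per_link, "/"), per_link.calls)
print(titles_via_children(single_call, "/"), single_call.calls)
```

With N children the first approach costs N + 1 requests and the second costs 2, ignoring pagination.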

@@ -0,0 +1,156 @@
# STAC API - Children

Are catalogs returned by this endpoint allowed to implement item search (i.e. can they have a /search endpoint)?

@geospatial-jeff, Nov 27, 2021

The landing page is just a catalog, and itself implements /search, so I'm guessing this is allowed. And the /search response would only return items that are contained by the catalog (potentially across many collections).

Collaborator

Yes, any (sub) Catalog can implement a search endpoint.

Collaborator Author

Thanks for bringing this up. I thought I'd explicitly mentioned this, but I can't find it, so I'll make sure to add something.

@geospatial-jeff

The top level README (https://github.com/philvarner/stac-api-spec/tree/catalogs-and-browseable#in-this-repository) should be updated to link to the children folder w/ description.

@lossyrob (Collaborator) commented Nov 29, 2021

Some comments from talking through this PR with Matthias, Matt, and Chris:

  • The /children endpoint should contain the STAC Objects for any directly linked Catalogs or Collections (not recursive)
  • Catalogs in a STAC API should not contain Items (as a best practice)
  • Collections in a STAC API should not contain any children (as a best practice)
  • One question that came up - should the /collections endpoint return all Collections contained in the STAC API (recursively through the Catalog)? Or only the Collections whose parent is the STAC API?
  • Regardless of the answer above, the /search endpoint will search Collections that are direct children and also the children of any children (recursive). That way a search on the root API can find any Items contained in the catalog as a whole, while searching on sub-APIs (child subcatalogs that have conformance classes/are an API) will only search Items in the Collections that are returned by their own /collections endpoint
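
A small sketch of the difference between the non-recursive `/children` listing and the recursive search scope described in these points. The node ids are hypothetical and the two functions only model the semantics, not an actual implementation:

```python
from typing import Any, Dict, List

# Hypothetical API layout: a root with one sub-catalog and one direct Collection.
NODES: Dict[str, Dict[str, Any]] = {
    "root": {"type": "Catalog", "children": ["sub-a", "coll-1"]},
    "sub-a": {"type": "Catalog", "children": ["coll-2", "coll-3"]},
    "coll-1": {"type": "Collection", "children": []},
    "coll-2": {"type": "Collection", "children": []},
    "coll-3": {"type": "Collection", "children": []},
}


def children_of(nodes: Dict[str, Dict[str, Any]], node_id: str) -> List[str]:
    """`/children` semantics: directly linked Catalogs and Collections only."""
    return list(nodes[node_id]["children"])


def collections_in_search_scope(nodes: Dict[str, Dict[str, Any]], node_id: str) -> List[str]:
    """`/search` semantics: Collections that are children, or children of children, and so on."""
    result: List[str] = []
    for child in nodes[node_id]["children"]:
        if nodes[child]["type"] == "Collection":
            result.append(child)
        else:
            result.extend(collections_in_search_scope(nodes, child))
    return result


print(children_of(NODES, "root"))
print(collections_in_search_scope(NODES, "root"))
print(collections_in_search_scope(NODES, "sub-a"))
```

The open question about `/collections` is whether it should behave like the first function (filtered to Collections) or like the second.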

Matt suggested scheduling a working session to clarify these points and others, and get this PR over the finish line - he will follow up

@m-mohr (Collaborator) commented Nov 29, 2021

  • Catalogs in a STAC API should not contain Items (as a best practice)
  • Collections in a STAC API should not contain any children (as a best practice)

I was confused by this in the call and did not pick this up as being agreed on yet. What's the background on this?

@cholmes (Collaborator) left a comment

Not quite done with full review, but want to get some of my comments in as not sure when I'll get back to it.

core/README.md Outdated
that indicates to clients that this is a STAC API and how to access conformance classes, including this
one. The relevant conformance URIs are listed in each part of the
API specification. If a conformance URI is listed then the service must implement all of the required capabilities.
Whenever a static STAC catalog is served over HTTP, it is a de facto hypermedia-driven web API. Even without implementing any
Collaborator

Probably should define what a 'static STAC catalog' is? Maybe it's just rephrasing this, but as it reads it seems to assume some knowledge of a static STAC. Could also link to the section in stac-spec on static catalogs.

core/README.md Outdated
Whenever a static STAC catalog is served over HTTP, it is a de facto hypermedia-driven web API. Even without implementing any
STAC API conformance classes, the entire catalog can be traversed from the root via `child` and `item` link relations. Support for
this "browse" mode of interaction is complementary to the dynamic search capabilities defined by other STAC API conformance classes.
Conversely, many STAC API implementations do not support browse, even though the root is a Catalog object, because they do not
Collaborator

This line and the next seem to be talking about existing STAC API implementations? I think it'd be good to rephrase it more abstractly for the spec. Describe the use case when APIs do not support browse, and perhaps in parentheses explain how many 1.0-beta and early catalogs didn't have the appropriate link relations to traverse. Like the spec should read as the spec, without too much dialog on the existing state of things.

core/README.md Outdated
Conversely, many STAC API implementations do not support browse, even though the root is a Catalog object, because they do not
have the appropriate `child` and `item` link relations to traverse over the objects in the catalog.
Providing users with these two different, complementary ways of navigating the catalog allows them to interrogate the data in whichever
way best meets their needs. Supporting these also opens up a catalog to both
Collaborator

Maybe explain the use cases of both of these ways? One for crawling / browsing, one for getting the endpoints to search against.

Collaborator

(could also be a link to a spot that discusses it more)

1. Catalog -> Catalog (product) -> Catalog (date) -> Catalog (path) -> Catalog (row)
2. Catalog -> Catalog (product) -> Catalog (path) -> Catalog (row) -> Catalog (date)

There are many options for how to structure these catalog graphs, so it will take some analysis work to figure out
Collaborator

Could be good to have some sort of 'best practices' linked to where options for this are discussed more. It also might be worth a bit of guidance here, just like thinking about it from a user's perspective - how people would like to browse to data.

Collaborator Author

I filed an issue to come back to these #243

I think this is an improvement over what we had before, but recognize there's still a better way to describe these that I can't quite articulate right now

core/README.md Outdated
- child -> /catalogs/sentinel_2_l2a

Since the catalog structure is a directed acyclic graph, you can
provide numerous different Catalog and Collection graphs that reach leaf Items. For example, for a Landsat 8 data
Collaborator

It might be good to provide some guidance on the final 'item' link from different views. Like should they all link to the exact same item url? Or have different item urls that all have canonical hrefs to the same one? And should the canonical one be in the browse hierarchy, or in the collection?

Collaborator Author

added

@lossyrob (Collaborator)

@m-mohr

Catalogs in a STAC API should not contain Items (as a best practice)
Collections in a STAC API should not contain any children (as a best practice)

I see this is in direct conflict with this declaration of Catalogs as a browsable unit that contains Items. When we were discussing it, I was less thinking about the "browseable" part of APIs, which to me is less interesting. The children extension is interesting for my use case in that it allows the ability to create sub-STAC APIs to organize Collections into a multi-level hierarchy, which solves an issue when you have many Collections in an API, some of them strongly related. If we consider Collections as the "searchable" container vs child Catalogs being "sub-APIs", while a Catalog that is a child of a Landing Page root of a STAC API could have Items, it is confusing to have Items in Catalogs, which can't be searched, and confusing for Collections to contain anything but Items. This is true if we only want to use Catalogs in an API as an organizational mechanism for Collections.

Collections that contain Catalogs that organize the Items into things like date/path/row would allow for users to click through an organized set of items, enabling the "Browsable" part of all this work - I see that now. This compounds the complexity of using Catalogs for both organizing Collections (and existing as sub-APIs) and also organizing Items (which may or may not be a sub-API, though I'm not sure you'd want that many sub-APIs for groups of Items in the same Collection, though in order to have their own /children endpoint they may need to be a STAC API themselves). So let me retract those points.

To think through something specific and imagine an API that might have both cases:

  • A root STAC API Landing Page/Catalog (which I'll refer to as a STAC) that contains a large number of Collections. These Collections are organized into Catalogs that themselves represent STACs (sub-STACs).
  • The root /collections endpoint of the root STAC returns either all collections, recursive through the tree of sub-STACs, or only the collections that have the root STAC as a parent (not sure what's best here).
  • The root /children endpoint returns all direct children of the STAC, i.e. any Catalog (sub-STAC) or Collection that has the root STAC set as the parent
  • The root /search endpoint will search through all collections contained in the STAC or sub-STAC, recursively.
  • Say you have a sub-STAC called "MODIS" that contains all Collections for products related to MODIS. This is retrievable through the /children endpoint. Its self link points to the STAC endpoint that returns the Landing Page/Catalog for that sub-STAC. Say it's at /children/modis.
  • The /children/modis/collections endpoint will return all the MODIS Collections, who have their parent link set to the MODIS sub-STAC endpoint
  • The /children/modis/children endpoint returns all children of that sub-STAC. In this case, the MODIS sub-STAC only contains Collections (and no sub-STACs of its own), so it returns the same as the /children/modis/collections endpoint
  • The /children/modis/search endpoint searches against only the MODIS Collections contained by this sub-STAC
  • Now let's consider a specific MODIS Collection, say /children/modis/collections/MOD14A2. If I call /children/modis/collections/MOD14A2/children ... this is where I'm confused. The Collection isn't a STAC API, so it wouldn't have a children endpoint to 'crawl' through. So it seems like the /children/modis/children would have to contain the Collection's subcatalogs, even though the parent of those Catalogs is actually the Collection?

Perhaps someone can help clear that up for me by continuing that specific example? I was trying to get to the point where the MOD14A2 Collection has subcatalogs that organize the Items into browsable categories, let's say year month day. Then thinking through how that would translate into a front-end experience like STAC Browser, i.e. what endpoints the front end would call when.
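
One way to make the question concrete is to model each object by its `parent` and derive what each endpoint would list. This is a sketch under assumptions: the `MOD14A2-2021` by-year subcatalog is hypothetical, and the two functions encode only the "direct children" reading discussed above.

```python
from typing import Dict, List

# Each object records its parent; "MOD14A2-2021" is a hypothetical by-year subcatalog.
OBJECTS: List[Dict[str, str]] = [
    {"id": "modis", "type": "Catalog", "parent": "root"},
    {"id": "MOD14A2", "type": "Collection", "parent": "modis"},
    {"id": "MOD14A2-2021", "type": "Catalog", "parent": "MOD14A2"},
]


def children_endpoint(objects: List[Dict[str, str]], stac_id: str) -> List[str]:
    """Direct children (Catalogs and Collections) whose parent is `stac_id`."""
    return [o["id"] for o in objects if o["parent"] == stac_id]


def collections_endpoint(objects: List[Dict[str, str]], stac_id: str) -> List[str]:
    """Collections whose parent is `stac_id`."""
    return [o["id"] for o in objects if o["parent"] == stac_id and o["type"] == "Collection"]


# For the MODIS sub-STAC the two listings coincide, as described in the list above.
print(children_endpoint(OBJECTS, "modis"), collections_endpoint(OBJECTS, "modis"))

# The by-year subcatalog has the Collection as its parent, so it never shows up
# under /children/modis/children with this reading.
print(children_endpoint(OBJECTS, "MOD14A2"))
```

Under this reading a client can only reach the by-year subcatalog through the Collection's own `child` links, which is the gap the question points at.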

@philvarner (Collaborator Author)

I extracted a bunch of the typo and wordsmithing changes into https://github.com/radiantearth/stac-api-spec/pull/234/files

@philvarner (Collaborator Author)

One other issue to consider is that OGC uses the term "crawlable", which I think is synonymous with our use of "browseable", so we should consider adopting that term instead.

@philvarner (Collaborator Author)

The top level README (https://github.com/philvarner/stac-api-spec/tree/catalogs-and-browseable#in-this-repository) should be updated to link to the children folder w/ description.

Thanks @geospatial-jeff -- I noticed the same and added it

@cholmes (Collaborator) left a comment

Looks great! I added a few committable suggestions, but feel free to tweak them. And one or two other suggestions that are not 'must have', so I'm approving this.

I was wondering why you ended up with 'browseable', as I thought you were leaning towards 'crawlable' like OGC. I'm happy with it as is, just curious.

This JSON is what would be expected from an API that implements `STAC API - Browseable`.

This particular catalog provides both the ability to browse down to child Catalog objects through its
`child` links, which then will eventually reach Items through `item` link relations.
Collaborator

The example below doesn't seem particularly meaningful. Could be good to try to illustrate the point more, since usually the example helps show that. Perhaps a little diagram, that shows a non-browsable catalog vs a browsable one? Or perhaps just explain after the example that a catalog not implementing browsable would not have the 'child' links, but it would have them in the 'data' rel link. I think it'd be good to just provide more context, and to help make it clear to existing implementations what they need to do.

@matthewhanson (Collaborator)

This looks great, approved, I just left a comment regarding the use of the word "shall".

@lossyrob (Collaborator) left a comment

I have some comments related to using catalogs/ instead of collections/ for hosting sub-catalogs for Item groupings for the Hierarchy recommendations. However I don't think these comments are blocking.

💯 well done!


| Endpoint | Returns | Description |
| --------------------- | ---------------------------------------------- | -------------------- |
| `/catalogs/{catalogId}` | [Catalog](../stac-spec/catalog-spec/README.md) | child Catalog object |
Collaborator

My previous question on the example using /catalogs/landsats-8-c1 is answered here. I see that it's a recommendation to avoid conflicts, though I think implementations could work around that and it would provide a cleaner set of URLs to avoid both a Catalog and Collection with the same name. Actually I was surprised to see that Catalog.id and Collection.id don't at least recommend there isn't a conflict inside the same root, as having duplicate IDs for Collections and Catalogs might get confusing.
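
A check for the kind of conflict described here could look like the following. It is a sketch only: the function name and sample ids are hypothetical, and the spec does not mandate such a check.

```python
from typing import Dict, List


def conflicting_ids(objects: List[Dict[str, str]]) -> List[str]:
    """Return ids that are used by both a Catalog and a Collection under the same root."""
    catalog_ids = {o["id"] for o in objects if o["type"] == "Catalog"}
    collection_ids = {o["id"] for o in objects if o["type"] == "Collection"}
    return sorted(catalog_ids & collection_ids)


objects = [
    {"type": "Catalog", "id": "landsat_8_c1"},
    {"type": "Collection", "id": "landsat_8_c1"},
    {"type": "Collection", "id": "sentinel_2_l2a"},
]

print(conflicting_ids(objects))
```

Separate `/catalogs/{catalogId}` and `/collections/{collectionId}` paths keep such duplicates addressable, which is why the endpoint split is framed as a way to avoid conflicts.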

Collaborator Author

I think that's a pretty reasonable recommendation to make. My only hesitancy in making it is that I feel like Catalog and Collections (as we currently define them) are not related in a way in which there would ever be confusion between them. If we'd defined a Collection as-a Catalog (maybe with the additional restriction that only one Collection can exist in a hierarchy?), then I could see a good case for not duplicating ids.

@geospatial-jeff left a comment

Nice work @philvarner!

@philvarner (Collaborator Author)

I was wondering why you ended up with 'browseable', as I thought you were leaning towards 'crawlable' like OGC. I'm happy with it as is, just curious.

My understanding was that the semantics of this may be slightly different than whatever crawlable ends up meaning in OGC, and that crawlable might not even be the term they end up using. Given that uncertainty, if we go with our own terminology, we can easily align it to theirs when their work is finalized, whereas it would be very confusing if we had something that was named the same but different.

@philvarner merged commit 4f0ced2 into radiantearth:dev on Jan 5, 2022
@philvarner deleted the catalogs-and-browseable branch on January 5, 2022 at 22:46
7 participants