Browsable APIs #159

matthewhanson · 2021-06-21T02:32:17Z

There have been a few implementations of browsable APIs that allow browsing of Items within a Collection. staccato implemented this at one point, and CMR-STAC allows browsing of sub-catalogs in a Collection by year/month/day. There has never been a completed proposal for this in stac-api-spec. It is conformant with stac-spec itself, because it mimics the structure of a static catalog.

In a similar fashion, @lossyrob and I were recently discussing a browsable API endpoint for the purpose of grouping together related collections. For example, MODIS has a variety of products; how can we group those together in a catalog within an API?

I'm not sure yet if either of these use cases actually requires an addition (extension) to stac-api-spec, because it really is just an implementation of a "static" catalog (albeit generated dynamically), where any catalog within the entire structure could contain the proper conformance classes and search links and be an independent API. This can be seen in CMR-STAC, where the root catalog is actually just a static catalog that has a bunch of children, and each of the children is its own STAC API for a single NASA EOSDIS provider.

Thought it might be worthwhile to discuss some implementation details to work out if anything is required on the spec side. If you have worked on an implementation like this, or have some ideas about how you might implement it, please weigh in.

For case #1 - Browsable catalogs for Items within a Collection

CMR-STAC uses CMR's aggregation endpoint to dynamically create child catalogs by year, then month, then day, e.g., https://cmr.earthdata.nasa.gov/stac/USGS_EROS/collections/ETM.v1/2003/01/02 which has a child link that points to the Item https://cmr.earthdata.nasa.gov/stac/USGS_EROS/collections/ETM.v1/items/G162894313-USGS_EROS
IIRC staccato used a dynamic approach where you could specify, via URL parameters, the fields to create sub-catalogs from, e.g., /collections/<cid>/path/087/row/435. This doesn't provide child catalogs in the same sense, but such links could easily be added at the collection level for some default fields to browse by. @joshfix
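
To make the year/month/day idea concrete, here is a minimal sketch of how one level of such a browse tree could be generated. It is not CMR-STAC's or staccato's actual code: the function name `browse_links`, the `(item_id, datetime)` input shape, and the example URL are all assumptions made for illustration.

```python
from datetime import datetime


def browse_links(items, collection_url, path=()):
    """Return STAC-style links for one level of a year/month/day browse tree.

    items: iterable of (item_id, datetime) pairs (hypothetical input shape)
    path:  zero-padded parts already chosen, e.g. ("2003", "01")
    """
    def parts(dt):
        return (f"{dt.year:04d}", f"{dt.month:02d}", f"{dt.day:02d}")

    path = tuple(path)
    # Keep only the items whose date starts with the parts chosen so far.
    matching = [(iid, parts(dt)) for iid, dt in items
                if parts(dt)[:len(path)] == path]

    if len(path) == 3:
        # Day level: link straight to the Items under /items/<id>.
        return [{"rel": "item", "href": f"{collection_url}/items/{iid}"}
                for iid, _ in sorted(matching)]

    # Root, year, or month level: one child catalog per distinct next part.
    base = "/".join([collection_url, *path])
    next_parts = sorted({p[len(path)] for _, p in matching})
    return [{"rel": "child", "href": f"{base}/{n}"} for n in next_parts]
```

Calling it with `path=()` would give the year catalogs, with `("2003", "01")` the day catalogs, and with a full three-part path the `item` links. A real implementation would presumably push the grouping into the backend's aggregation query rather than scanning items in memory.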

For case #2 - Browsable catalogs for grouping collections

CMR-STAC does this, but only by data providers, which is a concept already in CMR.
My thought was that you could have your root catalog, which would be a normal STAC API endpoint for searching your entire archive. For "child" endpoints you would group the collections dynamically via some existing parameter or a new one added via an extension (note this implies collection search is added to the spec).
The child catalogs could in turn implement their own API endpoint. A /search call on this endpoint would automatically be limited to the collections that are within the child catalog. The catalog would contain child links to each of the member collections.
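
As a rough sketch of the "search is automatically limited to the child catalog" behavior, a child endpoint could rewrite the incoming search body before handing it to the normal search backend. The function name `scope_search` and the collection ids used in the example are hypothetical; nothing here is defined by stac-api-spec.

```python
def scope_search(body, member_collections):
    """Restrict a /search request body to the Collections of a child catalog.

    body:               the parsed JSON body of a POST /search request
    member_collections: ids of the Collections that belong to the child catalog
    """
    members = list(member_collections)
    requested = body.get("collections")
    if requested is None:
        # No explicit filter: search every member Collection.
        scoped = members
    else:
        # Explicit filter: keep only ids that are members of this catalog.
        scoped = [c for c in requested if c in members]
    # Note: an empty list means "nothing matches"; the caller should return
    # an empty result rather than treat it as "no restriction".
    return {**body, "collections": scoped}
```

For example, `scope_search({"limit": 10}, ["daymet-a", "daymet-b"])` would add both member ids to the body, while a request that names a Collection outside the child catalog would have that id dropped.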


cholmes · 2021-06-24T20:57:34Z

I've always been a big fan of this, and I do think it's time for us to do 'something' about it. We evolved to a place where it's all 'acceptable' in a STAC API, but it's still pretty confusing to both implementors and users what you should do. Even in the base case, should you have 'child' links to each collection and a 'data' link to /collections? Or just do one of them?

I am thinking one or more new conformance classes could help. The way I've been thinking about it is to have a 'browse' conformance class, and that would say that your root document uses child links that navigate all the way down to actual items (perhaps have it require that you can get to every item with browse). If you don't implement 'browse' conformance then you should just use the data link to the collections/ endpoint. So it would really be saying: a client built just for STAC core can treat this exactly as if it's a static catalog.

I don't think that maps exactly to your two use cases Matt, but we should talk through all the options, and also look at what all the various catalogs have done, and come up with the conformance classes and recommendations that lead to very clear guidelines for what implementors should do.

lossyrob · 2021-07-19T16:18:45Z

Currently we're doing something similar ad hoc for our STAC API. In the current version of the data catalog page (https://planetarycomputer.microsoft.com/catalog) you can see "Daymet Collection", which is actually a set of Collections that are all from the Daymet dataset. We did this to avoid cluttering the UI with a set of related collections that deserve their own page. If you click on the Daymet collection, it brings you to a page that just shows Daymet collections.

Our current implementation requires some configuration on the front-end that exists out-of-band of the STAC API. We're hoping this Browsable API extension can solve this issue while keeping all of the data encoded in STAC data.

One idea is that this enables API Landing Pages (which are themselves Catalogs) to contain links to both Collections and Catalogs (whereas they only link to Collections right now, at least as part of the spec). The catalog links would be links to sub-APIs, where each of the catalogs is itself a Landing Page for a STAC API endpoint covering that subset of collections (e.g. a Daymet-only API with /search, /collections, etc.).

An additional endpoint could be added to the API, /catalogs, that would mimic what /collections does, but for these sub-APIs. This is important because otherwise you need to fetch the response for each link to see if it's a Catalog or Collection; it's much easier to make 2 calls, to the /collections and /catalogs endpoints, to get all the information partitioned by type.

One question that arose is what the top-level /collections should return: only Collections that are not part of sub-APIs, or all Collections? I'd say that it should return all Collections, as this matches the current behavior. Collections that belong to a sub-API would have a parent link that is not the link to the root Landing Page, so users can differentiate between Collections that belong to a sub-API and those that do not. In our case, the front end would fetch all Catalogs and Collections, and then filter Collections to show only those that have a parent link to the root Landing Page (if we wanted to hide Collections belonging to sub-APIs; there may be situations in which we don't).
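
The client-side filtering described here could look roughly like the sketch below. It assumes each Collection carries a `parent` link in its `links` array and that hrefs can be compared after stripping a trailing slash; the function name `root_level_collections` is made up for illustration.

```python
def root_level_collections(collections, root_href):
    """Keep only Collections whose parent link points at the root Landing Page."""
    def parent_href(collection):
        # Return the href of the first rel="parent" link, if there is one.
        for link in collection.get("links", []):
            if link.get("rel") == "parent":
                return link.get("href")
        return None

    root = root_href.rstrip("/")
    return [c for c in collections
            if (parent_href(c) or "").rstrip("/") == root]
```

Collections without any parent link are dropped by this sketch, which may or may not be the right call; a front end could just as reasonably treat them as root-level.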

This is our need for case #2. I don't think we actually need a solution to case #1, though it does seem useful to implement...but this makes me think these are potentially 2 different extensions?

mmcfarland · 2021-07-19T17:31:33Z

On the question of whether /collections would continue to return all collections, I agree it makes sense to have consistent behavior where you'd expect to get all Collections available in the API. However, as this use case implies a grouping scheme to manage large sets of Collections, having a mechanism to limit Collections to just those of a single Catalog would be important. That is, if /collections continues to include all Collections, we would still want an endpoint to get just "root-Catalog" Collections and not fetch Daymet, MODIS, etc. collections that an application doesn't need to account for at the top level (and in our case there are likely to be many dozens of Collections).

In the example @lossyrob provides, ideally we would want to fetch from 2 endpoints for a top level catalog-app:

A root-Catalog /collections endpoint: all Collections not in a sub-Catalog
/catalogs - list of sub Catalogs, which provide necessary Catalog metadata

Assuming the result of /catalogs contains full Catalog objects with title, description and a thumbnail asset (similar to the existing /collections response), we could render our full top-level catalog with only 2 requests, regardless of how many sub-Catalogs the API provided. Sub-Catalog pages would just make one additional request to get the related Collections when navigated to.

lossyrob · 2021-07-19T18:51:03Z

Good points. I'd suggest that the /collections endpoint keep the behavior of returning all Collections in the root, including those that belong to sub-catalogs, so that the extension doesn't modify the baseline behavior. But to hit the use case you mention, perhaps the extension could add a query parameter to /collections, e.g. /collections?rootOnly=True (or something), so that users can fetch all Collections not in a sub-Catalog. That way we don't need to create a new endpoint, but the non-default behavior is easily accessible to clients that understand the extension.
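
A server-side sketch of that idea, just to show the shape of it: the default response is unchanged, and the flag only narrows it. The parameter name `rootOnly` is the placeholder from above and is not part of any spec; `list_collections` and the `sub_catalog_of` mapping (collection id to sub-catalog id) are likewise hypothetical.

```python
def list_collections(all_collections, sub_catalog_of, query=None):
    """Handle GET /collections with an optional, hypothetical rootOnly flag.

    all_collections: list of Collection dicts served by the API
    sub_catalog_of:  mapping of collection id -> id of the sub-catalog it is in
    query:           parsed query-string parameters, e.g. {"rootOnly": "True"}
    """
    query = query or {}
    root_only = str(query.get("rootOnly", "")).lower() == "true"
    if not root_only:
        # Baseline behavior: every Collection, including sub-catalog members.
        return list(all_collections)
    # Extension behavior: only Collections that are not in any sub-catalog.
    return [c for c in all_collections if c["id"] not in sub_catalog_of]
```

Clients that don't know about the flag never send it and so see exactly what they see today.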

matthewhanson · 2021-07-30T17:37:46Z

I agree, /collections should return all collections that are (recursively) underneath any given catalog.

I like the idea of any catalog being able to serve as its own API, basically providing additional filters to queries. I think this is basically what we talked about a few months ago.

These are clearly different use cases; I'm not sure if they are different extensions or not. It's basically just introducing sub-catalogs at any point, either before Collections or after, so I think implementation-wise they are the same. If you wanted, you could also have sub-catalogs after Collections that are potentially APIs as well, automatically turning the browse parameters (e.g., path/row, year/mon/day) into query params.

If using PySTAC, you can just use get_collections at the root and it will only return direct child collections.
However, in pystac-client I redefined this behavior so that if it's a conforming API it will use the /collections endpoint to get all the collections in one request rather than resolving each collection URL. If you are grouping collections like this, though, that's slightly different from the expected behavior in PySTAC. So perhaps it's better if I leave get_collections as it is, where it only gets direct child collections, and instead redefine get_all_collections to use /collections if available.
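
The distinction between the two behaviors, sketched on a plain in-memory tree rather than with the actual libraries. This is not PySTAC's or pystac-client's implementation; the function names are borrowed from above only to label the two behaviors, and the `{"type", "id", "children"}` node shape is an assumption.

```python
def get_collections(node):
    """Direct child Collections only (no recursion)."""
    return [c for c in node.get("children", []) if c["type"] == "Collection"]


def get_all_collections(node):
    """Every Collection anywhere underneath the node, depth first."""
    found = []
    for child in node.get("children", []):
        if child["type"] == "Collection":
            found.append(child)
        # Recurse into Catalogs and Collections alike, since sub-catalogs
        # may also sit underneath a Collection.
        found.extend(get_all_collections(child))
    return found
```

On a root with one direct Collection and a grouping catalog holding two more, the first function would return one Collection and the second would return three, which is the result a single /collections request is expected to give on a conforming API.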

cholmes mentioned this issue Jun 24, 2021

Clarify how rel=items is implemented in the STAC API spec #161

Closed

TomAugspurger mentioned this issue Aug 2, 2021

STAC catalog sprint: to-do items pangeo-forge/pangeo-forge-catalog#1

Open

6 tasks

philvarner added this to the 1.0.0-beta.5 milestone Sep 17, 2021

m-mohr mentioned this issue Oct 22, 2021

Collection names are duplicated radiantearth/stac-browser#103

Closed

philvarner mentioned this issue Nov 8, 2021

Browseable recommendations and Children conformance class #229

Merged

4 tasks

philvarner closed this as completed Jan 5, 2022
