Best Practices: Catalog Layout -- Catalog vs Collection as root document #925

Closed
CloudNiner opened this issue Dec 8, 2020 · 9 comments · Fixed by #932

@CloudNiner
Contributor

I'm reading Best Practices - Catalog Layout for guidance on how to structure my catalog, which is a Collection containing Items as direct children.

Of particular interest:

  1. Root documents (catalogs / collections) should be at the root of a directory tree containing the static catalog.
  2. Catalogs should be named catalog.json (cf. index.html).
  3. Collections that are distinct from catalogs should be named collection.json.

When writing a static catalog for this use case, based on the best practices quoted above, I see a few options:

// #1
catalog.json
  - collection.json
    - item.json
    - item.json

// #2 
collection.json
  - item.json
  - item.json

// #3 
catalog.json // <-- but this is actually a Collection
  - item.json
  - item.json

I'm inclined to choose door 2 because it doesn't duplicate objects or require unnecessary structure. After all, a Collection is a Catalog, and best practice 1 implies that a Collection can be a root document. Door 2 is also what I get if I create a Collection in PySTAC, add some Items to it, and export it, and the resulting structure validates in PySTAC. However, other published catalogs, such as https://landsat-stac.s3.amazonaws.com/catalog.json, appear to use door 1. Door 3 came about because best practice 2 references the established convention of index.html, which makes me think that root objects should always be named catalog.json regardless of whether they contain a Catalog or a Collection.

Is there already a generally accepted best practice for this situation?

@m-mohr
Collaborator

m-mohr commented Dec 8, 2020

I guess the authors meant door 3, but I'm more inclined to door 2 tbh. @cholmes
Let's capture what people do and favor and make the best practice more precise!

@cholmes
Contributor

cholmes commented Dec 11, 2020

I think I like door 2 too. I don't think I wrote this. But yeah, let's capture what people do. Though if PySTAC is doing it then it's probably what many people are doing. I upgraded my mini-STAC planet catalog with stactools / pystac last time, and yep, looks like I'm now doing option #2 as well.

@m-mohr
Collaborator

m-mohr commented Dec 11, 2020

> I don't think I wrote this.

git blame is evil and knows the truth ;-)

CloudNiner added a commit to CloudNiner/stac-spec that referenced this issue Dec 11, 2020
Here's an attempt to simplify and clarify the best practices 
section for catalog layout to address radiantearth#925.

In practice, we've found that every Catalog is named catalog.json
and every Collection is named collection.json, which is enforced
by tools such as PySTAC.
@CloudNiner
Contributor Author

Great! I opened #932 with an initial attempt at a wording change to this section. Feel free to take it or leave it, this type of change is subject to interpretation!

@cholmes
Contributor

cholmes commented Dec 11, 2020

> > I don't think I wrote this.
>
> git blame is evil and knows the truth ;-)

I mean, I may have entered it on github. But I think it was a copy and paste from something Seth wrote up.

@fredliporace
Contributor

fredliporace commented Dec 11, 2020

Guess the idea was to force the more specific name collection.json when possible...

Every Collection is a Catalog, so simply stating to use catalog.json or collection.json would be equivalent to: "Animals should be named animals.json. Cats should be named cats.json", which is ambiguous since it gives two possible names for cats...

Maybe use this: "Catalogs which are not collections should be named catalog.json. Collections should be named collection.json"
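
The proposed wording can be read as a simple naming rule. Below is a minimal sketch of that rule, not anything taken from PySTAC or the spec. The function name `default_filename` is hypothetical. It also assumes that a document counts as a Collection when it declares `"type": "Collection"` or, for older documents without a `type` field, when it carries an `extent` field.

```python
def default_filename(doc: dict) -> str:
    """Return the file name suggested by the proposed naming rule."""
    # Assumption: newer documents declare their kind in a "type" field;
    # older ones are treated as Collections when they carry an "extent".
    is_collection = doc.get("type") == "Collection" or (
        "type" not in doc and "extent" in doc
    )
    # Collections get the more specific name; everything else is a plain Catalog.
    return "collection.json" if is_collection else "catalog.json"
```

With this reading there is exactly one name per document, which removes the "two possible names for cats" ambiguity.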

CloudNiner added a commit to CloudNiner/stac-spec that referenced this issue Dec 12, 2020
Based on feedback in radiantearth#925
@CloudNiner
Contributor Author

> Maybe use this: "Catalogs which are not collections should be named catalog.json. Collections should be named collection.json"

I like that. Updated #932

@m-mohr m-mohr added this to the 1.0.0 milestone Jan 4, 2021
@m-mohr
Collaborator

m-mohr commented Jan 5, 2021

I found a use case for the specific collection.json name in STAC Index. Knowing it's a collection from just the name helps to prioritize collections over catalogs in the crawling process. So +1 for collection.json.
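
As an illustration of how a crawler could use the file name alone, here is a minimal sketch. It is not STAC Index's actual crawler, and the function name `crawl_order` is hypothetical. It only moves URLs whose last path segment is collection.json to the front of the queue and keeps the original order otherwise.

```python
from urllib.parse import urlparse


def crawl_order(urls):
    """Return the URLs with those named collection.json first."""

    def is_collection(url):
        # Look only at the last path segment, ignoring any query string.
        return urlparse(url).path.rsplit("/", 1)[-1] == "collection.json"

    # sorted() is stable, so the relative order within each group is kept.
    return sorted(urls, key=lambda u: not is_collection(u))
```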

@m-mohr m-mohr linked a pull request Jan 5, 2021 that will close this issue
@m-mohr m-mohr modified the milestones: 1.0.0, 1.0.0-beta.3 Jan 5, 2021
@cholmes cholmes added the prio: must-have required for release associated with label Jan 5, 2021
@cholmes cholmes closed this as completed Jan 15, 2021
@m-mohr m-mohr reopened this Jan 24, 2021
@m-mohr
Collaborator

m-mohr commented Jan 24, 2021

I'm re-opening this because I've figured out why it was the way it was before, and I think we should at least discuss it again. Over the last few days I looked into how STAC Browser handles breadcrumb and URL generation. In fact, I think the best practice was originally written for STAC Browser. Having a catalog.json in each folder allows it to generate a structure and navigate to the parent without any further details. If there's a distinction between catalog.json and collection.json, this is no longer true. This is why we had door #3 before, and I think @cholmes was actually right that he added it to the spec, but the person who came up with the idea was Seth, the former STAC Browser maintainer. With the recent change to the best practice, nice slug generation for STAC Browser is nearly impossible without reading all parent catalogs, as I can't rely on a given best practice. That said, not many catalogs actually follow the old best practice, and neither does PySTAC. Thus, solving the issue in STAC Browser will get complicated anyway.
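
To make the parent navigation concrete, here is a minimal sketch of how a parent URL could be guessed from the path alone under the old best practice. This is not STAC Browser's code. The function name `guess_parent_url` is hypothetical, and the sketch assumes that every document sits in its own folder directly below its parent's folder and that every parent is named catalog.json.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def guess_parent_url(url: str, root_url: str):
    """Guess the parent document URL, assuming a catalog.json in every folder."""
    if url == root_url:
        return None  # the root document has no parent
    parts = urlsplit(url)
    folder = posixpath.dirname(parts.path)      # folder holding this document
    parent_folder = posixpath.dirname(folder)   # one level up
    parent_path = posixpath.join(parent_folder, "catalog.json")
    return urlunsplit((parts.scheme, parts.netloc, parent_path, "", ""))
```

Once a parent may be named either catalog.json or collection.json, this guess is no longer reliable, and the parent has to be found by reading the document's `rel="parent"` link instead.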
