Protocol Buffer version of STAC for use with gRPC #575

Open
davidraleigh opened this issue Aug 28, 2019 · 5 comments

@davidraleigh
Contributor

Submitting an issue to ask for input on the included Protocol Buffer definitions that attempt to match JSON STAC. If some of you could review the tables below and give input as to whether this is an acceptable STAC-like implementation, that would be great. I'd love to eventually fold Protobuf STAC into the stac-spec or have it be a community-accepted project. Please let me know what I can do to make that happen.

Why gRPC and Protobufs? gRPC is a high-performance microservice RPC framework that supports bi-directional streaming and uses a compact data format. Protobuf is the standard compact message format for gRPC. Protobuf and gRPC are open source Cloud Native Computing Foundation projects. They were originally open sourced by Google and have been in use since 2003. At this time Google executes tens of billions of RPC messages a second with gRPC and Protobuf, so you can rest assured it's stable.

The repo that holds the proto IDL files and their generated code is here:
https://github.com/geo-grpc/api
Some documentation generated from the proto files can be found here:
https://geo-grpc.github.io/api/#epl%2fprotobuf%2fstac.proto
A Python Client can be found here:
https://github.com/nearspacelabs/stac-client-python

There are some limitations in how you define Protocol Buffers that prevent a one-to-one match with STAC. Please look at the tables for differences and the lists of explanations beneath each table. The most significant departure is that Properties would be reserved for user-defined data that is outside of the STAC specification, and the data defined by the STAC specification would exist directly on the StacItem definition. Other differences include use of a GeometryData protobuf and a preference for enums wherever possible.

STAC Item Comparison

For comparison, here are the JSON STAC item field summary and the Protobuf STAC item field summary. Below is a table comparing the two:

| Field Name | STAC Protobuf Type | STAC JSON Type |
| --- | --- | --- |
| id | string | string |
| type | NA | string |
| geometry | GeometryData | GeoJSON Geometry Object |
| bbox | EnvelopeData | [number] |
| properties | google.protobuf.Any | Properties Object |
| links | NA | [Link Object] |
| assets | StacItem.AssetsEntry | Map |
| collection | string | string |
| title | string | Inside Properties |
| datetime | google.protobuf.Timestamp | Inside Properties |
| observed | google.protobuf.Timestamp | Inside Properties |
| processed | google.protobuf.Timestamp | Inside Properties |
| updated | google.protobuf.Timestamp | Inside Properties |
| duration | google.protobuf.Duration | Inside Properties |
| eo | Eo | Inside Properties |
| sar | Sar | Inside Properties |
| landsat | Landsat | Inside Properties |

List of Item Spec differences and explanations:

  • type field isn't implemented as this is GeoJSON specific
  • geometry field uses GeometryData instead of GeoJSON. This choice is about message size and about using other projections besides WGS84
    • The GeometryData protobuf container for geometry information allows the user to define the geometry vertex information using wkt, wkb, esrishape or geojson['geometry'].
    • GeometryData has a SpatialReferenceData field that allows us to define the projection
    • SimpleState field that gives insight as to whether the geometry is broken, known fixed or otherwise
  • bbox field is defined using EnvelopeData
    • explicit definition of the coordinates using xmin, xmax, etc
    • EnvelopeData has a SpatialReferenceData field that allows us to define the projection
  • properties This is the trickiest departure from JSON. As described above, properties would be reserved for user defined data that is outside of the STAC specification. This would be done using the google.protobuf.Any for packing non-specification data into the message.
  • links This could be implemented. We just haven't found it useful with protobuf and gRPC at this time. (there are no links in gRPC, just Remote Procedure Calls)
  • assets Is implemented using the proto3 map and shows up in documentation as StacItem.AssetsEntry
  • datetime is defined using the google.protobuf.Timestamp field. This follows the recommendation from Google for matching JSON
  • observed, processed, and updated are all fields outside of the STAC specification, but I've found them enormously useful. In our case, our StacItem objects duplicate the observed Timestamp value in the datetime field in order to stay compliant with STAC. Maybe we need to make an additional issue requesting these be optional reserved fields for the item-spec.
  • eo and sar, here is the other effect of Protobuf not being a flexible hash map like JSON. Since properties has to be reserved for user-defined Protobuf definitions, extensions like eo, sar, datetime_range, etc., must be defined within the Protobuf StacItem spec. An unused message accounts for only a byte in the total message size, so including these definitions at the StacItem level doesn't cause any kind of memory bloat. Deciding when an extension should be included might be tricky.
  • landsat there isn't a landsat extension yet, so maybe this should be a separate issue.
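
To make the properties split above concrete, here is a minimal sketch in plain Python, with no protobuf dependency. It sorts the keys of a JSON STAC `properties` object into the three places the proposal describes. The function name `split_properties`, the two constants, and the `acme:internal_id` key are hypothetical and exist only for illustration; the promoted field names are copied from the item table above, not read from stac.proto.

```python
# Fields the item table lists directly on StacItem instead of inside
# `properties`. Copied from the table in this issue (an assumption, not
# read from stac.proto).
PROMOTED_FIELDS = {"title", "datetime", "observed", "processed", "updated", "duration"}

# Extension prefixes that become nested messages (eo -> Eo, sar -> Sar).
EXTENSION_PREFIXES = ("eo", "sar")


def split_properties(json_properties):
    """Split a JSON STAC `properties` dict the way the proposal describes.

    Returns (promoted, extensions, user_defined):
      promoted     - spec fields that move to the top level of the item
      extensions   - e.g. {"eo": {...}} with the "eo:" prefix stripped
      user_defined - everything else; only this part would be packed into
                     the google.protobuf.Any `properties` field
    """
    promoted, extensions, user_defined = {}, {}, {}
    for key, value in json_properties.items():
        prefix, sep, name = key.partition(":")
        if sep and prefix in EXTENSION_PREFIXES:
            extensions.setdefault(prefix, {})[name] = value
        elif key in PROMOTED_FIELDS:
            promoted[key] = value
        else:
            user_defined[key] = value
    return promoted, extensions, user_defined


example = {
    "datetime": "2019-08-28T00:00:00Z",
    "eo:cloud_cover": 12.5,
    "eo:platform": "landsat-8",
    "acme:internal_id": "abc-123",  # hypothetical vendor field
}
promoted, extensions, user_defined = split_properties(example)
```

In a real client the three dictionaries would then be used to populate the generated StacItem message; that step is left out here because it depends on the generated code.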

Eo Comparison

For comparison, here are the JSON STAC Electro Optical field summary and the Protobuf STAC Electro Optical field summary. Below is a table comparing the two:

| JSON Field Name | JSON Data Type | Protobuf Field Name | Protobuf Data Type |
| --- | --- | --- | --- |
| eo:gsd | number | gsd | google.protobuf.wrappers.FloatValue |
| eo:platform | string | platform | Eo.Platform |
| eo:instrument | string | instrument | Eo.Instrument |
| eo:constellation | string | constellation | Eo.Constellation |
| eo:bands | [Band Object] | bands | Eo.Band |
| eo:epsg | integer | epsg | uint32 |
| eo:cloud_cover | number | cloud_cover | google.protobuf.wrappers.FloatValue |
| eo:off_nadir | number | off_nadir | google.protobuf.wrappers.FloatValue |
| eo:azimuth | number | azimuth | google.protobuf.wrappers.FloatValue |
| eo:sun_azimuth | number | sun_azimuth | google.protobuf.wrappers.FloatValue |
| eo:sun_elevation | number | sun_elevation | google.protobuf.wrappers.FloatValue |

List of Eo Spec differences and explanations:

  • gsd, cloud_cover, off_nadir, azimuth, sun_azimuth and sun_elevation, (ie all fields that have a JSON data type of number) are all set to type google.protobuf.wrappers.FloatValue in Protobuf.
    • The Protocol Buffer definition proto3 is like a C struct when it comes to integer and float fields. This means they're automatically set to 0 upon object creation, and there is no way to know whether they were explicitly set to 0 or merely constructed with a 0 value.
    • Google's solution for this is wrappers. These objects allow you to know if something has been set. It's a little clumsy, but that's how it works with proto3 (proto2 is a different story, but has its own drawbacks).
  • platform, instrument and constellation are all string in the JSON Data Type definition, but in Protobuf they're defined as enums. This might be problematic for someone wanting to put their own instrument data in the Eo object. The reason for choosing enums is to avoid the conflicts and confusion that come from string definitions. It also allows for more compact storage and querying in a database.
  • bands at this time we don't have a working implementation that uses the bands field
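
The zero-default problem described above can be emulated in plain Python. This is only an illustration with dataclasses, not generated protobuf code: `EoPlain`, `EoWrapped`, and the local `FloatValue` class are hypothetical stand-ins for a proto3 scalar field, a wrapper-typed field, and the wrapper message itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FloatValue:
    """Stand-in for the protobuf float wrapper: a message holding one float."""
    value: float = 0.0


@dataclass
class EoPlain:
    # proto3-style scalar: always present, defaults to 0.0
    cloud_cover: float = 0.0


@dataclass
class EoWrapped:
    # wrapper-style field: None means "never set"
    cloud_cover: Optional[FloatValue] = None


# With a plain scalar, "not reported" and "0% cloud cover" look identical.
unset_plain = EoPlain()
clear_sky_plain = EoPlain(cloud_cover=0.0)

# With a wrapper, the two cases can be told apart.
unset_wrapped = EoWrapped()
clear_sky_wrapped = EoWrapped(cloud_cover=FloatValue(0.0))
```

Here `unset_plain == clear_sky_plain` is true, while `unset_wrapped` and `clear_sky_wrapped` differ, which is the distinction a query such as "cloud cover equals 0" needs.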

Asset Comparison

| Field Name | JSON Data Type | Protobuf Data Type |
| --- | --- | --- |
| href | string | string |
| title | string | NA |
| type | string | string |
| eo_bands | NA | Eo.Band |
| asset_type | NA | AssetType |
| cloud_platform | NA | CloudPlatform |
| bucket_manager | NA | string |
| bucket_region | NA | string |
| bucket | NA | string |
| object_path | NA | string |
| requester_pays | NA | bool |

  • title isn't used in any of our implementations
  • asset_type is an enum alternative to the string based type field. We prefer enums to strings in all possible cases.
  • cloud_platform and bucket_region are useful for minimizing egress costs
  • bucket and object_path are useful for some of our streaming and FUSE cases.
  • requester_pays is an important piece of information for managing costs.
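
As a rough sketch of how a client might use these asset fields to limit egress costs, the snippet below ranks assets by whether they sit on the caller's platform and region and whether they are requester-pays. Everything here is an assumption made for illustration: the enum members, the `Asset` dataclass (a simplified stand-in for the Protobuf message), the `cheapest_asset` helper, its ranking rule, and the example.com URLs. None of it is taken from stac.proto.

```python
from dataclasses import dataclass
from enum import Enum


class CloudPlatform(Enum):
    # Hypothetical members; the real values are defined in stac.proto.
    UNKNOWN_CLOUD_PLATFORM = 0
    AWS = 1
    GCP = 2


@dataclass
class Asset:
    href: str
    cloud_platform: CloudPlatform
    bucket_region: str
    requester_pays: bool = False


def cheapest_asset(assets, platform, region):
    """Prefer an asset co-located with the caller, then one that is not requester-pays."""
    def cost_rank(asset):
        same_place = asset.cloud_platform is platform and asset.bucket_region == region
        # False sorts before True, so co-located, non-requester-pays assets win.
        return (not same_place, asset.requester_pays)
    return min(assets, key=cost_rank)


assets = [
    Asset("https://example.com/a.tif", CloudPlatform.AWS, "us-west-2"),
    Asset("https://example.com/b.tif", CloudPlatform.GCP, "us-central1"),
]
best = cheapest_asset(assets, CloudPlatform.GCP, "us-central1")
```

Comparing enum members with `is` instead of comparing strings is the kind of conflict-free matching the preference for enums is meant to give.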

Catalogs and some other features of STAC have not been implemented. The query language for gRPC can be seen in the [https://geo-grpc.github.io/api/#epl.protobuf.StacRequest] overview. Examples of queries can be seen in the python client: https://github.com/nearspacelabs/stac-client-python#queries

Thank you for reading some or all of this!

@m-mohr
Collaborator

m-mohr commented Aug 28, 2019

This seems like great work, although I'm not very familiar with gRPC.
First thing I'd think you could do to promote it is to add it (via PR) to the Third Party Vendor Extensions table in the Extensions README file. You could also add it (via PR) to the implementations page.

Then I have a question regarding the Protobuf Field Names of the extensions: We have an EO and a SAR extension. Both have a field bands with a different definition. The Protobuf Field Name is bands. Would it be a problem if its name is also bands for SAR? Or would it make more sense to rename the Protobuf Field Names to include the prefix? For example, eo_bands and sar_bands? Bands is just an example here; I think there's more overlap (although the definitions don't diverge so much).

@davidraleigh
Contributor Author

Thanks @m-mohr!

After FOSS4G I'll make a PR to add it to the Third Party Vendor Extensions and another PR to add it to the implementations page.

With the current protobuf definition I've submitted, we have the eo and sar fields directly on the StacItem, and that helps mimic the JSON-LD name definitions. Accessing bands, platform, instrument, and constellation would look like the following for eo and for sar:

c_eo = stac_item.eo.constellation
c_sar = stac_item.sar.constellation
b_eo = stac_item.eo.bands
b_sar = stac_item.sar.bands

If we changed the names to include the underscore prefix, it would look a bit redundant, as you can see below:

c_eo = stac_item.eo.eo_constellation
c_sar = stac_item.sar.sar_constellation
b_eo = stac_item.eo.eo_bands
b_sar = stac_item.sar.sar_bands

@m-mohr
Collaborator

m-mohr commented Aug 29, 2019

Thank you for clarifying, I didn't catch that there's scoping between the two dots. Now your approach makes a lot of sense.

@cholmes
Contributor

cholmes commented Aug 29, 2019

+1 - great work @davidraleigh. And definitely agree that adding it to Extensions makes good sense, starting as a third party extension and could evolve to be included in the STAC extension repo (we have some work to figure out exactly how we sort out where we put and group extensions).

@simonff

simonff commented Nov 13, 2019

/sub

@davidraleigh davidraleigh mentioned this issue Nov 19, 2019
2 tasks
@m-mohr m-mohr added this to the new extensions milestone Mar 11, 2021
@m-mohr m-mohr removed this from the new extensions milestone Jul 11, 2023