getContent() returns the information we index for feature layers #322

tomwayson · 2020-08-17T19:26:18Z

Define IHubDataset that extends IHubContent w properties like recordCount, fields, geometryType, layer?, server?, etc, as needed by the Hub app, and have getContent() return that when the content's hubType is dataset.

The text was updated successfully, but these errors were encountered:

ajturner · 2020-08-17T19:59:04Z

Should this method provide an option to include or not include these additional metadata? When these extended properties are not cached via harvesting (e.g. public datasets) then there could be significant XHR to fetch these from different endpoints.

Relatedly, should the extended properties be in a type-specific property to make it clear they're type specific, maybe optional, and keep the primary object readable/coherent. For example dataset.extended.fields

tomwayson · 2020-08-28T22:01:16Z

Some things we discussed offline:

we won't used a property bag like content.extended - we'll just add the properties directly to content (i.e. IHubDataset extends IHubContent)
we'll use an request option like include (or exclude, depending on what the default behavior is)
should default behavior be to perform whatever additional async fetches are needed to return a complete dataset and users would have to opt-out, or to not perform any (opt-in)?

ajturner · 2020-08-29T17:17:42Z

I’d like to hear @mikeringrose thoughts on default includes.

I recall seeing an update to Urban GraphQL API only default return an ID unless fields were explicitly include.

However, that would be onerous request to include all common Resources or Content metadata. We don’t want to require knowledge of the data store implementation (what’s in Portal vs Server) - but is it sensible to say “by default the model includes [...] attributes”

tomwayson · 2020-09-01T14:43:02Z

To me this option is not going to specify which fields are returned, but rather which additional xhrs are going to be made if needed (i.e. the content is coming from the portal API instead of the Hub API). I don't think we should derive the list of xhrs from a list of fields (i.e. "consumer wants advancedQueryCapabilities, therefore I need to fetch the layer info"). In other words getContent() should always return all the properties it has fetched, and what the consumer will have control over is what additional fetches should be made if needed.

To me that implies a better name than include would be ensure. For example, if a consumer specifies ensure: ['service', 'recordCount']:

when fetching from portal (i.e. private item, or in enterprise), additional xhrs would be made to get the service info (fields, advancedQueryCapabilities, maxRecordCount, etc) and query the layer's recordCount, and the corresponding properties would be populated
when fetching from the Hub API, no additional xhrs are made, the above properties would be populated b/c they're returned from the API. Also, properties like statistics and server are populated b/c they are returned by the API too.

tomwayson · 2020-09-01T21:03:55Z

Also, there are some enrichments like statistics that we do not pro-actively fetch client-side. In other words, when you visit a the dataset route in enterprise or for a private item, we do not go and fetch the stats for every field before rendering the template.

This makes me wonder if ought to just provide additional async functions like getRecordCount(), getLayerInfo() getFieldStatistics()etc, for clients to fetch the additional data as needed, and perhaps some fns to enrich the content object w/ that data (i.e. setContentLayerInfo(content, layerInfo)).

At least we can start w/ those fns, and decide later if getContent() needs to take an option that would allow consumers to indicate that they want those fetched up front.

ajturner · 2020-09-01T23:44:34Z

@tomwayson I really dislike and want to actively avoid adding "GIS infrastructure implementation" to these methods. What the heck is LayerInfo or service?

Maybe a middle ground is aligning these external metadata sources with their semantic and logical utility. For example, can getCapabilities() wrap up the various layer info, etc (still jargony, but a little more meaningful)?

I agree that we can provide these individual focused request methods for external use; but the primary use case would be the single getContent(..) that would internally also use these methods depending on requested metadata.

For include vs. ensure I think that's just an implementation detail difference. If the database is normalized or denormalized is irrelevant to the interface. include is a more common term in this scenario (and less a digestive health beverage).

For getField Statistics() (or getStatistics() to be more comprehensive that could return record count?) this should similarly allow for an explicit array of which attributes to include. There may be many times the UI only needs a single attribute, or maybe a few but rarely all of them.

bcamper · 2020-09-01T23:59:13Z

statistics is a good example where it's not always practical to fetch all results at once. Looking through the approach and lessons learned from hub-indexer (relevant files in https://github.com/ArcGIS/hub-indexer/tree/master/packages/duke/enrich): for numeric and date fields we're doing a single stats call that batches all fields + stats of that field type, but then one call per field for string fields (explanation here, and Peter and I ran into the same issue in filter prototyping), e.g.

1 call for all numerics
1 call for all dates
N calls for N strings

Further, the string stats calls are made sequentially (I assume to avoid overwhelming the server, perhaps someone remembers). So whether it's an includes option or @tomwayson's decorator function, there would be cases where this process can take several seconds and it's inadvisable for UI. Product-wise, we can work around this as needed. For example, for the filtering UI, for the list of attributes, we will probably just degrade and show less field metadata when cached stats are not available. But when a specific field is then selected and rendered as a filter UI, we can request the stats for that single field (to render a slider, histogram, checkboxes, etc.).

In any case, as part of the filtering work, it would be great to have @JoshCrozier involved in this, and consider extracting some of the relevant hub-indexer stats code linked above into hub.js methods for use on the client as well.

tomwayson · 2020-09-02T16:19:30Z

want to actively avoid adding "GIS infrastructure implementation" to these methods

I think that's just an implementation detail difference (naming things)

can getCapabilities() wrap up the various layer info, etc (still jargony, but a little more meaningful)?

or less meaningful depending on the audience, for example myself.

sounds like we're in agreement on these ideas:

if we've got the data, return it (i.e. if we fetched from the Hub API and properties like fields, etc exist)
provide async fns to fetch the additional props as needed
provide an API (via options) to have getContent() proactively fetch those things if they weren't returned (i.e. by the Hub API).

sounds like we're not exactly in agreement in what to name those fns and options but that's software

we can provide these individual focused request methods for external use; but the primary use case would be the single getContent(..)

I agree, but that's the higher-order, and harder use case. I suggest we start by building the stand-alone fns themselves. That will help inform us on what the getContent() option should look like.

Specifically, I suggest we start w/ #353, which is not specific to datasets, but should help illuminate the pattern. Then we can build into getCapabilities(). While I'm working on that @JoshCrozier can start to come at it from the statistics side of things.

tomwayson · 2020-10-14T19:36:54Z

In #399 I set up the infrastructure to make additional fetches for content properties and gracefully capture any errors. We can use that for the additional properties needed for datasets.

We'll use #401 to flesh out:

making the right additional fetches based on the item type (i.e. web maps fetch the item data, map services fetch the server info, etc)

We'll use #538 to flesh out:

providing an API to opt out of making those fetches

I've scoped this issue down to focus on returning what is needed for a feature layer. We can add additional issues for the other types of content that hub considers a dataset.

tomwayson · 2021-06-10T23:08:19Z

We should reference the functions outlined https://devtopia.esri.com/dc/hub/issues/1187 to understand what composer is doing server-side.

tomwayson added this to the Content views MVP milestone Aug 28, 2020

tomwayson mentioned this issue Aug 31, 2020

fix(getContent): fix handle un-indexed (usually private) content #345

Merged

tomwayson mentioned this issue Sep 2, 2020

IHubContent should have metadata, and provide a way to fetch it #353

Closed

This was referenced Oct 12, 2020

feat(getContent): return item group ids #399

Merged

getContent() handles web maps/scenes #401

Closed

tomwayson modified the milestones: Content views MVP, Support the Data Content View Oct 14, 2020

tomwayson changed the title ~~getContent() returns IHubDataset for dataset types~~ getContent() returns the information we index for feature layers Oct 14, 2020

This was referenced Jun 9, 2021

getContent() allows callers to only fetch specific enrichments #538

Closed

allow consumers to control what enrichments are fetched #540

Merged

tomwayson mentioned this issue Jun 15, 2021

Content family and org #545

Merged

tomwayson mentioned this issue Mar 16, 2022

feat(hub-common): add viewDefinition and recordCount to fetchContent(… #744

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

getContent() returns the information we index for feature layers #322

getContent() returns the information we index for feature layers #322

tomwayson commented Aug 17, 2020 •

edited

Loading

ajturner commented Aug 17, 2020

tomwayson commented Aug 28, 2020

ajturner commented Aug 29, 2020

tomwayson commented Sep 1, 2020

tomwayson commented Sep 1, 2020

ajturner commented Sep 1, 2020

bcamper commented Sep 1, 2020

tomwayson commented Sep 2, 2020 •

edited

Loading

tomwayson commented Oct 14, 2020 •

edited

Loading

tomwayson commented Jun 10, 2021

getContent() returns the information we index for feature layers #322

getContent() returns the information we index for feature layers #322

Comments

tomwayson commented Aug 17, 2020 • edited Loading

ajturner commented Aug 17, 2020

tomwayson commented Aug 28, 2020

ajturner commented Aug 29, 2020

tomwayson commented Sep 1, 2020

tomwayson commented Sep 1, 2020

ajturner commented Sep 1, 2020

bcamper commented Sep 1, 2020

tomwayson commented Sep 2, 2020 • edited Loading

tomwayson commented Oct 14, 2020 • edited Loading

tomwayson commented Jun 10, 2021

tomwayson commented Aug 17, 2020 •

edited

Loading

tomwayson commented Sep 2, 2020 •

edited

Loading

tomwayson commented Oct 14, 2020 •

edited

Loading