As a developer, I want to optimize the retrieval of feature groups #273

Open
epag opened this issue Aug 21, 2024 · 3 comments

Comments


epag commented Aug 21, 2024


Author Name: James (James)
Original Redmine Issue: 95971, https://vlab.noaa.gov/redmine/issues/95971
Original Date: 2021-09-08


Given an evaluation that contains multiple instances of the same feature in different contexts (e.g., feature groups)
When that evaluation proceeds
Then it should not retrieve the same time-series data more than once from an underlying data store


Redmine related issue(s): 110326



epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-09-08T14:01:20Z


Not a priority, more like a nice-to-have.

For example, if an evaluation contains feature groups (A), (A,B), (A,B,C), (B,C), then B should not be retrieved more than once, likewise A and C.
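
To put rough numbers on that example, here is a minimal sketch in Python (chosen only for brevity; it is not the project's code). It assumes, for illustration, that without de-duplication each feature group retrieves every feature it contains, and the variable names are hypothetical.

```python
from collections import Counter

# Feature groups from the example above.
feature_groups = [("A",), ("A", "B"), ("A", "B", "C"), ("B", "C")]

# Assumption: without de-duplication, each group retrieves every feature it contains.
references = Counter(feature for group in feature_groups for feature in group)
naive_retrievals = sum(references.values())

# With de-duplication, each distinct feature is retrieved exactly once.
deduplicated_retrievals = len(references)

print(dict(references))         # {'A': 3, 'B': 3, 'C': 2}
print(naive_retrievals)         # 8
print(deduplicated_retrievals)  # 3
```

Under that assumption, the four groups trigger eight retrievals where three would suffice.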


epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-09-08T14:04:07Z


Caching of retrievals in general would not be a good solution to this (too much memory). A better solution would be to de-duplicate retrievals on pool creation, linking together suppliers of the atomic pools, something like that. Effectively, we want to hold in memory only those time-series that will be required in more than one context, and nothing more.

Will need some work around the `poolfactory` to achieve de-duplication.
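
One way to picture the idea of linked suppliers is the rough sketch below. It is written in Python for brevity and is not the project's API: `SharedSupplier`, `build_group_suppliers` and `fake_retrieve` are hypothetical names, and the reference-counting release is an assumption about how memory might be bounded. Features that appear in only one group get a plain, uncached supplier; features that appear in several groups share one supplier that retrieves once and drops the data after its last expected use.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

Series = List[float]
Supplier = Callable[[], Series]


class SharedSupplier:
    """Hypothetical supplier: retrieves once, serves `uses` calls, then releases."""

    def __init__(self, retrieve: Supplier, uses: int) -> None:
        self._retrieve = retrieve
        self._remaining = uses
        self._data: Optional[Series] = None
        self._loaded = False

    def get(self) -> Series:
        if self._remaining <= 0:
            # Called more often than planned: fall back to a direct retrieval.
            return self._retrieve()
        if not self._loaded:
            self._data = self._retrieve()
            self._loaded = True
        data = self._data
        self._remaining -= 1
        if self._remaining == 0:
            # Last expected use: drop the reference so memory can be reclaimed.
            self._data = None
        return data


def build_group_suppliers(
    groups: Sequence[Tuple[str, ...]],
    retrieve: Callable[[str], Series],
) -> List[Tuple[Tuple[str, ...], List[Supplier]]]:
    """Builds one supplier per feature per group, sharing suppliers across groups."""
    counts = Counter(feature for group in groups for feature in group)
    shared = {}
    pools = []
    for group in groups:
        suppliers: List[Supplier] = []
        for feature in group:
            if counts[feature] > 1:
                if feature not in shared:
                    shared[feature] = SharedSupplier(
                        lambda f=feature: retrieve(f), counts[feature]
                    )
                suppliers.append(shared[feature].get)
            else:
                # Used in one context only: never held beyond that use.
                suppliers.append(lambda f=feature: retrieve(f))
        pools.append((group, suppliers))
    return pools


# Usage: count how often the (fake) data store is hit per feature.
calls = Counter()


def fake_retrieve(feature: str) -> Series:
    calls[feature] += 1
    return [1.0, 2.0, 3.0]


groups = [("A",), ("A", "B"), ("A", "B", "C"), ("B", "C"), ("D",)]
for group, suppliers in build_group_suppliers(groups, fake_retrieve):
    series = [supply() for supply in suppliers]

print(dict(calls))  # {'A': 1, 'B': 1, 'C': 1, 'D': 1}
```

The sketch assumes the number of uses per feature is known when the pools are created, which is what makes it possible to release shared data deterministically rather than relying on a general-purpose cache.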


epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-10-06T16:07:12Z


I don't think this needs to happen before feature pooling is deployed. It's pure optimization, and it might even be premature optimization, in the sense that overlapping combinations of singleton features and feature groups might not be common in practice.

Bottom line: without this optimization, you can expect to see overlapping retrievals, and hence more effort than the minimum needed, whenever a feature is referenced more than once in a `FeatureGroup` context or in both a singleton feature and a `FeatureGroup` context. No big deal.
