Further results cache optimizations to support PowerAnalytics #1079

Merged: 2 commits, Mar 6, 2024

GabrielKS

Two main things here:

  1. load_results! opened a store even when all the requested results were already cached. Opening the store is now left to _read_results, which first decides whether it needs one.
  2. read_results_with_keys is slow even when the results are already cached, because a lot of data has to be converted to a DataFrame. An optional kwarg now lets the caller request a specific list of columns, and only those are converted, which gives much lower memory usage (~1000x on large datasets) and much faster calls.
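
The first change above can be illustrated with a minimal sketch. It is written in Python rather than Julia and is not the PowerSimulations implementation: `FakeStore`, `ResultsCache`, and the `store_opens` counter are hypothetical names used only to show the idea of checking the cache before opening the store.

```python
from typing import Callable, Dict, Iterable, List


class FakeStore:
    """Hypothetical stand-in for an on-disk results store."""

    def read(self, key: str) -> List[float]:
        # Dummy payload; a real store would read from disk here.
        return [float(len(key))] * 3


class ResultsCache:
    """Caches results by key and opens the store only on a cache miss."""

    def __init__(self, open_store: Callable[[], FakeStore] = FakeStore) -> None:
        self._open_store = open_store
        self._cached: Dict[str, List[float]] = {}
        self.store_opens = 0  # instrumentation for this sketch only

    def load_results(self, keys: Iterable[str]) -> None:
        missing = [key for key in keys if key not in self._cached]
        if not missing:
            return  # everything is cached, so the store is never opened
        store = self._open_store()
        self.store_opens += 1
        for key in missing:
            self._cached[key] = store.read(key)
```

Calling `load_results` repeatedly with keys that are already cached leaves `store_opens` unchanged, which is the behavior that matters when the call sits inside a tight loop.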

As a result, load_results! and read_results_with_keys can be called thousands of times in a tight loop as PowerAnalytics iterates over components without being horrendously inefficient (just a few seconds total for large datasets). Leaving this as a draft in case I identify more small optimizations to add.
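
The column-selection idea can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the actual read_results_with_keys signature: `to_table`, the `cols` parameter name, and the dict-of-lists layout are hypothetical, and copying each column stands in for the costly DataFrame conversion.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical cached result: one raw series per component column.
RawResult = Dict[str, List[float]]


def to_table(raw: RawResult, cols: Optional[Sequence[str]] = None) -> Dict[str, List[float]]:
    """Convert a cached raw result into a fresh column table.

    With cols=None every column is converted; otherwise only the requested
    columns are, so the work scales with the request, not the dataset.
    """
    names = list(raw) if cols is None else list(cols)
    missing = [name for name in names if name not in raw]
    if missing:
        raise KeyError(f"columns not in cached result: {missing}")
    # The per-column copy stands in for the expensive conversion step.
    return {name: list(raw[name]) for name in names}
```

A per-component loop would then request a single column per call, for example `to_table(raw, cols=["gen2"])`, instead of converting every column each time.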

GabrielKS added a commit to GabrielKS/PowerAnalytics.jl that referenced this pull request Mar 6, 2024
Very roughly a thousand times faster on very large systems. Requires
NREL-Sienna/PowerSimulations.jl#1079
@GabrielKS GabrielKS marked this pull request as ready for review March 6, 2024 22:26
@GabrielKS GabrielKS self-assigned this Mar 6, 2024

@daniel-thom daniel-thom left a comment

Works for me. This is the first time I’ve heard about the use case of selecting individual columns.

@GabrielKS

> Works for me. This is the first time I’ve heard about the use case of selecting individual columns.

Yeah, this comes from the PowerAnalytics design decision to do things at the component level as much as possible. That lets a user define metrics solely on components and have the package handle all the aggregation by ComponentSelector automatically. I made that decision very early on, before I realized results were so tied to the concrete subtypes, but so far, with optimizations like these, it hasn't proven inefficient enough to abandon: ignoring compile time, I can run a compute call that fetches results for thousands of components of the same concrete type in about a second.

@jd-lara jd-lara merged commit ce5f1ed into psy4 Mar 6, 2024
1 of 6 checks passed
@jd-lara jd-lara deleted the gks/cache-optimizations branch March 12, 2024 16:10
GabrielKS added a commit to GabrielKS/PowerAnalytics.jl that referenced this pull request Jun 25, 2024
Very roughly a thousand times faster on very large systems. Requires
NREL-Sienna/PowerSimulations.jl#1079
GabrielKS added a commit to GabrielKS/PowerAnalytics.jl that referenced this pull request Jul 17, 2024
Very roughly a thousand times faster on very large systems. Requires
NREL-Sienna/PowerSimulations.jl#1079