Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convert metric plots to plotly #359

Merged
merged 28 commits into from
Mar 23, 2020
Merged

Conversation

lboeman
Copy link
Member

@lboeman lboeman commented Mar 10, 2020

  • Closes #xxxx .
  • I am familiar with the contributing guidelines.
  • Tests added.
  • Updates entries to docs/source/api.rst for API changes.
  • Adds descriptions to appropriate "what's new" file in docs/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
  • New code is fully documented. Includes numpydoc compliant docstrings, examples, and comments where necessary.
  • Maintainer: Appropriate GitHub Labels and Milestone are assigned to the Pull Request and linked Issue.

Bokeh's interactivity and loading are becoming an issue for larger reports. This converts all metrics plots to plotly. The motivations for this switch are as follows:

  • Plotly has a mature and well-documented javascript library. This means that we don't have to rely entirely on the product of some python function call to alter plots. While this may be possible in Bokeh, documentation does not exist or is insufficient.
  • Plotly has a declarative json schema (https://help.plot.ly/json-chart-schema/) which makes storage of these plots convenient. The json string can be read in by either the python or javascript library to reproduce plots. This also saves us from having to store <script> elements.

The current code emulates the existing Bokeh plots closely, but there are a couple of things that need to be resolved before this is merged.

  • CSV downloads need to be re-implemented. Each plot now contains it's own data, and we've lost the ability to access the ColumnDataSource on the front end, perhaps this is just accomplished by parshing and formatting the metrics on the report.
  • Adapt the parsing of the plotly JSON such that it works for both downloads and rendering in browser, while still achieving the initial intention of lazy-loading the plots after page load.

@wholmgren
Copy link
Member

This also saves us from having to store <script> elements.

and this presents a security risk to end users that are sharing reports with each other, right?

CSV downloads need to be re-implemented. Each plot now contains it's own data, and we've lost the ability to access the ColumnDataSource on the front end, perhaps this is just accomplished by parshing and formatting the metrics on the report.

Should we make an API end point for this?

Adapt the parsing of the plotly JSON such that it works for both downloads and rendering in browser, while still achieving the initial intention of lazy-loading the plots after page load.

have you done any experiments to determine if this is actually going to make a difference in page loading?

@lboeman
Copy link
Member Author

lboeman commented Mar 10, 2020

and this presents a security risk to end users that are sharing reports with each other, right?

Yes, this makes it more foolproof to ensure we're inserting a malicious script into a template.

Should we make an API end point for this?

If it would be useful beyond this single button, then sure we could add one. Otherwise following the js route seems simpler.

have you done any experiments to determine if this is actually going to make a difference in page loading?

I haven't yet, but the intention here would be to first load the entire document, and then begin inserting metric plots into the correct divs. So we may not end up with a speedup in a completely rendered document, but shifting any long running rendering to the end of the line should improve things.

def __check_plot_spec__(plot_spec):
"""Ensure that the provided plot specification is valid JSON"""
try:
json.loads(plot_spec)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@lboeman lboeman Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting implementation of a 'JSON schema' parser. It doesn't seem to enforce the very opinionated specification proposed by JSON Schema .

From tinkering with this briefly, it looks like we'd still have to load the json string into a python dict and then call validate(instance=json_dict, schema={"type":"object"}) to use this effectively. It does look like it'll give us extra safety though, so that slight performance hit may be worth it?

@@ -313,7 +336,7 @@ def scatter(timeseries_value_cds, timeseries_meta_cds, units):
return fig


def construct_metrics_cds(metrics, rename=None):
def construct_metrics_dataframe(metrics, rename=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of deleting the existing code, we could create figures/plotly.py and figures/bokeh.py. Maybe not really needed but multiple plotting "backends" could be useful at some point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a really good idea. We could add logic to select the processing function based on the sfa reports version. Might require some minor tweaking to the bokeh handling, but I think that'd be a good idea.

@lboeman
Copy link
Member Author

lboeman commented Mar 12, 2020

Added the reports.figures module and within it the bokeh_figures.py and plotly_figures.py. I've adjusted the reports code to only use plotly_figures.py and leave the bokeh code there for adventurous core users. Do the changes there seem reasonable or am I off mark on that?

I've also added a script to insert the plotly plots when the standalone html file is generated. Otherwise on the dashboard, the metric_plots variable contains the json to create each plot and the appropriate divs for insertion.

@lboeman lboeman marked this pull request as ready for review March 12, 2020 20:45
Copy link
Member

@wholmgren wholmgren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused. I thought we'd put bokeh stuff in a bokeh module and plotly stuff in a plotly module.

reports.figures.plotly_figures.timeseries_plots


Retired functions for generating Bokeh metric plots.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Retired functions for generating Bokeh metric plots.
Functions for generating Bokeh metric plots.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still using time series

Comment on lines 1 to 2
"""File containing necessary datamodel objects for creating bokeh metric
plots.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""File containing necessary datamodel objects for creating bokeh metric
plots.
"""File containing datamodel objects for creating bokeh metric plots.

except (json.JSONDecodeError, ValidationError):
raise ValueError('Figure spec must be a valid json object.')


@dataclass(frozen=True)
class ReportFigure(BaseModel):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PlotlyReportFigure? Doesn't seem like a library-agnostic class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I misunderstood earlier, are you opposed to having an empty ReportFigure parent class that gets sub classed by BokehReportFigure and PlotlyReportFigure? I think I mistook your opposition to one class with optional fields as being opposed to that as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you opposed to having an empty ReportFigure parent class that gets sub classed by BokehReportFigure and PlotlyReportFigure?

I am not opposed to that. It sounds reasonable. @alorenzo175 should comment before I waste more of your time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sub-classes are ok, but we should make sure they include a type key so that it's easier to load into BokehReportFigure or PlotlyReportFigure when needed

svg: str
figure_type: str
category: str = ''
metric: str = ''

def __post_init__(self):
__check_plot_spec__(self.spec)


@dataclass(frozen=True)
class RawReportPlots(BaseModel):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PlotlyRawReportPlots? (as above)

Comment on lines +5 to +6
This code is currently unreachable from the rest of the Solar Forecast Arbiter
Core library. It may be used in place of the plotly_figures to generate bokeh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't the time series still used?

Comment on lines 13 to 18
from bokeh.embed import components
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource, Legend, CDSView, BooleanFilter
from bokeh.models.ranges import Range1d
from bokeh.plotting import figure
from bokeh import palettes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no bokeh stuff in the plotly module

@wholmgren
Copy link
Member

CSV downloads need to be re-implemented. Each plot now contains it's own data, and we've lost the ability to access the ColumnDataSource on the front end, perhaps this is just accomplished by parshing and formatting the metrics on the report.
Adapt the parsing of the plotly JSON such that it works for both downloads and rendering in browser, while still achieving the initial intention of lazy-loading the plots after page load.

@lboeman seems that the 2nd is now done but not the first?

@lboeman
Copy link
Member Author

lboeman commented Mar 13, 2020

@wholmgren Yes, and I found a bug in the plotting that I'm trying to rectify. You said that we don't necessarily need to display all months, or all days of the week for reports spanning only a few months or days correct?
For csv downloads I think I recall that you wanted to format the csv the same way that we are displaying the metrics tables in reports?

@wholmgren
Copy link
Member

wholmgren commented Mar 13, 2020

You said that we don't necessarily need to display all months, or all days of the week for reports spanning only a few months or days correct?

Hours plots should always have 0-24.

Day of week should always have 7 days.

I'm ok with other plots only showing periods for which there is actually data.

For csv downloads I think I recall that you wanted to format the csv the same way that we are displaying the metrics tables in reports?

Seems like a good approach but if it's difficult then we could look at other options.

@lboeman
Copy link
Member Author

lboeman commented Mar 13, 2020

So it turns out to be a pain to get plotly to display all of the tick labels you supply for a 'categorical' variable. As of this last push, hour of day displays as expected due to the numeric labels. However, the day of the week plots end up like this for, e.g. missing data on saturday
Screenshot from 2020-03-13 14-09-14
There are just missing ticks for the data that should be there.

@lboeman
Copy link
Member Author

lboeman commented Mar 16, 2020

Fixed the weekly plots so that they display a full week regardless of available data. as below:
Screenshot from 2020-03-16 11-38-24

CSV downloads have been implemented by dumping the metrics dataframe as json, and loading it into the html.

# week of data.
y_values = [plot_data[plot_data['index'] == day]['value'].iloc[0]
if not plot_data[plot_data['index'] == day].empty
else 0 for day in x_ticks]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use nan instead of 0?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will the same problem occur with hourly if we exclude nighttime values?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hourly plots work correctly when dropping nighttime values. My test report excludes them. For some reason, plotly respects the x_range of an axis when provided as a series of numerical values, but not for things it decides are categorical.

solarforecastarbiter/datamodel.py Outdated Show resolved Hide resolved
solarforecastarbiter/datamodel.py Show resolved Hide resolved
solarforecastarbiter/reports/figures/bokeh_datamodel.py Outdated Show resolved Hide resolved
solarforecastarbiter/reports/figures/plotly_figures.py Outdated Show resolved Hide resolved
solarforecastarbiter/reports/figures/plotly_figures.py Outdated Show resolved Hide resolved
solarforecastarbiter/datamodel.py Show resolved Hide resolved
solarforecastarbiter/reports/figures/plotly_figures.py Outdated Show resolved Hide resolved
@lboeman
Copy link
Member Author

lboeman commented Mar 19, 2020

I've opened an issue for adding pdf field to the datamodel.
I believe I've hit all of your collective comments and this is ready for merge. I'll have my full attention back on work on Monday if we'd like to wait for me to be more available to put out any PR related fires. The SolarArbiter/solarforecastarbiter-dashboard#208 PR should display reports adequately, and reports will need to be recomputed to work with this PR.

Copy link
Contributor

@alorenzo175 alorenzo175 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking pretty good. some minor things

spec: str
svg: str
figure_type: str
figure_class: str = 'plotly'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this last and possibly make it a field(init=False) so users can't set it

{% if plotly_version %}
{# plotly js and the python plotly library do not have matching versions #}
{% include "insert_plots.html" %}
<script src="https://cdn.plot.ly/plotly-1.52.3.min.js"></script>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any reason to record plotly_version in the report then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it might be useful to debug any issues that arise from an older report and newer core library. But maybe just recomputing is the answer there.

svg = fig.to_image(format='svg').decode('utf-8')
except Exception:
logger.error('Could not generate SVG for figure %s',
getattr(fig, 'name', 'unnamed'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess plotly figures don't have name? can you replace name with something that identifies the figure?

for mvalue in metric_result.values:
new = {
'name': metric_result.name,
'abbrev': f(metric_result.name),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any idea why the abbreviate column is the same as name when I run the test report from the cli?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had neglected to pass the abbreviate function when generating the metrics_json template var in template.py.

…gure title, pass abbreviation function when generating metrics_json object for csv download
@lboeman
Copy link
Member Author

lboeman commented Mar 23, 2020

@alorenzo175 Think I've hit all your comments. I'm not really opposed to removing plotly_version from the report figures, but I think it might be helpful in debugging down the road if we want to do our own version matching between plotly's js library and python.

@@ -1255,10 +1257,9 @@ def from_dict(model, input_dict, raise_on_extra=False):
dict_ = input_dict.copy()
if model != ReportFigure:
return super().from_dict(dict_, raise_on_extra)
figure_class = dict_.get('figure_class')
if figure_class == 'plotly':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eee, maybe making init false was a bad idea. I prefer using the figure class to determine this, so probably best to remove that, but keep the parameter at the end

Copy link
Contributor

@alorenzo175 alorenzo175 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to wait for the dashboard?

@lboeman
Copy link
Member Author

lboeman commented Mar 23, 2020

do we need to wait for the dashboard?

Dashboard PR SolarArbiter/solarforecastarbiter-dashboard#208 should work with this PR. Its tests should pass after this is merged. Any existing reports will need to be recomputed. I can take on the API issue to add a recompute endpoint so it'll be easier to roll this out into prod later if you're not already working on it.

@alorenzo175
Copy link
Contributor

if you're not already working on it.

I'm not

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants