Display every record as a line in a scatter plot #257

Open
mschoema opened this issue Mar 15, 2021 · 6 comments

Comments

@mschoema

Hi,

Short Feature Explanation
I am wondering if it would be possible to create a scatter plot that displays a line per record instead of a point per record.
This would be done by selecting two columns storing arrays of values for the x and y fields.

For example, with a table Temp(xs int[], ys int[]), selecting the xs and ys columns for the x and y fields respectively would create a scatter plot with as many lines as there are records in Temp, and as many points per line as there are values in the xs and ys arrays. (Of course, xs and ys should have the same number of elements.)
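A minimal sketch of the requested mapping in plain Python, independent of DataPlotly and plotly: each record's two arrays are zipped into one line of (x, y) points, and records whose arrays differ in length are rejected. The function name records_to_lines and the sample rows are hypothetical.

```python
from typing import List, Sequence, Tuple

# A "line" is the list of (x, y) points built from a single record.
Line = List[Tuple[float, float]]


def records_to_lines(
    records: Sequence[Tuple[Sequence[float], Sequence[float]]]
) -> List[Line]:
    """Turn each (xs, ys) record into one line of (x, y) points.

    Raises ValueError when a record's two arrays differ in length,
    because the points are formed by pairing xs[i] with ys[i].
    """
    lines: List[Line] = []
    for index, (xs, ys) in enumerate(records):
        if len(xs) != len(ys):
            raise ValueError(
                f"record {index}: xs has {len(xs)} values but ys has {len(ys)}"
            )
        lines.append(list(zip(xs, ys)))
    return lines


# Hypothetical rows of a Temp(xs int[], ys int[]) table.
temp_rows = [([1, 2, 3], [10, 20, 15]), ([1, 2], [5, 7])]
lines = records_to_lines(temp_rows)
print(len(lines))  # prints 2: one line per record
```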

Context
To explain my problem, I am a developer of MobilityDB, and I am trying to display temporal properties in QGIS using DataPlotly. An example of a table that we want to display would be:

Ports(name text, port geometry(Polygon), shipsInside tint)

Every record thus represents a port and has an attribute storing the number of ships inside this port over time. The ports are represented as polygons on the map, and I would thus like to represent the 'shipsInside' attribute as a line on a scatter plot.

For simplicity, let's assume that this temporal attribute is stored in two columns, one containing an array of timestamps and one containing an array of values (this can also be done in practice):

Ports(name text, port geometry, ts timestamptz[], vals int[])

Current Workaround
Currently, I can display a single record of the original table by creating a new table for it:

Ports_temp(name text, port geometry, t timestamptz, val int)

This table contains a record for each pair of (t, val) in the arrays of the original record.
Using this table, I can then create a scatterplot using t and val as the x and y fields respectively.
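The workaround amounts to "exploding" each array-valued record into one row per (t, val) pair. A minimal Python sketch of that expansion, assuming rows are already loaded in memory; the tuple layout, the function name explode_port_rows and the sample data are hypothetical, and the geometry is kept as an opaque string. In PostgreSQL itself, the same expansion can be written with the multi-argument form of unnest in the FROM clause.

```python
from datetime import datetime
from typing import Iterable, Iterator, List, Tuple

# Hypothetical in-memory form of a Ports(name, port, ts[], vals[]) row;
# the geometry is an opaque placeholder string in this sketch.
PortRow = Tuple[str, str, List[datetime], List[int]]
PortTempRow = Tuple[str, str, datetime, int]


def explode_port_rows(rows: Iterable[PortRow]) -> Iterator[PortTempRow]:
    """Yield one (name, port, t, val) row per array position,
    mirroring the Ports_temp workaround table."""
    for name, port, ts, vals in rows:
        if len(ts) != len(vals):
            raise ValueError(f"{name}: {len(ts)} timestamps but {len(vals)} values")
        for t, val in zip(ts, vals):
            yield (name, port, t, val)


ports = [
    ("PortA", "geom-A", [datetime(2021, 3, 1), datetime(2021, 3, 2)], [4, 6]),
]
for row in explode_port_rows(ports):
    print(row)
```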

Of course, this solution is not ideal, since it requires a new table for each record of the original Ports table.

Conclusion
I am thus wondering how hard it would be to allow scatter plots to display such records with temporal attributes as lines.
Ideally, this would be done either by selecting two columns that store arrays of values, or by selecting a single temporal column (tint, tfloat or tbool).
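For the single-temporal-column option, the plot side would still need the two arrays. A rough Python sketch of recovering them from a temporal value's text output, assuming instants are written as value@timestamp and separated by commas (e.g. a sequence like "[3@2021-03-01 08:00:00+00, 5@2021-03-01 09:00:00+00)"). This is a simplification: brackets and braces are simply dropped, so bound inclusivity and gaps between sequences are lost, and the function name split_temporal_text is hypothetical.

```python
from typing import Callable, List, Tuple, TypeVar

V = TypeVar("V")

# Characters delimiting sequences and sets in the assumed text format.
BOUND_CHARS = "[](){}"


def split_temporal_text(
    text: str, convert: Callable[[str], V]
) -> Tuple[List[str], List[V]]:
    """Split a temporal value's text form into timestamp and value arrays.

    Assumes comma-separated value@timestamp instants. Timestamps are kept
    as strings; values are converted with the given callable.
    """
    cleaned = text.translate({ord(c): None for c in BOUND_CHARS})
    timestamps: List[str] = []
    values: List[V] = []
    for instant in cleaned.split(","):
        instant = instant.strip()
        if not instant:
            continue
        value_text, _, ts_text = instant.partition("@")
        timestamps.append(ts_text.strip())
        values.append(convert(value_text.strip()))
    return timestamps, values


# An integer-valued sequence; for booleans, pass lambda s: s == "t" instead.
ts, vals = split_temporal_text(
    "[3@2021-03-01 08:00:00+00, 5@2021-03-01 09:00:00+00)", int
)
```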

Best Regards,
Maxime Schoemans

@ghtmtt
Owner

ghtmtt commented Mar 16, 2021

Hmm, I see what you mean, but that is not trivial. DataPlotly relies on plotly as the plot engine: the plotly API drives the plot creation, while DataPlotly is a kind of QGIS front end for plotly that adds the interaction between plot and map. From a quick look at the API, it seems that plotly accepts one-dimensional values for both x and y (lists, series, etc.) and cannot accept an array of arrays for a single axis.

I'm ccing also @nyalldawson that has implemented the temporal control of QGIS.

@mschoema
Author

Thank you for the quick answer.

From looking around the Plotly API, it does seem that scatter plots only accept 1D arrays as input values, and that multiple lines can only be drawn by overlaying several scatter traces (one for each line).

I also see in the DataPlotly code that, when the trace of a scatter plot is created, a list containing a single graph_objs.Scatter object is returned. Would it be possible to create a new Line plot type that takes these arrays of arrays as values and returns a list of graph_objs.Scatter objects when building the trace? I believe that graph_objs.Figure accepts a list of graph_objs as input, so this might be a possible solution.
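A sketch of the proposed idea, under the assumption that a plot type only needs to hand a list of traces to the figure. The class HypotheticalLinePlot is not part of DataPlotly's actual API, and its constructor is invented for illustration. The traces are written in plotly's dict form ("type": "scatter"), which graph_objs.Figure accepts in place of graph_objs.Scatter objects; this keeps the sketch runnable without plotly installed.

```python
from typing import Any, Dict, List, Sequence


class HypotheticalLinePlot:
    """Hypothetical plot type, not DataPlotly's real API.

    Instead of a list holding one scatter trace, create_trace returns
    one scatter trace (as a plotly-compatible dict) per record.
    """

    def __init__(
        self,
        xs_per_record: Sequence[Sequence[Any]],
        ys_per_record: Sequence[Sequence[Any]],
        names: Sequence[str],
    ) -> None:
        if not (len(xs_per_record) == len(ys_per_record) == len(names)):
            raise ValueError("need one xs array, one ys array and one name per record")
        self.xs_per_record = xs_per_record
        self.ys_per_record = ys_per_record
        self.names = names

    def create_trace(self) -> List[Dict[str, Any]]:
        # One trace per record: each trace holds flat 1D x and y arrays.
        return [
            {"type": "scatter", "mode": "lines", "name": name, "x": list(xs), "y": list(ys)}
            for xs, ys, name in zip(self.xs_per_record, self.ys_per_record, self.names)
        ]


plot = HypotheticalLinePlot([[1, 2], [1, 2, 3]], [[3, 4], [5, 6, 7]], ["PortA", "PortB"])
traces = plot.create_trace()
```

The resulting list could then be passed as the data of a figure, e.g. go.Figure(data=traces).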

What do you think about this?

@ghtmtt
Owner

ghtmtt commented Mar 19, 2021

I don't think that's trivial to implement. We would have to add a new API that accepts a list of graph_objs.Scatter objects and builds a plot of plots.

The graph_objs.Scatter is returned in a list because plotly has three main objects: the plot (AKA data), the layout and the figure. The figure is the top-level object that contains all the data and the layout.
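The data/layout/figure split can be sketched with plotly's dict representation, where a figure is a dict with a "data" key holding a flat list of traces and a "layout" key. graph_objs.Figure accepts such a dict, so building it is assumed to be the last step and is not run here. The helper build_figure_dict is hypothetical; the type check is this sketch's own guard, not plotly behaviour.

```python
from typing import Any, Dict, List


def build_figure_dict(
    traces: List[Dict[str, Any]], show_legend: bool = True
) -> Dict[str, Any]:
    """Assemble a plotly-style figure: a flat list of traces plus a layout."""
    # "data" must be a flat list of traces; wrapping the list as [traces]
    # would put a list where a single trace is expected.
    if any(not isinstance(trace, dict) for trace in traces):
        raise TypeError("every element of data must be a single trace")
    return {"data": traces, "layout": {"showlegend": show_legend}}


fig_dict = build_figure_dict([{"type": "scatter", "x": [1, 2], "y": [3, 4]}])
```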

Have you tried building a simple plotly figure (within QGIS, without DataPlotly) to see what happens when you build a trace for each feature's arrays? Something like:

import plotly
import plotly.graph_objs as go

# iface is predefined in the QGIS Python console
vl = iface.activeLayer()

traces = []
for i in vl.getFeatures():
    trace = go.Scatter(
        x = i["first_array"],
        y = i["second_array"]
    )
    # ... further trace options ...
    traces.append(trace)

layout = go.Layout(
    showlegend = True,
)

data = [traces]
fig = go.Figure(data=data, layout = layout)
fig.show()

That would be interesting ;)

@mschoema
Author

I ran the code on AIS data of ships and ports in Denmark, where the value that we want to display is a temporal integer corresponding to the count of ships in the different ports over time. To run the code, I simply changed the names of the data columns, removed the line data = [traces], and passed the list of traces directly to the Figure: fig = go.Figure(data=traces, layout=layout). This gave me the following result:

[Image: resulting plot]

This is exactly the functionality that we need. It does seem that this would require a new plot type (e.g. 'Line plot') that returns a list of scatter plots from its create_trace() method. What do you think of this solution?

@ghtmtt
Owner

ghtmtt commented Mar 23, 2021

Great that the code worked ;)

Actually, we don't need a new plot type; we need a way to create a plot from a loop over other plots.
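One possible reading of "a plot from a loop over other plots", sketched with plain plotly-style figure dicts: each feature produces its own single-trace figure, and a loop concatenates their traces into one figure with a shared layout. The helper merge_figures and the sample data are hypothetical, not an existing DataPlotly function.

```python
from typing import Any, Dict, Iterable, List


def merge_figures(
    figures: Iterable[Dict[str, Any]], layout: Dict[str, Any]
) -> Dict[str, Any]:
    """Combine several single-plot figure dicts into one figure.

    The traces of every input figure are concatenated in order; the
    individual layouts are discarded in favour of the shared layout.
    """
    data: List[Dict[str, Any]] = []
    for figure in figures:
        data.extend(figure["data"])
    return {"data": data, "layout": dict(layout)}


# Hypothetical per-feature figures, one trace each.
per_feature = [
    {"data": [{"type": "scatter", "name": "PortA", "x": [1, 2], "y": [4, 6]}], "layout": {}},
    {"data": [{"type": "scatter", "name": "PortB", "x": [1, 2], "y": [3, 1]}], "layout": {}},
]
combined = merge_figures(per_feature, {"showlegend": True})
print([trace["name"] for trace in combined["data"]])  # prints ['PortA', 'PortB']
```

Discarding the per-plot layouts is a design choice of this sketch; a real implementation would have to decide which layout settings (axis titles, ranges) survive the merge.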

@mschoema
Author

Is this a feature that you are interested in adding to DataPlotly, and if so, how should we proceed with the development?
