value ascending / value descending #3606

chriddyp · 2019-03-05T01:39:37Z

I'm making categorical histograms (with and without z and histfunc: sum). I'd like to be able to sort these histograms in descending order by value without using transforms.

Maybe this would fit under categoryorder: 'value ascending' and categoryorder: 'value descending'? https://github.com/plotly/plotly.js/pull/419/files#diff-0e41c2e162564438ff091d0ed6b5b455R472
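To make the request concrete, here is a minimal sketch in plain JavaScript of what sorting by value could mean for a categorical histogram with histfunc: sum. The helper name `sortCategoriesByValue` and the sample data are hypothetical; this is an illustration, not plotly.js code.

```javascript
// Hypothetical helper, not part of plotly.js: orders categories by the
// sum of their values, mimicking a histogram with histfunc: 'sum'.
function sortCategoriesByValue(categories, values, direction) {
    const totals = new Map();
    for (let i = 0; i < categories.length; i++) {
        const cat = categories[i];
        totals.set(cat, (totals.get(cat) || 0) + values[i]);
    }
    const sign = direction === 'descending' ? -1 : 1;
    // Array.prototype.sort is stable, so ties keep their first-seen order.
    return Array.from(totals.keys())
        .sort((a, b) => sign * (totals.get(a) - totals.get(b)));
}

// Totals are a: 3, b: 5, c: 4.
const order = sortCategoriesByValue(['a', 'b', 'a', 'c'], [1, 5, 2, 4], 'descending');
console.log(order); // [ 'b', 'c', 'a' ]
```

Without the proposed feature, an array computed this way could presumably be passed to the existing categoryorder: 'array' together with categoryarray.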


chriddyp · 2019-03-05T16:11:57Z

I suppose the same applies to stacked bar charts. In this case, it'd be nice to keep the groups (grouping the bars by color) but sort within each group. The sort could be by the sum of all bars or by the bars within that trace.

etpinard · 2019-03-05T16:33:46Z

Maybe this would fit under categoryorder: 'value ascending' and categoryorder: 'value descending'

It could, but maybe we want this sorting-by-value feature for non-category axes also?

alexcjohnson · 2019-03-05T19:16:29Z

What would it mean to sort a non-category (ie continuous) axis?

chriddyp · 2019-03-05T20:18:31Z

maybe we want this sorting-by-value feature for non-category axes also

I can't think of any use case for this off the top of my head

antoinerg · 2019-04-24T18:38:09Z

In the presence of multiple traces, one should be able to specify how to aggregate the values (sum, mean, max, median, etc.). This will require a new attribute. @nicolaskruchten can you remind me of the name you suggested for this attribute?

nicolaskruchten · 2019-04-24T19:21:05Z

I don't think I had a good name for it :) categoryorderfunc ?

Please note that this will need to apply not only across traces but across all values of the same category (i.e. you can have multiple data rows at the same category within a trace too!)
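A minimal sketch of that requirement, assuming each trace is reduced to parallel x (category) and y (value) arrays: every row of every trace is folded into one number per category. The names `reducers` and `aggregateByCategory` are hypothetical and do not come from the branch.

```javascript
// Hypothetical sketch: fold every (category, value) row from every trace
// into a single number per category.
const reducers = {
    sum: (acc, v) => acc + v,
    min: (acc, v) => Math.min(acc, v),
    max: (acc, v) => Math.max(acc, v)
};

function aggregateByCategory(traces, funcName) {
    const reduce = reducers[funcName];
    const out = {};
    traces.forEach(trace => {
        trace.x.forEach((cat, i) => {
            const v = trace.y[i];
            const seen = Object.prototype.hasOwnProperty.call(out, cat);
            out[cat] = seen ? reduce(out[cat], v) : v;
        });
    });
    return out;
}

const traces = [
    {x: ['a', 'a', 'b'], y: [1, 4, 2]},  // two rows for 'a' within one trace
    {x: ['a', 'b'], y: [3, 7]}
];
console.log(aggregateByCategory(traces, 'sum')); // { a: 8, b: 9 }
console.log(aggregateByCategory(traces, 'max')); // { a: 4, b: 7 }
```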

antoinerg · 2019-04-25T01:34:01Z

In branch sort-by-value (compare to master), I use the values obtained from calc to create new sorted categorical axes that are then used to perform a second calc operation.

Unfortunately, running calc again is not optimal, but it is necessary at the moment since the calc routine is sometimes coupled to the axis object (for example in histogram, it is used for the autobinning logic). Note that this second calc call will only happen if categoryorder is "value (de|a)scending".

cc @plotly/plotly_js, @alexcjohnson

etpinard · 2019-04-25T14:22:10Z

Very nicely done @antoinerg

I don't think we can implement this w/o calling _module.calc twice, so 👌. We should make sure, though, that we only call _module.calc twice on subplots with categoryorder: "value (de|a)scending" axes (hint: try using gd._fullLayout.xaxis._traceIndices).

Looks like the trickiest part of this PR now becomes this block:

                for(var k = 0; k < cd.length; k++) {
                    if(type === 'scatter') {
                        categoriesValue[cd[k].x][1] += cd[k].y;
                    } else if(type === 'histogram') {
                        categoriesValue[cd[k].p][1] += cd[k].s;
                    }
                }

which will be hard to generalize. Note that categoryorder also works on polar, carpet and gl3d (which doesn't use gd.calcdata) axes. Note also that the "key" in question depends not only on the trace type but also on the axis letter. For example,

                    if(type === 'scatter') {
                        categoriesValue[cd[k].x][1] += cd[k].y;
                    }

works when the x-axis has a set categoryorder, but we should have:

                    if(type === 'scatter') {
                        categoriesValue[cd[k].y][1] += cd[k].x;
                    }

for y-axes with categoryorder.
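One way to fold the two scatter cases together is to derive both keys from the axis letter. This is only a sketch: the function name `accumulate` is hypothetical, the [name, runningTotal] shape of categoriesValue is an assumption based on the `[1]` index in the block above, and only scatter-like {x, y} points are handled.

```javascript
// Hypothetical sketch: pick the calcdata keys from the axis letter
// instead of hard-coding x and y.
function accumulate(cd, axLetter, categoriesValue) {
    const catKey = axLetter;                      // category index lives on the sorted axis
    const valKey = axLetter === 'x' ? 'y' : 'x';  // value lives on the other axis
    for (let k = 0; k < cd.length; k++) {
        categoriesValue[cd[k][catKey]][1] += cd[k][valKey];
    }
    return categoriesValue;
}

// Category axis is x: points carry the category index in x.
const vertical = accumulate(
    [{x: 0, y: 2}, {x: 1, y: 5}, {x: 0, y: 1}], 'x', [['a', 0], ['b', 0]]);
// Category axis is y: the roles of x and y are swapped.
const horizontal = accumulate(
    [{x: 2, y: 0}, {x: 5, y: 1}], 'y', [['a', 0], ['b', 0]]);
console.log(vertical);   // [ [ 'a', 3 ], [ 'b', 5 ] ]
console.log(horizontal); // [ [ 'a', 2 ], [ 'b', 5 ] ]
```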

antoinerg · 2019-04-25T18:51:00Z

Quick question: for horizontal bars, when ordering categories by ascending value, should we have the biggest value at the top or the bottom? In the figure below, the biggest value is at the top. I think I prefer the opposite but I'm not completely sure either. What do you think?

alexcjohnson · 2019-04-25T19:05:42Z

should we have the biggest value at the top or the bottom?

You could say the same about category (ascending|descending) - ascending will put a at the bottom and e at the top so they're actually in reverse alphabetical order. In fact you could even make that argument about the order based on data in the trace.

I guess with a categorical Y axis you generally do read the graph from top to bottom, as opposed to numerical axes that you almost always read bottom to top. But making that the default behavior I think has much more extensive consequences than just choosing it for the new feature. So unless we're prepared to alter the rest of that behavior, I think ascending has to put the biggest at the top.

antoinerg · 2019-04-25T21:49:09Z

Looks like the trickiest part of this PR now becomes this block:

                for(var k = 0; k < cd.length; k++) {
                    if(type === 'scatter') {
                        categoriesValue[cd[k].x][1] += cd[k].y;
                    } else if(type === 'histogram') {
                        categoriesValue[cd[k].p][1] += cd[k].s;
                    }
                }

which will be hard to generalize.

Yes, indeed. What trace types should initially be supported?

@nicolaskruchten What aggregation do we want to support initially? Would sum, min and max be sufficient?

nicolaskruchten · 2019-04-25T21:52:12Z

Sum/min/max is a fine start for this release, yes!

nicolaskruchten · 2019-04-25T21:52:34Z

Adding avg can be done in one pass also though no?
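For reference, a one-pass average only needs a running sum and a count per category. A minimal sketch, with a hypothetical `meanByCategory` helper operating on [category, value] pairs:

```javascript
// Sketch: a single pass keeps a running sum and count per category,
// so the mean is available at the end without revisiting the data.
function meanByCategory(rows) {
    const acc = new Map();
    for (const [cat, v] of rows) {
        const a = acc.get(cat) || {sum: 0, count: 0};
        a.sum += v;
        a.count += 1;
        acc.set(cat, a);
    }
    const means = {};
    acc.forEach((a, cat) => { means[cat] = a.sum / a.count; });
    return means;
}

console.log(meanByCategory([['a', 2], ['b', 10], ['a', 4]])); // { a: 3, b: 10 }
```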

nicolaskruchten · 2019-04-25T21:53:08Z

Re trace types I would say bar and histogram are quite important

etpinard · 2019-04-25T22:04:27Z

What trace types should initially be supported?

ALL 2d cartesian please.

antoinerg · 2019-04-25T22:26:06Z

ALL 2d cartesian please.

Ok, then I will need to inspect the calcdata output for each trace type.

Looks like the trickiest part of this PR now becomes this block:

                for(var k = 0; k < cd.length; k++) {
                    if(type === 'scatter') {
                        categoriesValue[cd[k].x][1] += cd[k].y;
                    } else if(type === 'histogram') {
                        categoriesValue[cd[k].p][1] += cd[k].s;
                    }
                }

which will be hard to generalize.

To support all traces, we could either have a long list of else if in src/plots/plots.js or we could also add a new field with a standardized name to each trace's calcdata.

To be continued

alexcjohnson · 2019-04-25T22:31:00Z

add a new field with a standardized name to each trace's calcdata.

sounds like the winner to me! To be computed & added only when necessary.
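A rough sketch of the idea, assuming the standardized fields are called x and y and are filled in only for points that lack them. The helper `ensureXY` and its key mapping are hypothetical; the p and s keys are taken from the histogram branch of the block quoted above, and orientation is ignored.

```javascript
// Hypothetical sketch: give every calcdata point standardized x/y fields,
// computing them only when they are missing.
function ensureXY(cd, keys) {
    for (let k = 0; k < cd.length; k++) {
        const pt = cd[k];
        if (pt.x === undefined) pt.x = pt[keys.x];
        if (pt.y === undefined) pt.y = pt[keys.y];
    }
    return cd;
}

// Points shaped like the histogram branch: position in p, size in s.
const histogramLike = ensureXY([{p: 0, s: 3}, {p: 1, s: 7}], {x: 'p', y: 's'});
// Points that already carry x/y are left untouched.
const scatterLike = ensureXY([{x: 0, y: 2}], {x: 'p', y: 's'});
console.log(histogramLike[1].x, histogramLike[1].y); // 1 7
console.log(scatterLike[0]);                         // { x: 0, y: 2 }
```

With such fields in place, the accumulation loop no longer needs to branch on trace type.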

antoinerg · 2019-04-26T18:28:35Z

sounds like the winner to me! To be computed & added only when necessary.

Good, I also think it's a winner. histogram already has x and y and all traces on cartesian axes should probably have them. I could simply remove the distinction in commit d53e52c and it just works.

I now need to write tests to ensure all cartesian traces have x y in their calcdata and compute/add it when necessary in the ones that don't.

Thanks @alexcjohnson for pointing me in the right direction!

antoinerg · 2019-04-26T23:17:55Z

Ok, I made good progress in branch sort-by-value (compare to master), all 2d cartesian traces work except for:

scattergl
contour
contourcarpet
heatmap, which works but is buggy because of issue #1097 (heatmaps don't respect axis categoryarray)

To be continued

etpinard · 2019-04-29T13:52:55Z

Awesome work @antoinerg !

Would you be interested in trying to fix #1097 at some point this week? Sounds like fixing that bug and adding some custom logic for scattergl should be enough to ✅ this feature. I'm not sure if contourcarpet traces even work on category axes, I'm not sure if they should work either 😏

antoinerg · 2019-04-29T15:43:23Z

I'm not sure if contourcarpet traces even work on category axes, I'm not sure if they should work either

Ok good to know. They are indeed a bit different and I wasn't sure how to tackle them on Friday :) I'll focus on scattergl and fixing #1097!

nicolaskruchten · 2019-05-02T13:10:04Z

Note: if we add median as a possible sort ordering we get stuff like Ridgeplots almost for free, which is really really nice :)

nicolaskruchten · 2019-05-06T19:34:31Z

Does the current design leave room for a future way of saying "sort by the values of trace X" ?

alexcjohnson · 2019-05-06T19:37:28Z

Does the current design leave room for a future way of saying "sort by the values of trace X" ?

How "future"? Does the data/traces split count? 😉

nicolaskruchten · 2019-05-06T19:42:07Z

less future than that :) ... I guess if we added "first" and "last" and expected people to reorder their traces to match that'd be a workaround. But it would be nice to maybe sort by the nth?

antoinerg · 2019-05-06T20:03:26Z

Does the current design leave room for a future way of saying "sort by the values of trace X" ?

Thank you for the comment @nicolaskruchten. It will be easy to include in the code: when collecting values associated with each category, we could loop only over the traces we care about at this line: https://github.com/plotly/plotly.js/compare/sort-by-value#diff-ad4f76ccd6044ed16514297078e13b84R2855.

As for how it should be specified via attributes, I am not sure yet. In the end, we need to deduce the indices of the traces we want to consider. So maybe an array of trace indices? 🤔

nicolaskruchten · 2019-05-07T19:19:34Z

Just leaving a note here: I assume that this sorting will take into account matching axes, right? I.e. if I have two traces on separate subplots where one's axis matches the other, sorting will apply to both and take both into account?

nicolaskruchten · 2019-05-07T19:24:49Z

question re sorting within/across traces...

For a chart like this:

what should sort by "max" do? the height of the blues implies "p2, p1" but the maximum individual bars ignoring trace-membership implies "p1,p2"

antoinerg · 2019-05-09T22:38:34Z

Note: if we add median as a possible sort ordering we get stuff like Ridgeplots almost for free, which is really really nice :)

As discussed in private, adding support for median or, say, mean raises the following question in the presence of multiple traces: do you want the mean of the means or the mean of all values? We will probably need to support both.
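The two readings give different numbers as soon as the traces contribute different numbers of values to a category. A small worked example with made-up data:

```javascript
const mean = arr => arr.reduce((s, v) => s + v, 0) / arr.length;

// Values falling in one category, split by trace (made-up numbers).
const perTrace = [[1, 2, 3], [10]];

// Mean of the per-trace means: mean([2, 10]).
const meanOfMeans = mean(perTrace.map(mean));
// Mean of all values pooled together: mean([1, 2, 3, 10]).
const meanOfAll = mean([].concat(...perTrace));

console.log(meanOfMeans, meanOfAll); // 6 4
```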

nicolaskruchten · 2019-05-15T15:05:16Z

wanted to braindump part of a conversation I had with @antoinerg here. It's not critical to resolve this for 1.48 but I want to make sure we don't lock ourselves out of a more complete API.

It seems that in categoryorder = "value (a|de)scending" mode there are at least 3 extra pieces of information we need to specify the flavours of ordering I can imagine:

1. Which traces' values should be included in computing the ordering at all?
2. How/should we aggregate the per-trace values if there are 2 or more values per trace in a given category?
3. How/should we aggregate the among-trace values if there are two or more traces with values in a given category?

The motivation for 1 is that I might want to sort the axis by a single trace, and we'd likely want to specify a way of breaking ties, so in extremis we'd want to allow an exhaustive ordered list, and we'd also want to specify "no among-trace order preference".

The motivation for 2 is that in the case of grouped bars at least, I might want to sort by the height of the maximum stack, rather than the maximum single element.

The motivation for 3 is pretty clear: we need to aggregate somehow.

Taking just the case of 3 and 2 together, we might imagine an attr that takes two functions that interact such that my max-stack idea would be "max sum". In that case, the options that @antoinerg already implemented already work nicely as "max max", "min min" and "sum sum". It would also allow us to do something like "mean median" to order grouped box-plots in a particular way. Not all options make sense, clearly, such as "min max" or "mean mean" or "median median" so we could imagine a world where we enumerate the options perhaps.

So in terms of a half-baked attempt at an API, we could accept two attrs: a composite aggregation function that accepts "sum", "max sum" and friends (default=sum?), and a trace-scope array that defaults to [] meaning "no preference" but accepts an ordered list of ... trace.uids? trace indices?
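A minimal sketch of how such a composite spec could be interpreted, assuming the first word is the among-trace (outer) function and the second is the per-trace (inner) one, and that a single word like "sum" means "sum sum". The names `aggs` and `compositeAggregate` are hypothetical, and only sum, max and min are covered.

```javascript
const aggs = {
    sum: a => a.reduce((s, v) => s + v, 0),
    max: a => Math.max(...a),
    min: a => Math.min(...a)
};

// spec is "outer inner" (e.g. "max sum"); a single word is applied at both levels.
function compositeAggregate(perTraceValues, spec) {
    const parts = spec.split(' ');
    const outer = aggs[parts[0]];
    const inner = aggs[parts[parts.length - 1]];
    // Reduce each trace first, then reduce across traces.
    return outer(perTraceValues.map(inner));
}

// One category: trace 0 has three values, trace 1 has one (made-up numbers).
const category = [[1, 2, 3], [5]];
console.log(compositeAggregate(category, 'max sum')); // 6  (max of 6 and 5)
console.log(compositeAggregate(category, 'max max')); // 5
console.log(compositeAggregate(category, 'sum'));     // 11
```

The ordering would then sort categories by this per-category number.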

nicolaskruchten · 2019-05-16T13:16:41Z

I'll leave the comment above as-is but upon further reflection, I think that point 1 is maybe interesting but doesn't allow you to sort by 'trace 1 with ties broken by trace 3'. For that we need a modification on point 3.

Elaborating: "sort by max of per-trace sum of values of trace 1" is fine, but "sort by max of per-trace sum of values of traces 1 and 3" isn't the same as "sort by the sum of the values of trace 1, breaking ties with the sum of values of trace 3", so simply adding a trace-scoping operator won't be enough.

Maybe the 'outer' operator needs to be able to say not only sum or max or whatever but also [trace 1, trace 3] so you could say sort by [select trace 1, trace 3] of sum of values.
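To illustrate the difference, here is a sketch of a tie-breaking sort over an ordered list of trace indices, next to a plain sum over the same traces. The data and the helper names `orderByTraces` and `orderBySum` are hypothetical; both sort in descending order.

```javascript
// sums[cat][t] is the per-trace sum for trace index t (made-up numbers).
const sums = {
    p1: [0, 5, 0, 1],
    p2: [0, 5, 0, 4],
    p3: [0, 2, 0, 9]
};

// Sort by the first listed trace, breaking ties with the following ones.
function orderByTraces(sums, traceIndices) {
    return Object.keys(sums).sort((a, b) => {
        for (const t of traceIndices) {
            const diff = sums[b][t] - sums[a][t];
            if (diff !== 0) return diff;
        }
        return 0;
    });
}

// Sort by the sum over the listed traces instead.
function orderBySum(sums, traceIndices) {
    const total = cat => traceIndices.reduce((s, t) => s + sums[cat][t], 0);
    return Object.keys(sums).sort((a, b) => total(b) - total(a));
}

console.log(orderByTraces(sums, [1, 3])); // [ 'p2', 'p1', 'p3' ]
console.log(orderBySum(sums, [1, 3]));    // [ 'p3', 'p2', 'p1' ]
```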

etpinard · 2019-05-17T21:44:58Z

@antoinerg can you open a new issue about upcoming category* improvements?

etpinard added the feature something new label Mar 5, 2019

etpinard added this to the v1.48.0 milestone Apr 11, 2019

antoinerg self-assigned this Apr 23, 2019

antoinerg mentioned this issue May 14, 2019

sort categorical Cartesian axes by value #3864

Merged

23 tasks

antoinerg closed this as completed in #3864 May 17, 2019
