Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

value ascending / value descending #3606

Closed
chriddyp opened this issue Mar 5, 2019 · 32 comments · Fixed by #3864
Closed

value ascending / value descending #3606

chriddyp opened this issue Mar 5, 2019 · 32 comments · Fixed by #3864
Assignees
Labels
feature something new
Milestone

Comments

@chriddyp
Copy link
Member

chriddyp commented Mar 5, 2019

I'm making categorical histograms (with and without z and histfunc: sum). I'd like to be able to sort these histograms in descending by value order without using transforms.

Maybe this would fit under categoryorder: 'value ascending' and categoryorder: 'value descending'? https://github.com/plotly/plotly.js/pull/419/files#diff-0e41c2e162564438ff091d0ed6b5b455R472

@chriddyp
Copy link
Member Author

chriddyp commented Mar 5, 2019

I suppose similarly with stacked bar charts. in this case, it'd be nice to keep the groups (grouping the bars by color) but sort within each group. the sort could be the sum of all bars or the bars within that trace.
image

@etpinard
Copy link
Contributor

etpinard commented Mar 5, 2019

Maybe this would fit under categoryorder: 'value ascending' and categoryorder: 'value descending'

It could, but maybe we want this sorting-by-value feature for non-category axes also?

@etpinard etpinard added the feature something new label Mar 5, 2019
@alexcjohnson
Copy link
Collaborator

What would it mean to sort a non-category (ie continuous) axis?

@chriddyp
Copy link
Member Author

chriddyp commented Mar 5, 2019

maybe we want this sorting-by-value feature for non-category axes also

I can't think of any use case for this off the top of my head

@etpinard etpinard added this to the v1.48.0 milestone Apr 11, 2019
@antoinerg antoinerg self-assigned this Apr 23, 2019
@antoinerg
Copy link
Contributor

In the presence of multiple traces, one should be able to specify how to aggregate the values (sum, mean, max, median, etc...). This will require a new attribute. @nicolaskruchten can you remind me the name you suggested for this attribute.

@nicolaskruchten
Copy link
Contributor

I don't think I had a good name for it :) categoryorderfunc ?

Please note that this will need to apply not only across traces but across all values of the same category (i.e. you can have multiple data rows at the same category within a trace too!)

@antoinerg
Copy link
Contributor

In branch sort-by-value (compare to master), I use the values obtained from calc to create new sorted categorical axes that are then used to perform a second calc operation.

Unfortunately, running calc again is not optimal, but it is necessary at the moment since the calc routine is sometimes coupled to the axis object (for example in histogram, it is used for the autobinning logic). Note that this second calc call will only happen if categoryorder is "value (de|a)scending".

cc @plotly/plotly_js, @alexcjohnson

@etpinard
Copy link
Contributor

Very nicely done @antoinerg

I don't think we can implement this w/o calling _module.calc twice, so 👌 . We should make sure though that we only call _module.calc twice on subplots with categoryorder: value (de|a)scending" axes (hint: try using gd._fullLayout.xaxis._traceIndices).

Looks like the trickiest part of this PR now becomes this block:

                for(var k = 0; k < cd.length; k++) {
                    if(type === 'scatter') {
                        categoriesValue[cd[k].x][1] += cd[k].y;
                    } else if(type === 'histogram') {
                        categoriesValue[cd[k].p][1] += cd[k].s;
                    }
                }

which will be hard to generalize. Note that categoryorder works also on polar, carpet and gl3d (which doesn't use gd.calcdata) axes. Note also that "key" in questions depends not only on the trace type but also on the axis letter. For example,

                    if(type === 'scatter') {
                        categoriesValue[cd[k].x][1] += cd[k].y;
                    }

works when the x-axis has a set categoryorder, but we should have:

                    if(type === 'scatter') {
                        categoriesValue[cd[k].y][1] += cd[k].x;
                    }

for y-axes with categoryorder.

@antoinerg
Copy link
Contributor

Quick question: for horizontal bars, when ordering categories by ascending value, should we have the biggest value at the top or the bottom? In the figure below, the biggest value is at the top. I think I prefer the opposite but I'm not completely sure either. What do you think?
newplot (70)

@alexcjohnson
Copy link
Collaborator

should we have the biggest value at the top or the bottom?

You could say the same about category (ascending|descending) - ascending will put a at the bottom and e at the top so they're actually in reverse alphabetical order. In fact you could even make that argument about the order based on data in the trace.

I guess with a categorical Y axis you generally do read the graph from top to bottom, as opposed to numerical axes that you almost always read bottom to top. But making that the default behavior I think has much more extensive consequences than just choosing it for the new feature. So unless we're prepared to alter the rest of that behavior, I think ascending has to put the biggest at the top.

@antoinerg
Copy link
Contributor

Looks like the trickiest part of this PR now becomes this block:

                for(var k = 0; k < cd.length; k++) {
                    if(type === 'scatter') {
                        categoriesValue[cd[k].x][1] += cd[k].y;
                    } else if(type === 'histogram') {
                        categoriesValue[cd[k].p][1] += cd[k].s;
                    }
                }

which will be hard to generalize.

Yes, indeed. What trace types should initially be supported?

@nicolaskruchten What aggregation do we want to support initially? Would sum, min and max be sufficient?

@nicolaskruchten
Copy link
Contributor

Sum/min/max is a fine start for this release, yes!

@nicolaskruchten
Copy link
Contributor

Adding avg can be done in one pass also though no?

@nicolaskruchten
Copy link
Contributor

Re trace types I would say bar and histogram are quite important

@etpinard
Copy link
Contributor

What trace types should initially be supported?

ALL 2d cartesian please.

@antoinerg
Copy link
Contributor

antoinerg commented Apr 25, 2019

ALL 2d cartesian please.

Ok, then I will need to inspect the output of calc data for each traces.

Looks like the trickiest part of this PR now becomes this block:

                for(var k = 0; k < cd.length; k++) {
                    if(type === 'scatter') {
                        categoriesValue[cd[k].x][1] += cd[k].y;
                    } else if(type === 'histogram') {
                        categoriesValue[cd[k].p][1] += cd[k].s;
                    }
                }

which will be hard to generalize.

To support all traces, we could either have a long list of else if in src/plots/plots.js or we could also add a new field with a standardized name to each trace's calcdata.

To be continued

@alexcjohnson
Copy link
Collaborator

add a new field with a standardized name to each trace's calcdata.

sounds like the winner to me! To be computed & added only when necessary.

@antoinerg
Copy link
Contributor

sounds like the winner to me! To be computed & added only when necessary.

Good, I also think it's a winner. histogram already has x and y and all traces on cartesian axes should probably have them. I could simply remove the distinction in commit d53e52c and it just works.

I now need to write tests to ensure all cartesian traces have x y in their calcdata and compute/add it when necessary in the ones that don't.

Thanks @alexcjohnson for pointing me in the right direction!

@antoinerg
Copy link
Contributor

antoinerg commented Apr 26, 2019

Ok, I made good progress in branch sort-by-value (compare to master), all 2d cartesian traces work except for:

To be continued

@etpinard
Copy link
Contributor

Awesome work @antoinerg !

Would you be interested in trying to fix #1097 at some point this week? Sounds like fixing that bug and adding some custom logic for scattergl should be enough to ✅ this feature. I'm not sure if contourcarpet traces even work on category axes, I'm not sure if they should work either 😏

@antoinerg
Copy link
Contributor

I'm not sure if contourcarpet traces even work on category axes, I'm not sure if they should work either

Ok good to know. They are indeed a bit different and I wasn't sure how to tackle them on Friday :) I'll focus on scattergl and fixing #1097!

@nicolaskruchten
Copy link
Contributor

Note: if we add median as a possible sort ordering we get stuff like Ridgeplots almost for free, which is really really nice :)

@nicolaskruchten
Copy link
Contributor

Does the current design leave room for a future way of saying "sort by the values of trace X" ?

@alexcjohnson
Copy link
Collaborator

Does the current design leave room for a future way of saying "sort by the values of trace X" ?

How "future"? Does the data/traces split count? 😉

@nicolaskruchten
Copy link
Contributor

less future than that :) ... I guess if we added "first" and "last" and expected people to reorder their traces to match that'd be a workaround. But it would be nice to maybe sort by the nth?

@antoinerg
Copy link
Contributor

Does the current design leave room for a future way of saying "sort by the values of trace X" ?

Thank you for the comment @nicolaskruchten. It will be easy to include in the code: when collecting values associated with each category, we could loop only over the traces we care about at this line: https://github.com/plotly/plotly.js/compare/sort-by-value#diff-ad4f76ccd6044ed16514297078e13b84R2855.

As for how it should be specified via attributes, I am not sure yet. In the end, we need to deduce the indices of the traces we want to consider. So maybe an array of trace indices? 🤔

@nicolaskruchten
Copy link
Contributor

Just leaving a note here: I assume that this sorting will take into account matching axes, right? I.e. if I have two traces on separate subplots where one's axis matches the other, sorting will apply to both and take both into account?

@nicolaskruchten
Copy link
Contributor

question re sorting within/across traces...

For a chart like this:
image

what should sort by "max" do? the height of the blues implies "p2, p1" but the maximum individual bars ignoring trace-membership implies "p1,p2"

@antoinerg
Copy link
Contributor

Note: if we add median as a possible sort ordering we get stuff like Ridgeplots almost for free, which is really really nice :)

As discussed in private, adding support for median or say mean raises the following question in the presence of multiple traces: do you want the mean of the means or the mean of all values. We probably will need to support both.

@nicolaskruchten
Copy link
Contributor

nicolaskruchten commented May 15, 2019

wanted to braindump part of a conversation I had with @antoinerg here. It's not critical to resolve this for 1.48 but I want to make sure we don't lock ourselves out of a more complete API.

It seems that in categoryorder = "value (a|de)scending" mode there are at least 3 extra pieces of information we need to specify the flavours of ordering I can imagine:

  1. Which traces' values should be included in computing the ordering at all?
  2. How/should we aggregate the per-trace values if there are 2 or more values per trace in a given category?
  3. How/should we aggregate the among-trace values if there are two or more traces with values in a given category?

The motivation for 1 is that I might want to sort the axis by a single trace, and we'd likely want to specify a way of breaking ties, so in extremis we'd want to allow an exhaustive ordered list, and we'd also want to specify "no among-trace order preference".

The motivation for 2 is that in the case of grouped bars at least, I might want to sort by the height of the maximum stack, rather than the maximum single element.

The motivation for 3 is pretty clear: we need to aggregate somehow.

Taking just the case of 3 and 2 together, we might imagine an attr that takes two functions that interact such that my max-stack idea would be "max sum". In that case, the options that @antoinerg already implemented already work nicely as "max max", "min min" and "sum sum". It would also allow us to do something like "mean median" to order grouped box-plots in a particular way. Not all options make sense, clearly, such as "min max" or "mean mean" or "median median" so we could imagine a world where we enumerate the options perhaps.

So in terms of a half-baked attempt at an API, we could accept two attrs: a composite aggregation function that accepts "sum", "max sum" and friends (default=sum?), and a trace-scope array that defaults to [] meaning "no preference" but accepts an ordered list of ... trace.uids? trace indices?

@nicolaskruchten
Copy link
Contributor

I'll leave the comment above as-is but upon further reflection, I think that point 1 is maybe interesting but doesn't allow you to sort by 'trace 1 with ties broken by trace 3'. For that we need a modification on point 3.

Elaborating: sort by max of per-trace sum of values of trace 1 is fine, but sort by max of per-trace sum of values of traces 1 and 3 isn't the same as sort by the sum of the values of trace 1, breaking ties with the sum of values of trace 3 so simply adding a trace-scoping operator won't be enough.

Maybe the 'outer' operator needs to be able to say not only sum or max or whatever but also [trace 1, trace 3] so you could say sort by [select trace 1, trace 3] of sum of values.

@etpinard
Copy link
Contributor

@antoinerg can you open a new issue about up-coming category* improvements?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature something new
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants