-
Notifications
You must be signed in to change notification settings - Fork 335
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add different display types for leaf_distributions x-axis. #254
Conversation
@parrt @tlapusan LMK what you think! I like this as one potential solution, the gaps between the node IDs aren't too distracting if it there are a lot of leaves. We can add other options to Also, when we are happy with the path forward, the changes will also need to be made to |
Probably should standardize the x-axis title to |
Seems to me that node IDs just aren't useful unless you can identify a specific node, so for the auto I just don't think I would ever find it useful. I think we should order the information by its usefulness not the arbitrary node ids. Too bad we can't simply have a hover over the bar that would pop up the node id if you want it. I don't think we can do interactive stuff with matplotlib. |
I know I would find For example, I would use it to make a first figure of all the nodes to say "here are all of the nodes of the tree, as you can see nodes X, Y, Z are interesting because of ...", and then include a second figure with |
I suppose the node ID General area could be useful and it doesn't hurt to include but fewer options is always good. What do you think about ordering the graph by magnitude of y axis to get a better idea of what's important? |
Yeah, we could do interactive stuff with something like I'll add a |
@parrt how does this look for |
@mepland curios what do you mean by "cropped down to those nodes of interes" :) I see useful the utility of "auto" in this case :
|
@tlapusan I meant basically what you suggested. We could include any combo of:
I think 1 and 2 are the most useful, with 2 anything, such as 3, can be added on by the user. |
Hi @mepland, I pull your code and ran the ctree_leaf_distributions for each xaxis_display_type value. It seems to be some issues (worked in sklearn notebook). For auto, the x values are not correct distributed and for "y_sorted" there is a code issue (maybe because meanwhile we changed the code to support multi class labels (not only binary classification)). |
@mepland nice, those bugs were fixed :) few suggestions :
|
Looks pretty cool to me! |
Would it be possible to get the bars the same width (maybe only when there is room)? I guess it's no big deal and they aren't uniform as they are now. I'm still not convinced we'll get a good way to identify leaves of interest looking at x axis labels. What about tooltip text? Unlike the tree view, these are just matplotlib plots so we can do tooltips: |
tooltip could also help :) I tried the first example from https://mplcursors.readthedocs.io/en/stable/ and it's not working for me... and I do meet the requirements "Python 3, and Matplotlib≥3.1." I see tooltip as a 'hidden' feature... not sure that majority of people will discover it. |
haha. figures mplcursors didn't work. I guess I am OK with this PR; if it's useful to you guys that's good enough for me! |
@mepland let me know when you are happy with it and I will merge. |
Hi @parrt and @tlapusan, I just added a
|
cool! Typo here though: "to filter to leaves with 100". Should that be "filtering out"? |
Replying to your comments @tlapusan:
I agree it doesn't look the best, but if we force the x-axis ticks to have an int format then the tick labels do not make sense. In this example it would be 2, 5, 7, 10, 12, 15. I think it is better to leave as is, especially as users will be using
When sorted by y value the leaf |
@parrt, not really a typo, the filter keeps leaves that pass its criteria. I'd be open to rewording it though, if it's not clear. Any suggestions? |
@parrt, aren't the bars all the same width now? They just expand and contract to fit the available horizontal space. I guess I'm not sure what you mean by this. And yeah, tooltip text is nice in theory - but frequently breaks and can't be printed to a flat image anyway 😄 |
no worries. i just meant two diff nodes have diff width bars. |
yeah, that doesn't make sense to me. Maybe: "to filter for"? For example, to filter for leaves with 100 < total samples, and 5 < class 1 samples, use show_leaf_filter = lambda x: (100 < np.sum(x)) & (5 < x[1]) Or, For example, to view only those leaves with more than 100 total samples and less than 5 class 1 samples, use show_leaf_filter = lambda x: (100 < np.sum(x)) & (5 < x[1]) Why do you use "100 < total" btw? Ain't that backwards? Never seen that except C programmers in 1980s. |
Updated to this.
Well, I did learn to program in C++ from particle physicists mostly trained in the 80s and early 90s so that might be part of it 😃, but I also like when the inequalities increase from left to right, like a number line. |
@parrt I think it's ready to merge! |
Ah with that interpretation of an axis, I can understand why that makes sense. Unfortunately no one expects that ;) |
yay! Thanks very much for these important changes... |
No description provided.