Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent indexes for tick label plotting #28733

Merged
merged 19 commits into from
Nov 21, 2020

Conversation

nrebena
Copy link
Contributor

@nrebena nrebena commented Oct 1, 2019

The tick position for BarPlot can be define using the convert tool from matplotlib.

The main advantage is that it will reuse the same position when you have text as axis values. Previously, the tick position was determined by the order of the given element, so that ['A', 'B'] where given label [0, 1], and if updating the plot with order ['B', 'A'], you will not draw at the right position.

The resulting plot for each issue would be:
issue26186
issue11465

@nrebena
Copy link
Contributor Author

nrebena commented Oct 1, 2019

So one of the example of the doc revealed a bug in this PR.
I added it to the test suite and hopefully will fix it. It is about using a MultiIndex of string (or containing a string in a level?) as axis label.

nrebena added a commit to nrebena/pandas that referenced this pull request Oct 3, 2019
@nrebena
Copy link
Contributor Author

nrebena commented Oct 3, 2019

This PR solved the bug related to index, but handling MultiIndex in the same way requires more work and I consider it outside the scope of this PR, so I expressely did not handle it.

The best way to handle MultiIndex plot would be via label grouping and multi-level axis label, as described in this issue on matplotlib matplotlib/matplotlib#6321, and this one on pandas #2088. I may try to tackle this one day, but definitly not in this PR.

pandas/plotting/_matplotlib/core.py Show resolved Hide resolved
@@ -1129,14 +1129,18 @@ def test_bar_categorical(self):
for df in [df1, df2]:
ax = df.plot.bar()
ticks = ax.xaxis.get_ticklocs()
tm.assert_numpy_array_equal(ticks, np.array([0, 1, 2, 3, 4, 5]))
tm.assert_numpy_array_equal(
ticks, np.array([0, 1, 2, 3, 4, 5], dtype=np.float)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this need to change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ticks pos are handled by matplotlib fonction convert_xunits, which return a float array rather than the int array previously used.
I would argue that having float array for tick position make more sense to have the same type for all plot.
I do not think that the point of the test was to ensure the type of the tick position, so I changed it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me understand what part of this PR would have changed this? Still not sure on this

Copy link
Contributor Author

@nrebena nrebena Dec 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before the PR, the tick position where defined by using self.tick_pos = np.arange(len(data)).
In this PR, I let matplotlib handle the position by calling

ax.xaxis.update_units(self.ax_index)
self.tick_pos = ax.convert_xunits(self.ax_index)

This use the matplotlib Conversion Interface. For convertiong of string category, this use numpy float array (see https://matplotlib.org/_modules/matplotlib/category.html#StrCategoryConverter), so the tick position are now float instead of int.

I could cast this back to an int array, but it made more sense IMO to change the test here, that trying to force the tick position to be integer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should stay as integer to not break backwards compatibility - how hard would it be to accomplish that?

pandas/tests/plotting/test_frame.py Outdated Show resolved Hide resolved
pandas/tests/plotting/test_frame.py Outdated Show resolved Hide resolved
nrebena added a commit to nrebena/pandas that referenced this pull request Nov 19, 2019
@jreback jreback added the Visualization plotting label Nov 20, 2019
@jreback jreback changed the title Issue 26186 tick label Inconsistent indexes for tick label plotting Nov 20, 2019
@nrebena nrebena requested a review from WillAyd December 16, 2019 20:09
@jreback
Copy link
Contributor

jreback commented Dec 27, 2019

can you merge master and will look again

@@ -1129,14 +1129,18 @@ def test_bar_categorical(self):
for df in [df1, df2]:
ax = df.plot.bar()
ticks = ax.xaxis.get_ticklocs()
tm.assert_numpy_array_equal(ticks, np.array([0, 1, 2, 3, 4, 5]))
tm.assert_numpy_array_equal(
ticks, np.array([0, 1, 2, 3, 4, 5], dtype=np.float)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should stay as integer to not break backwards compatibility - how hard would it be to accomplish that?

@@ -1090,6 +1090,7 @@ Plotting
- Bug in color validation incorrectly raising for non-color styles (:issue:`29122`).
- Allow :meth:`DataFrame.plot.scatter` to plot ``objects`` and ``datetime`` type data (:issue:`18755`, :issue:`30391`)
- Bug in :meth:`DataFrame.hist`, ``xrot=0`` does not work with ``by`` and subplots (:issue:`30288`).
- Bug in BarPlot. Tick position where assigned by value order instead of using the value for numeric, or a smart order for sting. (:issue:`26186` and issue:`11465`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this to 1.1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

nrebena added a commit to nrebena/pandas that referenced this pull request Feb 12, 2020
@nrebena
Copy link
Contributor Author

nrebena commented Feb 15, 2020

I made the dtype for tick array as int as pointed by @WillAyd, see last commit change.
I also rebased on master.

nrebena added a commit to nrebena/pandas that referenced this pull request Feb 16, 2020
nrebena added a commit to nrebena/pandas that referenced this pull request Feb 16, 2020
@WillAyd
Copy link
Member

WillAyd commented Mar 14, 2020

@nrebena I think this should be good. Can you fix the merge conflict?

nrebena added a commit to nrebena/pandas that referenced this pull request Mar 14, 2020
@nrebena
Copy link
Contributor Author

nrebena commented Mar 14, 2020

Rebased and green 🍏 @WillAyd

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm - @jreback care to look?

@@ -345,6 +345,7 @@ Plotting
- :func:`.plot` for line/bar now accepts color by dictonary (:issue:`8193`).
-
- Bug in :meth:`DataFrame.boxplot` and :meth:`DataFrame.plot.boxplot` lost color attributes of ``medianprops``, ``whiskerprops``, ``capprops`` and ``medianprops`` (:issue:`30346`)
- Bug in BarPlot. Tick position where assigned by value order instead of using the value for numeric, or a smart order for sting. (:issue:`26186` and issue:`11465`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if u can add the proper reference here to plot.bar

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like the notes are duplicated (2nd one looks good)

pandas/plotting/_matplotlib/core.py Show resolved Hide resolved
@@ -1345,6 +1348,16 @@ def _make_plot(self):

for i, (label, y) in enumerate(self._iter_data(fillna=0)):
ax = self._get_ax(i)

if self.orientation == "vertical":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any assert in orientation that it takes on only certain values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No assert. Orientation is never a parameter. It is defined by the BarPlot class as "vertical", and overloaded by the Barhplot class as "horizontal". It is also defined in LinePlot class.
Globaly, it is use by the _post_plot_logic_common method from the base class MPLPlot.

Copy link
Member

@charlesdong1991 charlesdong1991 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @nrebena

nice change and I think it's very close to get merged,

just two nits since there is conflicts if you are still interested in.

elif self.orientation == "horizontal":
ax.yaxis.update_units(self.ax_index)
self.tick_pos = ax.convert_yunits(self.ax_index).astype(np.int)
self.ax_pos = self.tick_pos - self.tickoffset
Copy link
Member

@charlesdong1991 charlesdong1991 May 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this self.ax_pos = self.tick_pos - self.tickoffset can be taken out from if.. else.. clause, and no need to use it twice. the main purpose of this clause is to get self.tick_pos for bar and barh separately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing.

errors = gp3.std()

# No assertion we just ensure that we can plot a MultiIndex bar plot
means.plot.bar(yerr=errors, capsize=4)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we have user warning here? shall catch it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Done.

@MarcoGorelli
Copy link
Member

Hi @nrebena - sorry to chase you up, just wanted to ask if this is still active

@nrebena
Copy link
Contributor Author

nrebena commented Jun 3, 2020

Hi. No worries. I may come around to finish it, but if someone want to take it over no problem.

nrebena added a commit to nrebena/pandas that referenced this pull request Jun 13, 2020
Copy link
Member

@charlesdong1991 charlesdong1991 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @nrebena for sticking to it, sorry for such long time! the CI error doesn't seem related to this PR

cc @jreback for comments

@arw2019
Copy link
Member

arw2019 commented Nov 21, 2020

@nrebena can you merge master once more?

ping @jreback

@charlesdong1991
Copy link
Member

thanks @nrebena , very nice PR!

thanks @MarcoGorelli for resolving the conflicts!

@jreback
Copy link
Contributor

jreback commented Jan 18, 2021

this PR was reverted, but I guess we don't have a tracking issue. @nrebena can you open one.

simonjayhawkins added a commit that referenced this pull request Jan 18, 2021
…ing (#28733)" (#39252)

Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
nofarm3 pushed a commit to nofarm3/pandas that referenced this pull request Jan 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inconsitent index for plot X axis weirdness with DataFrame.plot(kind='bar')
7 participants