Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid circular import between core.series and plotting._xyz #16931

Closed
wants to merge 3 commits into from

Conversation

jbrockmendel
Copy link
Member

This follows up on #16913 but cuts down on the scope of the PR.

pandas.core.series currently runs

import pandas.plotting._core as _gfx # noqa

about 8 lines from the bottom of the file. pandas.plotting._core runs

from pandas.core.series import Series, remove_na

at the top of the file. Using very small words and talking very slowly, it was explained to me that this circular import works because Series and remove_na are defined in the series namespace by the time plotting._core tries to import them.

This PR gets rid of the module-level import from core.series and moves the import of plotting._core to the top of core.series so as to avoid relying on hard-to-predict/debug behavior.

Other changes that I'd like to make (like deleting the extraneous ret = Series() at plotting/_core.py:L2325) are ignored for now.

  • passes git diff upstream/master --name-only -- '*.py' | flake8 --diff (On Windows, git diff upstream/master -u -- "*.py" | flake8 --diff might work as an alternative.)

This follows up on pandas-dev#16913 but cuts down on the scope of the PR.
@gfyoung gfyoung added the Refactor Internal refactoring of code label Jul 14, 2017
This equality failing is my best guess for why tests are failing
@@ -334,7 +334,11 @@ def result(self):
def _compute_plot_data(self):
data = self.data

if isinstance(data, Series):
from pandas import Series
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this about?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first commit for this PR led to a bunch of TypeErrors on Travis that I can't reproduce locally. My best/only guess as to why is that the check isinstance(data, ABSeries) (now at L341) is somehow evaluating differently than the original isinstance(data, Series) (previously at L337).

So 337-339 is an attempt to check this guess. Regardless of whether the guess is correct, this check will be removed before long.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New hypothesis: Series._set_subtyp can sometimes set _subtyp to time_series, in which case it will not get recognized as ABCSeries. It tentatively looks like the Travis errors are date-related cases.

@@ -1494,6 +1499,7 @@ def _args_adjust(self):

@classmethod
def _plot(cls, ax, y, column_num=None, return_type='axes', **kwds):
from pandas.core.series import remove_na
Copy link
Contributor

@jreback jreback Jul 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be happy to move remove_na to pandas.core.dtypes.missing, and rename to remove_na_arraylike (will have to update a couple of references)

(pandas) bash-3.2$ grep -R remove_na pandas
pandas/core/series.py:            result = remove_na(self)
pandas/core/series.py:def remove_na(series):
pandas/plotting/_core.py:from pandas.core.series import Series, remove_na
pandas/plotting/_core.py:        y = remove_na(y)
pandas/plotting/_core.py:            y = [remove_na(v) for v in y]
pandas/plotting/_core.py:            y = remove_na(y)
pandas/plotting/_core.py:        values = [remove_na(v) for v in values]
pandas/tests/test_panel.py:from pandas.core.series import remove_na
pandas/tests/test_panel.py:                nona = remove_na(x)
pandas/tests/test_panel4d.py:from pandas.core.series import remove_na
pandas/tests/test_panel4d.py:                nona = remove_na(x)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would allow the import to be at the top

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'd be great.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About to make a PR that makes this move and nothing else. We'll see if Travis still complains about that.

if isinstance(data, Series):
from pandas import Series
if isinstance(data, ABCSeries) != isinstance(data, Series):
raise Exception('WTFError', data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make that error available only to developers when they rage-quit 😄

@jbrockmendel
Copy link
Member Author

I can now reproduce at least some of these errors locally by adding matplotlib to the virtualenv. (I had assumed that that `pip install -r ci/requirements_dev.txt' would get everything needed). But I still have no earthly idea what is causing them. The WTFError is not getting hit, so someone I'm declaring it someone else's turn.

Incidentally, pip install -r ci/requirements_all.txt fails because beautiful-soup is not recognized. Maybe that should be bs4?

@gfyoung
Copy link
Member

gfyoung commented Jul 15, 2017

@jbrockmendel : The requirements_all.txt is for conda, not pip. That's why you are having the issue.

@jreback
Copy link
Contributor

jreback commented Jul 19, 2017

let's close this one in light of your other PR's.

@jreback jreback closed this Jul 19, 2017
@gfyoung gfyoung added this to the No action milestone Jul 19, 2017
@jbrockmendel jbrockmendel deleted the nocircle branch July 21, 2017 18:44
@jbrockmendel
Copy link
Member Author

The DataFrame part of this has been merged, but the Series version is still FUBAR. I'll describe some things I've tried in the hopes that someone smarter than me will find this interesting/annoying enough to take a look at.

Everything here assumes we have delayed the import of Series in pandas.plotting._core and pandas.plotting._tools. If I move the import pandas.plotting._core as _gfx from the bottom of core.series up to the top of the file (without moving the place where Series.plot and Series.hist are defined), 9 tests fail in tests.plotting.test_series. AFAICT these all plotting series with PeriodIndexs.

The tracebacks go through matplotlib.__init__.inner, so I patched that to print its inputs and and raise regardless. I see no obvious difference in inputs between the version that fails and the version that passes.

If I move the import of _gfx to right after the definition of Series, above the call to Series._setup_axes, the tests go through fine. But I can't find anything that executes within the class Series block that would plausibly matter for this.

Next try: define plot and hist within Series definition, but delay import of _gfx:

    @property
    def plot(self):
        import pandas.plotting._core as _gfx  # noqa
        return _gfx.SeriesPlotMethods(self)

    @property
    def hist(self, by=None, ax=None, grid=True, xlabelsize=None,
                xrot=None, ylabelsize=None, yrot=None, figsize=None,
                bins=10, **kwds):
        import pandas.plotting._core as _gfx  # noqa
        return _gfx.hist_series(self, by=by, ax=ax, grid=grid, xlabelsize=xlabelsize,
                xrot=xrot, ylabelsize=ylabelsize, yrot=yrot, figsize=figsize,
                bins=bins, **kwds)

OK! Now the tests pass (at least the ones from tests.plotting.test_series, still need to run the rest), and all we've lost is the docstrings. Moving the import of _gfx up still causes those 9 tests to fail. Best guess at this point is that something in the class Series block is causing core.groupby to be imported, which checks hasattr(Series, name) before deciding whether to define (with exec) a new method. That story doesn't quite hold together, but it's all I've got.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants