BLD/CI: ResourceWarnings are back (sometimes)! #23680

Closed
h-vetinari opened this issue Nov 14, 2018 · 3 comments · Fixed by #23731
Labels
Build Library building on various platforms CI Continuous Integration Docs Testing pandas testing functions or related to the test suite

Comments

@h-vetinari
Contributor

h-vetinari commented Nov 14, 2018

In #22225, #23192 (and now #23582), I've had a persistent ResourceWarning in the last few CI runs. I first thought it was flaky, as those warnings used to be, but this time it stayed, and I can reproduce some of it locally (not with pytest pandas/tests/io/test_parquet.py, but at least with pytest pandas/tests/io).

For example in:

  1. https://travis-ci.org/pandas-dev/pandas/jobs/453820311
  2. https://travis-ci.org/pandas-dev/pandas/jobs/453822449
  3. https://travis-ci.org/pandas-dev/pandas/jobs/454102793
  4. https://travis-ci.org/pandas-dev/pandas/jobs/454342935
  5. https://travis-ci.org/pandas-dev/pandas/jobs/454644088
  6. https://travis-ci.org/pandas-dev/pandas/jobs/454644964
  7. https://travis-ci.org/pandas-dev/pandas/jobs/454744932
  8. https://travis-ci.org/pandas-dev/pandas/jobs/454760670
  9. https://travis-ci.org/pandas-dev/pandas/jobs/454760916
sys:1: ResourceWarning: unclosed <socket.socket fd=16, family=AddressFamily.AF_INET, type=2050, proto=0, laddr=('0.0.0.0', 0)>
sys:1: ResourceWarning: unclosed <socket.socket fd=15, family=AddressFamily.AF_INET, type=2050, proto=0, laddr=('0.0.0.0', 0)>

and

=============================== warnings summary ===============================
pandas/core/frame.py::pandas.core.frame.DataFrame.to_parquet
  /home/travis/build/pandas-dev/pandas/pandas/io/parquet.py:129: ResourceWarning: unclosed file <_io.BufferedReader name='df.parquet.gzip'>
    **kwargs).to_pandas()
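For reference, "unclosed file <_io.BufferedReader ...>" is what CPython reports when a file object opened in binary read mode is garbage-collected while still open. A minimal stdlib sketch that reproduces the message and shows the with-statement fix; read_leaky, read_safe and resource_warnings are hypothetical helpers, not pandas code, and this does not pinpoint where the parquet reader leaks.

```python
import gc
import os
import tempfile
import warnings


def read_leaky(path):
    # Opens the file but never closes it: the handle is only released when
    # the file object is garbage-collected, which is when CPython emits
    # "ResourceWarning: unclosed file <_io.BufferedReader ...>".
    f = open(path, "rb")
    return f.read()


def read_safe(path):
    # The with-statement closes the handle deterministically, so no warning.
    with open(path, "rb") as f:
        return f.read()


def resource_warnings(func, *args):
    """Call func(*args) and return the ResourceWarning messages it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        # ResourceWarning is ignored by default, so opt in explicitly.
        warnings.simplefilter("always", ResourceWarning)
        func(*args)
        gc.collect()  # also finalize objects kept alive by reference cycles
    return [str(w.message) for w in caught if issubclass(w.category, ResourceWarning)]


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".parquet.gzip")
    os.close(fd)
    try:
        print(resource_warnings(read_leaky, path))  # one "unclosed file" message
        print(resource_warnings(read_safe, path))   # []
    finally:
        os.remove(path)
```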

There's also a stderr (or stdout) warning from the parser tests surfacing in the test output:

..............................................................x...........................................................s....
........................................Skipping line 3: Expected 3 fields in line 3, saw 4
.......................................s.......................................................................................

I've narrowed one of the ResourceWarnings down to the parquet-s3 tests, but at least one other remains that I haven't been able to track down (the same goes for the skipped-line warning). Grepping for 'df.parquet.gzip' in various combinations turned up nothing, and in several trial runs in #23192 I tried disabling everything related to 'gzip', S3 or _io, to no avail.
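For a leak that is hard to track, tracemalloc can help: the warning carries the leaked object as its source (for unclosed files as well as unclosed sockets), and tracemalloc.get_object_traceback() returns where that object was allocated, provided tracing started before the allocation. A minimal sketch assuming Python 3.6+; find_leak_origins and leak_file are hypothetical names. Setting PYTHONTRACEMALLOC=10 (or running python -X tracemalloc=10) before the test run should make Python's default warning output append an "Object allocated at" traceback without code changes.

```python
import gc
import os
import tracemalloc
import warnings


def find_leak_origins(func, *args, nframes=10):
    """Call func(*args) and return (message, allocation traceback lines)
    pairs for every ResourceWarning it triggers.

    tracemalloc must be tracing *before* the leaked object is created,
    otherwise no allocation traceback is recorded for it.
    """
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start(nframes)
    try:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always", ResourceWarning)
            func(*args)
            gc.collect()  # finalize leaked objects held in reference cycles
        origins = []
        for w in caught:
            if not issubclass(w.category, ResourceWarning):
                continue
            # w.source is the leaked object itself (Python 3.6+).
            tb = None if w.source is None else tracemalloc.get_object_traceback(w.source)
            origins.append((str(w.message), tb.format() if tb is not None else []))
        return origins
    finally:
        if started_here:
            tracemalloc.stop()


def leak_file():
    # Hypothetical demo: a file handle that is never closed.
    f = open(os.devnull, "rb")


if __name__ == "__main__":
    for message, lines in find_leak_origins(leak_file):
        print(message)
        print("\n".join(lines))
```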

Any help would be appreciated. I'd also be interested to hear whether anyone else has already seen them. The code in the PRs linked above cannot reasonably be the culprit (e.g. #23582 just adds tests)...

Potentially related xref: #22934

@h-vetinari
Contributor Author

Actually, I was only grepping in the tests, which may not have been creative enough...

Found the following in pandas/core/frame.py:

        Examples
        --------
        >>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
        >>> df.to_parquet('df.parquet.gzip', compression='gzip')
        >>> pd.read_parquet('df.parquet.gzip')
           col1  col2
        0     1     3
        1     2     4

@datapythonista Could it be that the ResourceWarning is coming from the doctests? That would make sense to me, because it only ever happens in the travis-36 lint build.
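One way to test that hypothesis directly is to run each doctest on its own with ResourceWarning recording enabled and report which docstring leaked. A sketch using only the stdlib doctest and warnings modules; doctests_with_resource_warnings, leaky_example and clean_example are hypothetical names, and this does not reproduce how the pandas CI actually invokes its doctests.

```python
import doctest
import gc
import warnings


def doctests_with_resource_warnings(obj):
    """Run every doctest found in obj on its own and return the names of the
    ones that triggered a ResourceWarning."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    offenders = []
    for test in finder.find(obj):
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always", ResourceWarning)
            # clear_globs=True (the default) drops the example's variables at
            # the end of the run, so leaked handles are finalized in here.
            runner.run(test, out=lambda text: None)  # discard failure reports
            gc.collect()
        if any(issubclass(w.category, ResourceWarning) for w in caught):
            offenders.append(test.name)
    return offenders


def leaky_example():
    """
    >>> import os
    >>> handle = open(os.devnull, 'rb')
    """


def clean_example():
    """
    >>> import os
    >>> with open(os.devnull, 'rb') as handle:
    ...     _ = handle.read()
    """
```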

@gfyoung gfyoung added Build Library building on various platforms Docs CI Continuous Integration labels Nov 14, 2018
@datapythonista datapythonista added the Testing pandas testing functions or related to the test suite label Nov 14, 2018
@datapythonista
Member

@h-vetinari I'm not sure where that error comes from.

In #22854 I'm moving the doctests to Azure; after that it should be clear whether the warning is generated in the doctests, as their logs will be displayed separately from the rest.

If the problem is in the code you show, that should be fixed in #23201, as those lines should be skipped in the doctests.
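Skipping a line in a doctest is normally done with the standard +SKIP directive: the line stays in the rendered docs but is never executed. A minimal sketch with a hypothetical docstring in the style of the DataFrame.to_parquet example; docstring_with_skip and run_doctests are illustrative names only.

```python
import doctest


def docstring_with_skip():
    """
    Hypothetical docstring in the style of the DataFrame.to_parquet example.

    >>> total = 1 + 2
    >>> total
    3
    >>> open('df.parquet.gzip', 'rb')  # doctest: +SKIP
    """


def run_doctests(obj):
    """Run the doctests in obj's docstring; return (failed, attempted)."""
    runner = doctest.DocTestRunner(verbose=False)
    failed = attempted = 0
    for test in doctest.DocTestFinder().find(obj):
        result = runner.run(test, out=lambda text: None)
        failed += result.failed
        attempted += result.attempted  # skipped examples are not counted
    return failed, attempted


if __name__ == "__main__":
    # The skipped open() never runs, so it can neither fail nor leak a handle.
    print(run_doctests(docstring_with_skip))
```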

@h-vetinari
Contributor Author

@datapythonista
I've verified that there are at least three or four separate issues:

  • the ResourceWarnings from sys:1 about unclosed sockets (one of them should be from the parquet-s3 tests)
  • the ResourceWarning about 'df.parquet.gzip', which definitely comes from the doctest (it disappeared when I commented out that example). The larger point is that this might be a resource leak in the parquet reader/writer that isn't caught anywhere else (and won't be solved by # doctest: +SKIP).
  • the warning bubbling into the test output, which is solved by CLN/CI: Catch that stderr-warning! #23706
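For the last item, the general idea is to capture what a function writes to stderr so a test can assert on it instead of letting it spill into the log. A minimal stdlib sketch; parse_rows is a hypothetical stand-in that only mimics the message format seen in the CI log, not pandas' parser. In pytest, the built-in capsys fixture serves the same purpose.

```python
import contextlib
import io
import sys


def parse_rows(lines, n_fields):
    """Hypothetical parser: keep rows with n_fields fields and report the
    others on stderr, mimicking the message seen in the CI log."""
    rows = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split(",")
        if len(fields) != n_fields:
            sys.stderr.write(
                "Skipping line {0}: Expected {1} fields in line {0}, saw {2}\n".format(
                    lineno, n_fields, len(fields)
                )
            )
            continue
        rows.append(fields)
    return rows


def call_capturing_stderr(func, *args):
    """Call func(*args) with sys.stderr redirected; return (result, text)."""
    buffer = io.StringIO()
    with contextlib.redirect_stderr(buffer):
        result = func(*args)
    return result, buffer.getvalue()


if __name__ == "__main__":
    rows, err = call_capturing_stderr(parse_rows, ["a,b,c", "1,2,3", "4,5,6,7"], 3)
    print(rows)  # [['a', 'b', 'c'], ['1', '2', '3']]
    print(err, end="")  # Skipping line 3: Expected 3 fields in line 3, saw 4
```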

Generally speaking, I don't understand why ResourceWarnings are only caught with PANDAS_TESTING_MODE="deprecate". I get that DeprecationWarnings can normally be avoided, but IMO ResourceWarnings shouldn't be filtered out by default.
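For context, Python's default warning filters ignore ResourceWarning, so leaks stay silent unless the test setup opts in. The hypothetical helper configure_warning_filters below sketches the suggestion: surface ResourceWarning unconditionally while keeping DeprecationWarning tied to PANDAS_TESTING_MODE. It is not pandas' actual testing-mode code. Passing -W always::ResourceWarning to the interpreter (or setting PYTHONWARNINGS) has a similar effect globally.

```python
import os
import warnings


def configure_warning_filters(environ=None):
    """Hypothetical test-setup helper.

    ResourceWarning is always shown; DeprecationWarning only when
    PANDAS_TESTING_MODE contains "deprecate".
    """
    environ = os.environ if environ is None else environ
    # Python's default filters ignore ResourceWarning, so opt in explicitly.
    warnings.simplefilter("always", ResourceWarning)
    if "deprecate" in environ.get("PANDAS_TESTING_MODE", ""):
        warnings.simplefilter("always", DeprecationWarning)


if __name__ == "__main__":
    configure_warning_filters()
    for action, _message, category, _module, _lineno in warnings.filters[:3]:
        print(action, category.__name__)
```

The sketch uses "always" rather than "error" because ResourceWarnings are typically raised from finalizers, where an exception is reported as "Exception ignored in ..." instead of propagating to the test.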

4 participants