Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REF: Separate out DataFrame/Series Construction Helpers #24100

Merged
merged 5 commits into from
Dec 5, 2018

Conversation

jbrockmendel
Copy link
Member

Why? In implementing #24096 I found it tough to tell all the paths by which a DatetimeIndex get passed to a DataFrame or Series. Collecting all these helper functions is a step towards reducing the number of paths available so these things can be caught in one place.

The main thing this PR does is move helper functions from core.series and core.frame. A few ancillary things it does

  • remove runtime imports (there is only one left in construction.py)
  • use ABC classes for isinstance checks
  • nested functions _try_cast and _get_axes are de-nested
  • de-privatize functions that are imported elsewhere. A function in construction.py has a leading underscore if and only if it is not imported elsewhere.

@pep8speaks
Copy link

Hello @jbrockmendel! Thanks for submitting the PR.

@codecov
Copy link

codecov bot commented Dec 5, 2018

Codecov Report

Merging #24100 into master will decrease coverage by 49.67%.
The diff coverage is 62.79%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #24100       +/-   ##
===========================================
- Coverage   92.21%   42.53%   -49.68%     
===========================================
  Files         161      162        +1     
  Lines       51684    51713       +29     
===========================================
- Hits        47658    21998    -25660     
- Misses       4026    29715    +25689
Flag Coverage Δ
#multiple ?
#single 42.53% <62.79%> (-0.47%) ⬇️
Impacted Files Coverage Δ
pandas/core/series.py 48.39% <100%> (-45.29%) ⬇️
pandas/core/arrays/categorical.py 41.89% <100%> (-53.52%) ⬇️
pandas/core/arrays/sparse.py 44.84% <100%> (-47.12%) ⬇️
pandas/core/frame.py 35.92% <50%> (-61.11%) ⬇️
pandas/core/internals/construction.py 62.59% <62.59%> (ø)
pandas/core/sparse/frame.py 27.46% <66.66%> (-67.38%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/core/categorical.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
... and 125 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4b5f4d1...4f0e7f4. Read the comment docs.

@codecov
Copy link

codecov bot commented Dec 5, 2018

Codecov Report

Merging #24100 into master will decrease coverage by <.01%.
The diff coverage is 96.44%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24100      +/-   ##
==========================================
- Coverage   92.21%    92.2%   -0.01%     
==========================================
  Files         161      162       +1     
  Lines       51684    51700      +16     
==========================================
+ Hits        47658    47672      +14     
- Misses       4026     4028       +2
Flag Coverage Δ
#multiple 90.6% <96.44%> (-0.01%) ⬇️
#single 43.02% <64.69%> (+0.01%) ⬆️
Impacted Files Coverage Δ
pandas/core/sparse/frame.py 94.84% <100%> (+0.01%) ⬆️
pandas/core/series.py 93.69% <100%> (+0.01%) ⬆️
pandas/core/arrays/categorical.py 95.4% <100%> (ø) ⬆️
pandas/core/arrays/sparse.py 91.95% <100%> (ø) ⬆️
pandas/core/frame.py 96.79% <91.66%> (-0.23%) ⬇️
pandas/core/internals/construction.py 96.62% <96.62%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4b5f4d1...4f0e7f4. Read the comment docs.

@jreback jreback added Internals Related to non-user accessible pandas implementation Clean labels Dec 5, 2018
@jreback jreback added this to the 0.24.0 milestone Dec 5, 2018
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these can be done in a followup along with renaming the functions to be more consistent with what they do, e.g. init_ndarray, should really be: create_block_manager_from_ndarray (though we have a slight conflict on that name). but for sure need to separate out public / private things.


return create_block_manager_from_blocks([values], [columns, index])
return init_ndarray(values, index, columns, dtype=dtype, copy=copy)
# TODO: can we just get rid of this as a method?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes i would just call these directly (your new routines)

arrays = [data[k] for k in keys]

return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
return init_dict(data, index, columns, dtype=dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as below, yes)

@@ -0,0 +1,699 @@
"""
Functions for preparing various inputs passed to the DataFrame or Series
constructors before passing them to aBlockManager.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space before BlockManager

@jreback jreback merged commit 669cb27 into pandas-dev:master Dec 5, 2018
@jreback
Copy link
Contributor

jreback commented Dec 5, 2018

thanks. prob should open an issue with a todo list for followups.

@jbrockmendel jbrockmendel deleted the construct branch December 5, 2018 14:46
jbrockmendel added a commit to jbrockmendel/pandas that referenced this pull request Dec 5, 2018
jreback pushed a commit that referenced this pull request Dec 5, 2018
TomAugspurger added a commit to TomAugspurger/pandas that referenced this pull request Dec 6, 2018
commit 28c61d770f6dfca6857fd0fa6979d4119a31129e
Author: Tom Augspurger <tom.w.augspurger@gmail.com>
Date:   Thu Dec 6 12:18:19 2018 -0600

    uncomment

commit bae2e322523efc73a1344464f51611e2dc555ccb
Author: Tom Augspurger <tom.w.augspurger@gmail.com>
Date:   Thu Dec 6 12:17:09 2018 -0600

    maybe fixes

commit 6cb4db05c9d6ceba3794096f0172cae5ed5f6019
Author: Tom Augspurger <tom.w.augspurger@gmail.com>
Date:   Thu Dec 6 09:57:37 2018 -0600

    we back

commit d97ab57fb32cb23371169d9ed659ccfac34cfe45
Merge: a117de4 b78aa8d
Author: Tom Augspurger <tom.w.augspurger@gmail.com>
Date:   Thu Dec 6 09:51:51 2018 -0600

    Merge remote-tracking branch 'upstream/master' into disown-tz-only-rebased2

commit b78aa8d
Author: gfyoung <gfyoung17+GitHub@gmail.com>
Date:   Thu Dec 6 07:18:44 2018 -0500

    REF/TST: Add pytest idiom to reshape/test_tile (pandas-dev#24107)

commit 2993b8e
Author: gfyoung <gfyoung17+GitHub@gmail.com>
Date:   Thu Dec 6 07:17:55 2018 -0500

    REF/TST: Add more pytest idiom to scalar/test_nat (pandas-dev#24120)

commit b841374
Author: evangelineliu <hsiyinliu@gmail.com>
Date:   Wed Dec 5 18:21:46 2018 -0500

    BUG: Fix concat series loss of timezone (pandas-dev#24027)

commit 4ae63aa
Author: jbrockmendel <jbrockmendel@gmail.com>
Date:   Wed Dec 5 14:44:50 2018 -0800

    Implement DatetimeArray._from_sequence (pandas-dev#24074)

commit 2643721
Author: jbrockmendel <jbrockmendel@gmail.com>
Date:   Wed Dec 5 14:43:45 2018 -0800

    CLN: Follow-up to pandas-dev#24100 (pandas-dev#24116)

commit 8ea7744
Author: chris-b1 <cbartak@gmail.com>
Date:   Wed Dec 5 14:21:23 2018 -0600

    PERF: ascii c string functions (pandas-dev#23981)

commit cb862e4
Author: jbrockmendel <jbrockmendel@gmail.com>
Date:   Wed Dec 5 12:19:46 2018 -0800

    BUG: fix mutation of DTI backing Series/DataFrame (pandas-dev#24096)

commit aead29b
Author: topper-123 <contribute@tensortable.com>
Date:   Wed Dec 5 19:06:00 2018 +0000

    API: rename MultiIndex.labels to MultiIndex.codes (pandas-dev#23752)
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants