Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Handle Datetimelike data in DataFrame.combine #23317

Merged
merged 2 commits into from
Oct 26, 2018

Conversation

TomAugspurger
Copy link
Contributor

Closes #23079

@TomAugspurger TomAugspurger added the Dtype Conversions Unexpected or buggy dtype conversions label Oct 24, 2018
@TomAugspurger TomAugspurger added this to the 0.24.0 milestone Oct 24, 2018
@pep8speaks
Copy link

Hello @TomAugspurger! Thanks for submitting the PR.

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Oct 24, 2018

After this, we'll need an additional change on the PeriodArray PR, so that the integers are converted back to periods:

commit f73f83e2878e04d266d8f86c2675d6c1fae6d90e
Author: Tom Augspurger <tom.w.augspurger@gmail.com>
Date:   Wed Oct 24 11:28:59 2018 -0500

    TST: Unxfail combine

diff --git a/pandas/core/dtypes/cast.py b/pandas/core/dtypes/cast.py
index 56bf39472..46c8126f6 100644
--- a/pandas/core/dtypes/cast.py
+++ b/pandas/core/dtypes/cast.py
@@ -6,7 +6,7 @@ import numpy as np
 import warnings
 
 from pandas._libs import tslib, lib, tslibs
-from pandas._libs.tslibs import iNaT, OutOfBoundsDatetime
+from pandas._libs.tslibs import iNaT, OutOfBoundsDatetime, Period
 from pandas.compat import string_types, text_type, PY3
 from .common import (ensure_object, is_bool, is_integer, is_float,
                      is_complex, is_datetimetz, is_categorical_dtype,
@@ -164,6 +164,12 @@ def maybe_downcast_to_dtype(result, dtype):
                     result = to_datetime(result).tz_localize('utc')
                     result = result.tz_convert(dtype.tz)
 
+        elif dtype.type == Period:
+            # TODO(DatetimeArray): merge with previous elif
+            from pandas.core.arrays import PeriodArray
+
+            return PeriodArray(result, freq=dtype.freq)
+
     except Exception:
         pass
 
diff --git a/pandas/tests/frame/test_combine_concat.py b/pandas/tests/frame/test_combine_concat.py
index 2803db4f4..3b8d6e6c5 100644
--- a/pandas/tests/frame/test_combine_concat.py
+++ b/pandas/tests/frame/test_combine_concat.py
@@ -759,7 +759,6 @@ class TestDataFrameCombineFirst(TestData):
         tm.assert_frame_equal(res, exp)
         assert res['TD'].dtype == 'timedelta64[ns]'
 
-    @pytest.mark.xfail(reason="GH-23079", strict=True)
     def test_combine_first_period(self):
         data1 = pd.PeriodIndex(['2011-01', 'NaT', '2011-03',
                                 '2011-04'], freq='M')

@codecov
Copy link

codecov bot commented Oct 24, 2018

Codecov Report

Merging #23317 into master will decrease coverage by <.01%.
The diff coverage is 93.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23317      +/-   ##
==========================================
- Coverage   92.22%   92.22%   -0.01%     
==========================================
  Files         169      169              
  Lines       50911    50914       +3     
==========================================
+ Hits        46954    46956       +2     
- Misses       3957     3958       +1
Flag Coverage Δ
#multiple 90.65% <93.33%> (-0.01%) ⬇️
#single 42.29% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/frame.py 96.98% <93.33%> (-0.05%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1c26375...b659913. Read the comment docs.

@TomAugspurger
Copy link
Contributor Author

@jreback do you have a chance to look at this? It's one of the blockers for PeriodArray.

Originally, this was done as part of
#3595. I think the conversion of i8 needs to happen inside the combiner function, not outside of it.

@TomAugspurger
Copy link
Contributor Author

Actually... one sec, this may not be a blocker.

@TomAugspurger
Copy link
Contributor Author

Oh, yes, it is kinda a blocker, but I mis-diagnosed the real issue. The real thing we need on the PeriodArray PR is the change in maybe_downcast_to_dtype turning the i8 array back into a PeriodArray. But we'll still need this so that we handle the unpacking from series correctly.

So, I think this is still a valuable change on its own.

@jreback
Copy link
Contributor

jreback commented Oct 25, 2018

might be some more issues about this (which this may fix)

@TomAugspurger
Copy link
Contributor Author

Didn't see any other issues that this fixes. #7509 is still broken with this PR.

# 2. convert datelike to i8
if isinstance(arr, (ABCIndexClass, ABCSeries)):
arr = arr._values

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not super duper happy that we have to have coercions like this. @jbrockmendel I view combine like update, in that its a bit orphaned. We do most numerics right now by dispatching thru ops.py but a lot of things are in internals, and others are in generic. So we need a coherent strategy for this. but ok for now.

@jreback jreback merged commit 2ff7eec into pandas-dev:master Oct 26, 2018
@jreback
Copy link
Contributor

jreback commented Oct 26, 2018

thanks @TomAugspurger

thoo added a commit to thoo/pandas that referenced this pull request Oct 27, 2018
…ndas

* repo_org/master: (23 commits)
  DOC: Add docstring validations for "See Also" section (pandas-dev#23143)
  TST: Fix test assertion (pandas-dev#23357)
  BUG: Handle Period in combine (pandas-dev#23350)
  REF: SparseArray imports (pandas-dev#23329)
  CI: Migrate some CircleCI jobs to Azure (pandas-dev#22992)
  DOC: update the is_month_start/is_month_end docstring (pandas-dev#23051)
  Partialy fix issue pandas-dev#23334 - isort pandas/core/groupby directory (pandas-dev#23341)
  TST: Add base test for extensionarray setitem pandas-dev#23300 (pandas-dev#23304)
  API: Add sparse Acessor (pandas-dev#23183)
  PERF: speed up CategoricalIndex.get_loc (pandas-dev#23235)
  fix and test incorrect case in delta_to_nanoseconds (pandas-dev#23302)
  BUG: Handle Datetimelike data in DataFrame.combine (pandas-dev#23317)
  TST: re-enable gbq tests (pandas-dev#23303)
  Switched references of App veyor to azure pipelines in the contributing CI section (pandas-dev#23311)
  isort imports-io (pandas-dev#23332)
  DOC: Added a Multi Index example for the Series.sum method (pandas-dev#23279)
  REF: Make PeriodArray an ExtensionArray (pandas-dev#22862)
  DOC: Added Examples for Series max (pandas-dev#23298)
  API/ENH: tz_localize handling of nonexistent times: rename keyword + add shift option (pandas-dev#22644)
  BUG: Let MultiIndex.set_levels accept any iterable (pandas-dev#23273) (pandas-dev#23291)
  ...
thoo added a commit to thoo/pandas that referenced this pull request Oct 27, 2018
…xamples

* repo_org/master: (83 commits)
  DOC: Add docstring validations for "See Also" section (pandas-dev#23143)
  TST: Fix test assertion (pandas-dev#23357)
  BUG: Handle Period in combine (pandas-dev#23350)
  REF: SparseArray imports (pandas-dev#23329)
  CI: Migrate some CircleCI jobs to Azure (pandas-dev#22992)
  DOC: update the is_month_start/is_month_end docstring (pandas-dev#23051)
  Partialy fix issue pandas-dev#23334 - isort pandas/core/groupby directory (pandas-dev#23341)
  TST: Add base test for extensionarray setitem pandas-dev#23300 (pandas-dev#23304)
  API: Add sparse Acessor (pandas-dev#23183)
  PERF: speed up CategoricalIndex.get_loc (pandas-dev#23235)
  fix and test incorrect case in delta_to_nanoseconds (pandas-dev#23302)
  BUG: Handle Datetimelike data in DataFrame.combine (pandas-dev#23317)
  TST: re-enable gbq tests (pandas-dev#23303)
  Switched references of App veyor to azure pipelines in the contributing CI section (pandas-dev#23311)
  isort imports-io (pandas-dev#23332)
  DOC: Added a Multi Index example for the Series.sum method (pandas-dev#23279)
  REF: Make PeriodArray an ExtensionArray (pandas-dev#22862)
  DOC: Added Examples for Series max (pandas-dev#23298)
  API/ENH: tz_localize handling of nonexistent times: rename keyword + add shift option (pandas-dev#22644)
  BUG: Let MultiIndex.set_levels accept any iterable (pandas-dev#23273) (pandas-dev#23291)
  ...
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants