-
Notifications
You must be signed in to change notification settings - Fork 651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX-#7248: Make sure _validate_dtypes_sum_prod_mean
works correctly with datetime types
#7237
Conversation
…rks correctly with datetime types Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
_validate_dtypes_sum_prod_mean
_validate_dtypes_sum_prod_mean
works correctly with datetime types
# We cannot add datetime types, so if we are summing a column with | ||
# dtype datetime64 and cannot ignore non-numeric types, we must throw a | ||
# TypeError. | ||
if numeric_only is False and any( | ||
dtype == pandas.api.types.pandas_dtype("datetime64[ns]") | ||
for dtype in self.dtypes | ||
): | ||
raise TypeError( | ||
"'DatetimeArray' with dtype datetime64[ns] does not support reduction 'sum'" | ||
) | ||
|
||
data = self._get_numeric_data(axis) if numeric_only else self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This piece of code has been moved from _validate_dtypes_sum_prod_mean
function because the processing is different for sum function.
dtype != pandas.api.types.pandas_dtype("datetime64[ns]") | ||
and dtype != pandas.api.types.pandas_dtype("timedelta64[ns]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now these types cannot be compared with numbers.
# numeric_only is False because if it is None, it will default to True | ||
# if the operation fails with mixed dtypes. | ||
if ( | ||
axis | ||
and numeric_only is False | ||
and np.unique([is_numeric_dtype(dtype) for dtype in self.dtypes]).size == 2 | ||
and not all([is_numeric_dtype(dtype) for dtype in self.dtypes]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new way is more obvious.
modin_df, | ||
pandas_df, | ||
lambda df: df.min(axis=1), | ||
# pandas raises: `TypeError: '<=' not supported between instances of 'Timestamp' and 'int'` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What prevents us from emitting the same error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most likely there are no fundamental restrictions, but we need to know better how pandas does this.
What do these changes do?
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
_validate_dtypes_sum_prod_mean
doesn't process correctlydatetime64[ns]
andtimedelta64[ns]
types #7248docs/development/architecture.rst
is up-to-date