[BUG] `Groupby.fillna` is not raising when a non-categorical value is passed #15666

galipremsagar · 2024-05-06T17:08:25Z

Describe the bug
When a scalar that is not among the categories is passed to `Groupby.fillna` on a categorical column, cudf silently returns the data unchanged instead of raising an error.

Steps/Code to reproduce bug

In [30]: import cudf

In [31]: s = cudf.Series(['a', 'b', 'c', 'f', 'ew', 'lk'], dtype='category')

In [32]: ps = s.to_pandas()

In [33]: s
Out[33]: 
0     a
1     b
2     c
3     f
4    ew
5    lk
dtype: category
Categories (6, object): ['a', 'b', 'c', 'ew', 'f', 'lk']

In [34]: ps
Out[34]: 
0     a
1     b
2     c
3     f
4    ew
5    lk
dtype: category
Categories (6, object): ['a', 'b', 'c', 'ew', 'f', 'lk']

In [35]: s.groupby(s).fillna(1)
/nvme/0/pgali/envs/cudfdev/lib/python3.11/site-packages/cudf/core/groupby/groupby.py:2293: FutureWarning: groupby fillna is deprecated and will be removed in a future version. Use groupby ffill or groupby bfill for forward or backward filling instead.
  warnings.warn(
Out[35]: 
0     a
1     b
2     c
3     f
4    ew
5    lk
dtype: category
Categories (6, object): ['a', 'b', 'c', 'ew', 'f', 'lk']

In [36]: ps.groupby(ps).fillna(1)
<ipython-input-36-8ffdaad8b8ae>:1: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  ps.groupby(ps).fillna(1)
<ipython-input-36-8ffdaad8b8ae>:1: FutureWarning: SeriesGroupBy.fillna is deprecated and will be removed in a future version. Use obj.ffill() or obj.bfill() for forward or backward filling instead. If you want to fill with a single value, use Series.fillna instead
  ps.groupby(ps).fillna(1)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[36], line 1
----> 1 ps.groupby(ps).fillna(1)

File /nvme/0/pgali/envs/cudfdev/lib/python3.11/site-packages/pandas/core/groupby/generic.py:965, in SeriesGroupBy.fillna(self, value, method, axis, inplace, limit, downcast)
    887 """
    888 Fill NA/NaN values using the specified method within groups.
    889 
   (...)
    955 dtype: float64
    956 """
    957 warnings.warn(
    958     f"{type(self).__name__}.fillna is deprecated and "
    959     "will be removed in a future version. Use obj.ffill() or obj.bfill() "
   (...)
    963     stacklevel=find_stack_level(),
    964 )
--> 965 result = self._op_via_apply(
    966     "fillna",
    967     value=value,
    968     method=method,
    969     axis=axis,
    970     inplace=inplace,
    971     limit=limit,
    972     downcast=downcast,
    973 )
    974 return result

File /nvme/0/pgali/envs/cudfdev/lib/python3.11/site-packages/pandas/core/groupby/groupby.py:1425, in GroupBy._op_via_apply(self, name, *args, **kwargs)
   1422     return self._python_apply_general(curried, self._selected_obj)
   1424 is_transform = name in base.transformation_kernels
-> 1425 result = self._python_apply_general(
   1426     curried,
   1427     self._obj_with_exclusions,
   1428     is_transform=is_transform,
   1429     not_indexed_same=not is_transform,
   1430 )
   1432 if self._grouper.has_dropped_na and is_transform:
   1433     # result will have dropped rows due to nans, fill with null
   1434     # and ensure index is ordered same as the input
   1435     result = self._set_result_index_ordered(result)

File /nvme/0/pgali/envs/cudfdev/lib/python3.11/site-packages/pandas/core/groupby/groupby.py:1885, in GroupBy._python_apply_general(self, f, data, not_indexed_same, is_transform, is_agg)
   1850 @final
   1851 def _python_apply_general(
   1852     self,
   (...)
   1857     is_agg: bool = False,
   1858 ) -> NDFrameT:
   1859     """
   1860     Apply function f in python space
   1861 
   (...)
   1883         data after applying f
   1884     """
-> 1885     values, mutated = self._grouper.apply_groupwise(f, data, self.axis)
   1886     if not_indexed_same is None:
   1887         not_indexed_same = mutated

File /nvme/0/pgali/envs/cudfdev/lib/python3.11/site-packages/pandas/core/groupby/ops.py:919, in BaseGrouper.apply_groupwise(self, f, data, axis)
    917 # group might be modified
    918 group_axes = group.axes
--> 919 res = f(group)
    920 if not mutated and not _is_indexed_like(res, group_axes, axis):
    921     mutated = True

File /nvme/0/pgali/envs/cudfdev/lib/python3.11/site-packages/pandas/core/groupby/groupby.py:1413, in GroupBy._op_via_apply.<locals>.curried(x)
   1412 def curried(x):
-> 1413     return f(x, *args, **kwargs)

File /nvme/0/pgali/envs/cudfdev/lib/python3.11/site-packages/pandas/core/generic.py:7349, in NDFrame.fillna(self, value, method, axis, inplace, limit, downcast)
   7342     else:
   7343         raise TypeError(
   7344             '"value" parameter must be a scalar, dict '
   7345             "or Series, but you passed a "
   7346             f'"{type(value).__name__}"'
   7347         )
-> 7349     new_data = self._mgr.fillna(
   7350         value=value, limit=limit, inplace=inplace, downcast=downcast
   7351     )
   7353 elif isinstance(value, (dict, ABCSeries)):
   7354     if axis == 1:

File /nvme/0/pgali/envs/cudfdev/lib/python3.11/site-packages/pandas/core/internals/base.py:186, in DataManager.fillna(self, value, limit, inplace, downcast)
    182 if limit is not None:
    183     # Do this validation even if we go through one of the no-op paths
    184     limit = libalgos.validate_limit(None, limit=limit)
--> 186 return self.apply_with_block(
    187     "fillna",
    188     value=value,
    189     limit=limit,
    190     inplace=inplace,
    191     downcast=downcast,
    192     using_cow=using_copy_on_write(),
    193     already_warned=_AlreadyWarned(),
    194 )

File /nvme/0/pgali/envs/cudfdev/lib/python3.11/site-packages/pandas/core/internals/managers.py:363, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
    361         applied = b.apply(f, **kwargs)
    362     else:
--> 363         applied = getattr(b, f)(**kwargs)
    364     result_blocks = extend_blocks(applied, result_blocks)
    366 out = type(self).from_blocks(result_blocks, self.axes)

File /nvme/0/pgali/envs/cudfdev/lib/python3.11/site-packages/pandas/core/internals/blocks.py:2334, in ExtensionBlock.fillna(self, value, limit, inplace, downcast, using_cow, already_warned)
   2331 except TypeError:
   2332     # 3rd party EA that has not implemented copy keyword yet
   2333     refs = None
-> 2334     new_values = self.values.fillna(value=value, method=None, limit=limit)
   2335     # issue the warning *after* retrying, in case the TypeError
   2336     #  was caused by an invalid fill_value
   2337     warnings.warn(
   2338         # GH#53278
   2339         "ExtensionArray.fillna added a 'copy' keyword in pandas "
   (...)
   2345         stacklevel=find_stack_level(),
   2346     )

File /nvme/0/pgali/envs/cudfdev/lib/python3.11/site-packages/pandas/core/arrays/_mixins.py:376, in NDArrayBackedExtensionArray.fillna(self, value, method, limit, copy)
    373 else:
    374     # We validate the fill_value even if there is nothing to fill
    375     if value is not None:
--> 376         self._validate_setitem_value(value)
    378     if not copy:
    379         new_values = self[:]

File /nvme/0/pgali/envs/cudfdev/lib/python3.11/site-packages/pandas/core/arrays/categorical.py:1589, in Categorical._validate_setitem_value(self, value)
   1587     return self._validate_listlike(value)
   1588 else:
-> 1589     return self._validate_scalar(value)

File /nvme/0/pgali/envs/cudfdev/lib/python3.11/site-packages/pandas/core/arrays/categorical.py:1614, in Categorical._validate_scalar(self, fill_value)
   1612     fill_value = self._unbox_scalar(fill_value)
   1613 else:
-> 1614     raise TypeError(
   1615         "Cannot setitem on a Categorical with a new "
   1616         f"category ({fill_value}), set the categories first"
   1617     ) from None
   1618 return fill_value

TypeError: Cannot setitem on a Categorical with a new category (1), set the categories first

Expected behavior
cudf should raise an error here, matching the pandas behavior.
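
As a rough illustration of the pandas behavior to match, the sketch below calls plain `Series.fillna` (not the deprecated groupby variant) on the same categorical data. The helper `fillna_error` is hypothetical and exists only for this example; the check assumes a recent pandas release, where the fill value is validated even when there is nothing to fill.

```python
import pandas as pd


def fillna_error(series, value):
    """Hypothetical helper: return the exception raised by fillna, or None."""
    try:
        series.fillna(value)
    except (TypeError, ValueError) as exc:
        return exc
    return None


# Same data as in the report; the column contains no nulls.
ps = pd.Series(["a", "b", "c", "f", "ew", "lk"], dtype="category")

# 1 is not one of the categories, so pandas is expected to reject it.
err_invalid = fillna_error(ps, 1)
# "a" is an existing category, so this call should succeed.
err_valid = fillna_error(ps, "a")

print(type(err_invalid).__name__, err_invalid)
print(err_valid)
```

The same pair of calls against a `cudf.Series` would make a reasonable regression check once cudf raises as well.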

Fixes: #15666

This PR validates values passed to `fillna` even if there are no null values in a categorical column.

Forks from #14534

Authors:
- GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
- Matthew Roeschke (https://github.com/mroeschke)

URL: #15683
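
The idea behind the fix can be sketched in plain Python. This is not the cudf implementation: `is_null`, `validate_fill_value`, and `categorical_fillna` are hypothetical names, and a list stands in for the column. The point is only that validation happens before, and independently of, any check for nulls. The error text is modeled on the pandas message quoted in the traceback.

```python
import math


def is_null(x):
    """Treat None and float NaN as missing values."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def validate_fill_value(value, categories):
    """Raise TypeError if a non-null fill value is not an existing category."""
    if is_null(value):
        return
    if value not in categories:
        raise TypeError(
            "Cannot setitem on a Categorical with a new "
            f"category ({value}), set the categories first"
        )


def categorical_fillna(values, categories, fill_value):
    """Fill missing entries, validating the fill value first."""
    # Validate unconditionally, even if `values` contains no nulls.
    validate_fill_value(fill_value, categories)
    return [fill_value if is_null(v) else v for v in values]


cats = ["a", "b", "c", "ew", "f", "lk"]
print(categorical_fillna(["a", None, "c"], cats, "b"))
try:
    categorical_fillna(["a", "b", "c"], cats, 1)  # no nulls, still rejected
except TypeError as exc:
    print("TypeError:", exc)
```

Validating up front keeps the outcome independent of the data: an invalid fill value fails the same way whether or not any row happens to be null.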

galipremsagar added the bug (Something isn't working) and cudf.pandas (Issues specific to cudf.pandas) labels May 6, 2024

galipremsagar added this to the cudf.pandas API coverage milestone May 6, 2024

galipremsagar self-assigned this May 6, 2024

galipremsagar mentioned this issue May 7, 2024 in "Allow fillna to validate for CategoricalColumn.fillna" #15683 (merged)

rapids-bot bot closed this as completed in #15683 May 7, 2024
