Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: fixed .str.contains(..., na=False) for categorical series #22170

Merged
merged 36 commits into from
Nov 20, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
077d136
BUG: fixed .str.contains(..., na=False) for categorical series
pulkitmaloo Aug 2, 2018
2ae44d1
na argument for _wrap_results
pulkitmaloo Aug 2, 2018
a1b3d7b
fixed str.contains for missing values
pulkitmaloo Aug 17, 2018
dbd990b
PEP8 Issue: added whitespace after ','
pulkitmaloo Aug 17, 2018
9d5d2c2
Merge branch 'master' of https://github.com/pulkitmaloo/pandas into c…
pulkitmaloo Aug 24, 2018
90aef7b
na argument for wrap_results in match
pulkitmaloo Aug 24, 2018
69f16af
BUG: fixed .str.contains(..., na=False) for categorical series
pulkitmaloo Aug 2, 2018
78cf8c7
na argument for _wrap_results
pulkitmaloo Aug 2, 2018
93bb24a
fixed str.contains for missing values
pulkitmaloo Aug 17, 2018
6c2700f
PEP8 Issue: added whitespace after ','
pulkitmaloo Aug 17, 2018
6649129
na argument for wrap_results in match
pulkitmaloo Aug 24, 2018
5c87e81
Merge branch 'categorical_bug' of https://github.com/pulkitmaloo/pand…
pulkitmaloo Aug 24, 2018
a09dcc5
Merge branch 'master' into categorical_bug
pulkitmaloo Aug 25, 2018
3abdea5
Update circle-27-compat.yaml
pulkitmaloo Sep 22, 2018
d136599
Update travis-27-locale.yaml
pulkitmaloo Sep 22, 2018
b942e16
Merge branch 'categorical_bug' of https://github.com/pulkitmaloo/pand…
pulkitmaloo Sep 22, 2018
82f9b9e
added tests for na arg for categorical and objects
pulkitmaloo Sep 22, 2018
53e9253
updated _wrap_results with arg fill_value and removed na
pulkitmaloo Sep 23, 2018
9f0286f
merging
pulkitmaloo Sep 24, 2018
ffa9969
BUG: fixed .str.contains(..., na=False) for categorical series
pulkitmaloo Aug 2, 2018
07c1d73
na argument for _wrap_results
pulkitmaloo Aug 2, 2018
7f1f2e2
fixed str.contains for missing values
pulkitmaloo Aug 17, 2018
f6cb04f
PEP8 Issue: added whitespace after ','
pulkitmaloo Aug 17, 2018
1f0256a
na argument for wrap_results in match
pulkitmaloo Aug 24, 2018
7025f34
Update travis-27-locale.yaml
pulkitmaloo Sep 22, 2018
7542448
added tests for na arg for categorical and objects
pulkitmaloo Sep 22, 2018
f1b4274
updated _wrap_results with arg fill_value and removed na
pulkitmaloo Sep 23, 2018
386ab98
Merge branch 'categorical_bug' of https://github.com/pulkitmaloo/pand…
pulkitmaloo Sep 24, 2018
7a09c44
Update travis-27-locale.yaml
pulkitmaloo Sep 24, 2018
3408920
fixed line too long
pulkitmaloo Sep 24, 2018
3288d11
whatsnew note
pulkitmaloo Sep 25, 2018
6c87770
Update v0.24.0.txt
pulkitmaloo Sep 25, 2018
d242647
Update doc/source/whatsnew/v0.24.0.txt
TomAugspurger Oct 19, 2018
def1b4e
Merge branch 'master' into PR_TOOL_MERGE_PR_22170
jreback Nov 18, 2018
fd99431
whatsnew
jreback Nov 18, 2018
44b36a4
cleanup
jreback Nov 18, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.24.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1272,7 +1272,7 @@ Strings

- Bug in :meth:`Index.str.partition` was not nan-safe (:issue:`23558`).
- Bug in :meth:`Index.str.split` was not nan-safe (:issue:`23677`).
-
- Bug :func:`Series.str.contains` not respecting the ``na`` argument for a ``Categorical`` dtype ``Series`` (:issue:`22158`)

Interval
^^^^^^^^
Expand Down
9 changes: 5 additions & 4 deletions pandas/core/strings.py
Original file line number Diff line number Diff line change
Expand Up @@ -1857,7 +1857,7 @@ def __iter__(self):
g = self.get(i)

def _wrap_result(self, result, use_codes=True,
name=None, expand=None):
name=None, expand=None, fill_value=np.nan):

from pandas.core.index import Index, MultiIndex

Expand All @@ -1867,7 +1867,8 @@ def _wrap_result(self, result, use_codes=True,
# so make it possible to skip this step as the method already did this
# before the transformation...
if use_codes and self._is_categorical:
result = take_1d(result, self._orig.cat.codes)
result = take_1d(result, self._orig.cat.codes,
fill_value=fill_value)

if not hasattr(result, 'ndim') or not hasattr(result, 'dtype'):
return result
Expand Down Expand Up @@ -2515,12 +2516,12 @@ def join(self, sep):
def contains(self, pat, case=True, flags=0, na=np.nan, regex=True):
result = str_contains(self._parent, pat, case=case, flags=flags, na=na,
regex=regex)
return self._wrap_result(result)
return self._wrap_result(result, fill_value=na)

@copy(str_match)
def match(self, pat, case=True, flags=0, na=np.nan):
result = str_match(self._parent, pat, case=case, flags=flags, na=na)
return self._wrap_result(result)
return self._wrap_result(result, fill_value=na)

@copy(str_replace)
def replace(self, pat, repl, n=-1, case=None, flags=0, regex=True):
Expand Down
30 changes: 24 additions & 6 deletions pandas/tests/test_strings.py
Original file line number Diff line number Diff line change
Expand Up @@ -512,10 +512,28 @@ def test_contains(self):
assert result.dtype == np.bool_
tm.assert_numpy_array_equal(result, expected)

# na
values = Series(['om', 'foo', np.nan])
res = values.str.contains('foo', na="foo")
assert res.loc[2] == "foo"
def test_contains_for_object_category(self):
# gh 22158

# na for category
jreback marked this conversation as resolved.
Show resolved Hide resolved
values = Series(["a", "b", "c", "a", np.nan], dtype="category")
result = values.str.contains('a', na=True)
expected = Series([True, False, False, True, True])
tm.assert_series_equal(result, expected)

result = values.str.contains('a', na=False)
expected = Series([True, False, False, True, False])
tm.assert_series_equal(result, expected)

# na for objects
values = Series(["a", "b", "c", "a", np.nan])
result = values.str.contains('a', na=True)
expected = Series([True, False, False, True, True])
tm.assert_series_equal(result, expected)

result = values.str.contains('a', na=False)
expected = Series([True, False, False, True, False])
tm.assert_series_equal(result, expected)

def test_startswith(self):
values = Series(['om', NA, 'foo_nom', 'nom', 'bar_foo', NA, 'foo'])
Expand Down Expand Up @@ -2875,7 +2893,7 @@ def test_get_complex_nested(self, to_type):
expected = Series([np.nan])
tm.assert_series_equal(result, expected)

def test_more_contains(self):
def test_contains_moar(self):
# PR #1179
s = Series(['A', 'B', 'C', 'Aaba', 'Baca', '', NA,
'CABA', 'dog', 'cat'])
Expand Down Expand Up @@ -2925,7 +2943,7 @@ def test_contains_nan(self):
expected = Series([np.nan, np.nan, np.nan], dtype=np.object_)
assert_series_equal(result, expected)

def test_more_replace(self):
def test_replace_moar(self):
# PR #1179
s = Series(['A', 'B', 'C', 'Aaba', 'Baca', '', NA, 'CABA',
'dog', 'cat'])
Expand Down