Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy) #59376

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Jul 31, 2024

We allowed to set "pyarrow_numpy" as a value to the mode.string_storage option.

I assume this was mostly done for testing purposes, because we explicitly documented that this option would be ignored when enabling the future.infer_string option (i.e. when "pyarrow_numpy" is being used by default).

In context of the new string dtype (PDEP-14), I think we should remove that option again (since we will generally deprecate/remove "pyarrow_numpy"). Given the above (not documented), I opted for directly removing it from the option instead of first deprecating it (for the StringDtype constructor / string alias, we will first deprecate it). But I did add a clarification to the option error message in case someone actually did use it.

xref #54792

@jorisvandenbossche jorisvandenbossche added the Strings String extension data type and string data label Jul 31, 2024
@@ -1305,7 +1304,6 @@ def string_storage(request):

* 'python'
* 'pyarrow'
* 'pyarrow_numpy'
Copy link
Member Author

@jorisvandenbossche jorisvandenbossche Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the string_storage fixture will no longer cover the "pyarrow + NaN" variant of the string dtype directly with this change, but:

  • Most tests using this fixture are inside the pandas/tests/io submodule, which had an override of this fixture to not include "pyarrow_numpy" anyway (and I removed this override)
  • I checked/updated the other tests and none of them really need this.

Comment on lines +463 to +468
if value == "pyarrow_numpy":
# TODO: we can remove extra message after 3.0
msg += (
". 'pyarrow_numpy' was specified, but this option should be "
"enabled using pandas.options.future.infer_string instead"
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this custom message to the error in case someone actually did end up using pd.options.mode.string_storage = "pyarrow_numpy" to point them to the right option to enable.

Comment on lines +903 to +904
def test_astype_to_string_dtype_not_modifying_input(any_string_dtype, val):
# GH#51073 - variant of the above test with explicit dtype instances
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this variant because the test above with the string_storage option and astype("string") will now only test the NA-variants of the string dtype.

with cf.config_prefix("mode"):
cf.register_option(
"string_storage",
"python",
string_storage_doc,
validator=is_one_of_factory(["python", "pyarrow", "pyarrow_numpy"]),
# validator=is_one_of_factory(["python", "pyarrow"]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# validator=is_one_of_factory(["python", "pyarrow"]),

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to leave this line in so it's easier to go back to that, given that this extra message is only meant to stay for 2.3, and for 3.0 we should already remove it again (because the infer_string option will be enabled by default in 3.0, so then there is no point to let users set that manually)

@mroeschke mroeschke added this to the 3.0 milestone Aug 1, 2024
@mroeschke mroeschke merged commit 6adba55 into pandas-dev:main Aug 1, 2024
39 of 45 checks passed
@mroeschke
Copy link
Member

Thanks @jorisvandenbossche

@jorisvandenbossche jorisvandenbossche deleted the string-dtype-options-string_storage branch August 1, 2024 16:40
WillAyd pushed a commit that referenced this pull request Aug 13, 2024
…(remove pyarrow_numpy) (#59376)

* String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy)

* add type annotation
WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Aug 14, 2024
…(remove pyarrow_numpy) (pandas-dev#59376)

* String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy)

* add type annotation
WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Aug 15, 2024
…(remove pyarrow_numpy) (pandas-dev#59376)

* String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy)

* add type annotation
WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Aug 15, 2024
…(remove pyarrow_numpy) (pandas-dev#59376)

* String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy)

* add type annotation
WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Aug 15, 2024
…(remove pyarrow_numpy) (pandas-dev#59376)

* String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy)

* add type annotation
@jorisvandenbossche jorisvandenbossche modified the milestones: 3.0, 2.3 Aug 20, 2024
WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Aug 21, 2024
…(remove pyarrow_numpy) (pandas-dev#59376)

* String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy)

* add type annotation
WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Aug 22, 2024
…(remove pyarrow_numpy) (pandas-dev#59376)

* String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy)

* add type annotation
WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Aug 22, 2024
…(remove pyarrow_numpy) (pandas-dev#59376)

* String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy)

* add type annotation
WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Aug 27, 2024
…(remove pyarrow_numpy) (pandas-dev#59376)

* String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy)

* add type annotation
WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Sep 20, 2024
…(remove pyarrow_numpy) (pandas-dev#59376)

* String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy)

* add type annotation
jorisvandenbossche added a commit to WillAyd/pandas that referenced this pull request Oct 2, 2024
…(remove pyarrow_numpy) (pandas-dev#59376)

* String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy)

* add type annotation
jorisvandenbossche added a commit to WillAyd/pandas that referenced this pull request Oct 2, 2024
…(remove pyarrow_numpy) (pandas-dev#59376)

* String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy)

* add type annotation
jorisvandenbossche added a commit to WillAyd/pandas that referenced this pull request Oct 3, 2024
…(remove pyarrow_numpy) (pandas-dev#59376)

* String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy)

* add type annotation
jorisvandenbossche added a commit to WillAyd/pandas that referenced this pull request Oct 7, 2024
…(remove pyarrow_numpy) (pandas-dev#59376)

* String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy)

* add type annotation
jorisvandenbossche added a commit that referenced this pull request Oct 9, 2024
…(remove pyarrow_numpy) (#59376)

* String dtype: restrict options.mode.string_storage to python|pyarrow (remove pyarrow_numpy)

* add type annotation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backported Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants