Fix test failures with Pandas 1.2.0 #157

Merged
merged 8 commits into from
Jan 6, 2021

Conversation

BryanCutler
Member

@BryanCutler BryanCutler commented Dec 29, 2020

There are a number of test failures when using Pandas 1.2.0.

  • The method pandas.core.ops._get_op_name() has been removed in Pandas >= 1.2.0. It was used when building TensorArray ops and is replaced with similar usage in ExtensionScalarOpsMixin.
  • TensorArray should return its own instance when astype() is called with the same type
  • TensorArray.__contains__ needs a proper implementation of any()
  • repr() is broken when using float values of ndims > 2
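
As a rough illustration of the astype() and __contains__ items above, here is a minimal, list-backed sketch. ToyTensorArray and everything in it are hypothetical stand-ins, not the real TensorArray. Treating copy=False as the case that returns the same instance is an assumption about what the pandas extension tests expect, and reducing the row comparisons with the builtin any() is only one reading of the third item.

```python
class ToyTensorArray:
    """Hypothetical, list-backed stand-in for TensorArray (not the real class)."""

    def __init__(self, rows, dtype=float):
        self.dtype = dtype
        self._rows = [[dtype(v) for v in row] for row in rows]

    def astype(self, dtype, copy=True):
        # Same dtype: return this very instance unless a copy is requested
        # (assumed behaviour, see the note above).
        if dtype is self.dtype:
            return ToyTensorArray(self._rows, dtype) if copy else self
        return ToyTensorArray(self._rows, dtype)

    def __contains__(self, item):
        # Compare the candidate with every row and reduce with any(), so the
        # answer is a single bool rather than a per-row list of results.
        candidate = [self.dtype(v) for v in item]
        return any(row == candidate for row in self._rows)


arr = ToyTensorArray([[1, 2], [3, 4]])
same = arr.astype(float, copy=False)   # same dtype, no copy requested
as_int = arr.astype(int)               # different dtype, new object
```

With this sketch, `same is arr` holds, while `[3, 4] in arr` and `[5, 6] not in arr` both evaluate to True.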

Fixes #158

@BryanCutler
Member Author

BryanCutler commented Dec 29, 2020

There seem to be a number of different issues with Pandas 1.2.0; I'll look into fixing them here too.

@frreiss
Member

frreiss commented Jan 4, 2021

@BryanCutler any idea what's causing that test failure on the Python 3.7 build (https://github.com/CODAIT/text-extensions-for-pandas/pull/157/checks?check_run_id=1620791768)?

@BryanCutler BryanCutler mentioned this pull request Jan 4, 2021
@BryanCutler
Member Author

I fixed most of the issues with pandas 1.2.0 and TensorArray, except there is a regression in repr() with floats, similar to #151. There has been some back and forth on my upstream fix for that one, so maybe I can fix both. Unfortunately, I don't know of a workaround, so we either have to leave it as a known issue or cap the supported pandas version at < 1.2.0. I'm looking into the other issues now.

@BryanCutler BryanCutler changed the title from "Remove usage of private pandas method _get_op_name()" to "Fix test failures with Pandas 1.2.0" Jan 4, 2021
@@ -445,6 +446,8 @@ def test_make_exploded_df(self):
15 Total tax rate \
""")

@pytest.mark.skipif(LooseVersion(pd.__version__) >= LooseVersion("1.2.0"),
reason="TODO: Rank col gets converted to float")
Member Author

Not sure why this is different now, but I think it's safe to skip and I can make an issue to follow up with later

Member

Works for me.
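
For reference, the version-gated skip discussed in this thread can be sketched with the standard library alone. The diff uses pytest.mark.skipif with LooseVersion; the version below swaps in unittest.skipIf and a hand-rolled version_tuple() helper, partly because LooseVersion lives in distutils, which was removed in Python 3.12. INSTALLED, version_tuple, and the test class are hypothetical names, and the version string is hard-coded where a real test would read pd.__version__.

```python
import itertools
import unittest


def version_tuple(version):
    """Parse the leading digits of up to three dotted parts into ints."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    # Pad so that "1.2" compares equal to "1.2.0".
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


# Hypothetical stand-in; a real test module would use pd.__version__ here.
INSTALLED = "1.2.0"


class ExplodedFrameTest(unittest.TestCase):
    @unittest.skipIf(version_tuple(INSTALLED) >= (1, 2, 0),
                     "TODO: Rank col gets converted to float")
    def test_rank_column_stays_integer(self):
        # Placeholder body; the real test compares a DataFrame repr.
        self.assertIsInstance(1, int)
```

Comparing integer tuples avoids string-ordering surprises such as "1.10.0" sorting before "1.2.0".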

@BryanCutler
Member Author

@frreiss this failure https://github.com/CODAIT/text-extensions-for-pandas/pull/157/checks?check_run_id=1646927381#step:5:1438 in bert.align_bert_tokens_to_corpus_tokens() seems a bit strange. Not sure why this would suddenly change, unless they added some optimization that wasn't complete. I can look into it further, but letting you know in case you have some idea what's going on.

@BryanCutler
Member Author

OK, I think I found why the above error is happening, and it looks to be a bug in pandas. Pandas attempts a Cython aggregation and then falls back to a vanilla agg if there is an error, but it checks the error message improperly in this case. I should be able to file a bug report and open a PR with the fix.
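
The failure mode described here, a fallback that depends on the text of an error message, can be illustrated with a small sketch. This is not the pandas code: fast_sum, slow_sum, and both agg_* functions are hypothetical, and the message strings are made up for the example.

```python
def fast_sum(values):
    """Hypothetical fast path that only handles plain ints."""
    if not all(isinstance(v, int) for v in values):
        raise TypeError("fast path cannot handle this dtype")
    return sum(values)


def slow_sum(values):
    """Hypothetical general path that works for any addable values."""
    total = values[0]
    for v in values[1:]:
        total = total + v
    return total


def agg_matching_message(values):
    try:
        return fast_sum(values)
    except TypeError as err:
        # Fragile: the fallback runs only when the message contains an
        # expected phrase, so a differently worded error escapes.
        if "not supported" in str(err):
            return slow_sum(values)
        raise


def agg_matching_type(values):
    try:
        return fast_sum(values)
    except TypeError:
        # Sturdier: any TypeError from the fast path triggers the fallback.
        return slow_sum(values)
```

Here agg_matching_type([1.5, 2.5]) falls back and returns 4.0, while agg_matching_message([1.5, 2.5]) re-raises the TypeError because the wording does not match the phrase it looks for.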

@frreiss
Member

frreiss commented Jan 5, 2021

Thanks for tracking down the root cause of that problem, @BryanCutler !

@BryanCutler
Member Author

PR for the agg fix pandas-dev/pandas#38982

@BryanCutler
Member Author

@frreiss I capped the upper version of Pandas to < 1.2.0 because not being able to display a Series with TensorArray of floats is pretty major and I don't see a workaround. I'll keep working on the upstream fix, but what are your thoughts on merging this?

@frreiss
Member

frreiss commented Jan 6, 2021

I think merging is the best way forward for now. We may want to post a new release just to get that pandas<1.2.0 constraint into our requirements.txt.

Development

Successfully merging this pull request may close these issues.

Support Pandas version 1.2.0
2 participants