Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patch Pandas ExtensionArrayFormatter to handle ndims > 1 #172

Merged
merged 13 commits into from
Feb 12, 2021

Conversation

BryanCutler
Copy link
Member

Pandas formatting has been failing in TensorArray for datetime64 values and float values with ndims > 1. This patches the ExtensionArrayFormatter to handle higher dimension formatting until a proper fix exists upstream.

Closes #151

@BryanCutler
Copy link
Member Author

@frreiss I'm not crazy about patching like this, but since we can't display TensorArrays with float values of ndim > 1, that is a big limitation. I'm still working on the upstream fix, but waiting on that might prevent us from upgrading Pandas for a while. wdyt?

@BryanCutler
Copy link
Member Author

Still some kinks to work out with text spacing in the tests, I think..

@BryanCutler
Copy link
Member Author

I think I got all the kinks worked out. I had to add some special handling for pandas 1.0.x, but that can all be removed once we bump the minimum version.

Copy link
Member

@frreiss frreiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. You're right that this sort of thing is to be avoided if possible, but I don't see how we an avoid this and still have compatibility with Pandas versions between 1.0 and 1.2.1.

The main risk is that the patched function might break some other code in the user's environment that doesn't use Text Extensions for Pandas at all. That type of crash would be a really nasty one to track down the root cause of.

Have you run the full regression suite of Pandas itself with this patch in place?

text_extensions_for_pandas/array/tensor.py Outdated Show resolved Hide resolved
@BryanCutler
Copy link
Member Author

@frreiss I added a check to use the original method if values aren't from a TensorArray, added the env var to switch it off completely, and added an upper bound to stop patching at Pandas 1.3,0+. I'm hopeful I'll be able to get a fix in by then :) If not we can just bump that version up. I'm going to go ahead and merge this and we can start using newer pandas versions.

0 [1 2]
1 [3 4]
2 [5 6]
0 [1, 2]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the leading space is because that option defaults to true, we might want to look into changing that for tensors at least

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2+ dimensional tensors of timestamps crash pd.Series.repr()
2 participants