Metaclass approach #3

TomAugspurger · 2019-08-14T19:22:48Z

This is an alternative take to get metaclasses mostly working.

We continue to use ExtensionArray.__len__ as what EA authors must
implement. We define size and shape in terms of it. We have some limited
handling of respecting previously defined shape and size. I'm working a bit
to make this more robust.

This removed ExtensionArray._shape in favor of a (name mangled) attribute defining
which dimension is "expanded" (None, 0, or 1). That seemed a bit safer than a ._shape
attribute.

cc @jbrockmendel.
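
For readers unfamiliar with the idea, here is a minimal sketch of what "define size and shape in terms of `__len__`" could look like with a metaclass. This is not the code in this PR: `ArrayMeta`, `MyArray`, and `HasOwnSize` are hypothetical names, only the 1-D case is covered, and the `hasattr` check is a crude stand-in for the "limited handling of respecting previously defined shape and size".

```python
class ArrayMeta(type):
    """Hypothetical metaclass: derive ``shape`` and ``size`` from ``__len__``."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        # Inject defaults only when nothing in the MRO already provides them.
        if not hasattr(cls, "shape"):
            cls.shape = property(lambda self: (len(self),))
        if not hasattr(cls, "size"):
            cls.size = property(lambda self: len(self))
        return cls


class MyArray(metaclass=ArrayMeta):
    """An author only implements ``__len__``."""

    def __init__(self, data):
        self._data = list(data)

    def __len__(self):
        return len(self._data)


class HasOwnSize(metaclass=ArrayMeta):
    """A class that already defines ``size`` keeps its own definition."""

    size = 99

    def __len__(self):
        return 3


arr = MyArray([1, 2, 3])
own = HasOwnSize()
```

Here `arr.shape` and `arr.size` come from the injected properties, while `own.size` stays whatever the class defined.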

TomAugspurger · 2019-08-14T21:42:55Z

Added a deprecation warning for subclasses defining shape or size.

Thoughts on how to proceed? Does this seem reasonable enough to merge into your PR branch?
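
A rough sketch of how a class-creation-time deprecation warning for subclasses defining shape or size could be wired up. The names `WarnOnOverrideMeta`, `Base`, and `Custom` and the warning text are assumptions for illustration, not the code added in this PR.

```python
import warnings


class WarnOnOverrideMeta(type):
    """Hypothetical metaclass that warns when a subclass defines shape/size."""

    def __new__(mcls, name, bases, namespace):
        # Skip the root class (it has no bases); only check subclasses.
        if bases:
            for attr in ("shape", "size"):
                if attr in namespace:
                    warnings.warn(
                        f"{name} defines '{attr}'; this is deprecated.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
        return super().__new__(mcls, name, bases, namespace)


class Base(metaclass=WarnOnOverrideMeta):
    def __len__(self):
        return 0


# Record warnings so the effect is observable regardless of global filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    class Custom(Base):
        @property
        def size(self):
            return 0
```

The warning fires once, when `Custom` is defined, rather than on every attribute access.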

jbrockmendel · 2019-08-14T21:47:09Z

I'll review this later today. Tentatively looks reasonable.

jbrockmendel · 2019-08-15T14:56:59Z

pandas/core/arrays/_reshaping.py

+        elif self._ExtensionArray__expanded_dim == 0:
+            result = length
+        else:
+            result = 1


I'm not sure I follow. Is expanded_dim an indicator, or is it a patched ndim, or something else? Is 1 hard-coded here because we support only (N, 1) and (1, N)?

TomAugspurger · 2019-08-15

expanded_dim is an indicator.

If array._expanded_dim == 1, that means array.shape is (1, N) and len(array) is 1.

jbrockmendel · 2019-08-15

(I see this is clarified below)
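
To make the indicator semantics concrete, here is a small sketch of the mapping as described in this exchange. The function `shape_and_len` is hypothetical. The `expanded_dim == 1` case follows the reply above; the `(N, 1)` shape for `expanded_dim == 0` is inferred from the diff (where the result is `length`) and is an assumption.

```python
def shape_and_len(length, expanded_dim=None):
    """Return (shape, len) for a 1-D length and an expanded-dim indicator."""
    if expanded_dim is None:
        # Plain 1-D array.
        return (length,), length
    if expanded_dim == 0:
        # Assumed: data runs along axis 0, so shape is (N, 1) and len is N.
        return (length, 1), length
    if expanded_dim == 1:
        # From the thread: shape is (1, N) and len(array) is 1.
        return (1, length), 1
    raise ValueError("expanded_dim must be None, 0, or 1")
```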

jbrockmendel · 2019-08-15T15:27:15Z

The metaclass part of this makes sense. The changes to the size/len/shape interface I'm not sure about. I've pulled it locally and will poke at it a bit, see if I can break it by writing a really circular EA subclass.

It looks like this approach involves telling downstream authors "don't define size or shape unless you implement full-2D support". Am I reading this correctly?

My inclination is to tell authors "implement size directly, not in terms of __len__ or shape". I think that leads to the cleanest code, but may be harder to communicate/enforce?

If I were to merge this branch, does this reach the goal of zero effect on downstream authors?

TomAugspurger · 2019-08-15T15:39:44Z

> It looks like this approach involves telling downstream authors "don't define size or shape unless you implement full-2D support". Am I reading this correctly?

Yes.

> My inclination is to tell authors "implement size directly, not in terms of __len__ or shape". I think that leads to the cleanest code, but may be harder to communicate/enforce?

Do you think they'll need to implement size at all? Previously, it wasn't part of the EA interface.

> If I were to merge this branch, does this reach the goal of zero effect on downstream authors?

In theory, yes. They may have a (non-visible) DeprecationWarning about removing custom .size and .shape methods. And there's of course the potential for bugs.
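
On the "(non-visible)" point: Python's default warning filters ignore DeprecationWarning unless it is triggered by code running in `__main__`, so downstream users would typically not see it. The sketch below sets the filters explicitly to show both behaviours; `emit` and `count_recorded` are hypothetical helpers and the message text is made up.

```python
import warnings


def emit():
    # Stand-in for the warning a subclass with a custom .size might trigger.
    warnings.warn("custom .size is deprecated", DeprecationWarning, stacklevel=2)


def count_recorded(action):
    """Count warnings recorded while a given filter action is in effect."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action, DeprecationWarning)
        emit()
    return len(caught)


hidden = count_recorded("ignore")  # the warning is suppressed
shown = count_recorded("always")   # the warning is surfaced
```

Test suites can opt in with a filter such as `"always"` or `"error"` to catch the deprecation early.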

TomAugspurger · 2019-08-15T15:40:46Z

> The changes to the size/len/shape interface

Just to be clear on this point:

- On master (and in this PR) we require that subclasses specify __len__
- size wasn't previously part of the interface

jbrockmendel · 2019-08-15T15:52:42Z

> I've pulled it locally and will poke at it a bit, see if I can break it by writing a really circular EA subclass.

If in IntervalArray I define:

@property
def shape(self):
    return (len(self.left),)

def __len__(self):
    return self.shape[0]

I get a bunch of recursion errors in extension/array tests. Is there anything in the interface spec that would prevent a downstream author from doing this?
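
A self-contained illustration of the failure mode, assuming the patched `shape` ends up calling `len(self)` while the author's `__len__` reads `self.shape`. `CircularArray` is a hypothetical stand-in, not IntervalArray or the PR's metaclass.

```python
class CircularArray:
    def __init__(self, left):
        self.left = list(left)

    @property
    def shape(self):
        # Simulates a shape that is (re)defined in terms of __len__.
        return (len(self),)

    def __len__(self):
        # The author's __len__ is defined in terms of shape.
        return self.shape[0]


try:
    len(CircularArray([1, 2, 3]))
except RecursionError:
    hit_recursion = True
else:
    hit_recursion = False
```

Neither definition is wrong on its own; the cycle only appears once both are combined.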

TomAugspurger · 2019-08-15T15:55:31Z

Hmm, no I don't think so... I'll think about this a bit.

jbrockmendel · 2019-08-15T16:11:04Z

> Do you think they'll need to implement size at all? Previously, it wasn't part of the EA interface.

The reason I landed on size as what we ask authors to implement is because that's the one I can't imagine needing to override.

TomAugspurger · 2019-08-15T16:21:17Z

Fletcher, at least, defines size: https://github.com/xhochy/fletcher/blob/1b68f0d7dcddf2289824a678a039959354a8cb0a/fletcher/base.py#L483 Though it's not in terms of .shape or __len__


jbrockmendel · 2019-08-15T17:06:53Z

Yah I don't think the problematic cases are going to be common, but I'm not comfortable ruling them out completely.

TomAugspurger · 2019-08-16T13:23:39Z

cc @jorisvandenbossche & @jreback

Quick summary: This PR to Brock's PR adds a metaclass. Downstream EA authors don't have to do anything assuming that they define __len__ and they don't define a shape or size that internally calls __len__.

So I think we can't be 100% sure we'll not break downstream EA authors with this. It's not hard to come up with cases where shape, size, and len can be defined in terms of each other. But is this a problem in practice?

- Fletcher (len, size): OK (size isn't defined in terms of len)
- Cyberpandas (len, shape): OK (shape isn't defined in terms of len)
- Geopandas (len, size): OK (size isn't defined in terms of len)
- pint-pandas (len): OK

Are there any others we're aware of?

If this isn't actually a problem in practice, then I'm OK with ignoring the downsides. I would like to:

- Add a DeprecationWarning (and extension tests) to 0.25.1 ensuring that subclasses don't override size and shape
- Proceed with this metaclass approach (where we still have EA authors define __len__) in 1.0. It's a bit heavy-handed to not allow subclasses to define shape and size, but I think it's reasonable.

WDYT?
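
One way an extension test could check that a subclass does not override size or shape is sketched below. `BaseArray`, `overridden`, `Good`, and `Bad` are hypothetical names; this is not an existing pandas test helper.

```python
class BaseArray:
    """Stand-in for the base class that owns shape and size."""

    @property
    def shape(self):
        return (len(self),)

    @property
    def size(self):
        return len(self)


def overridden(cls, base=BaseArray, names=("shape", "size")):
    """List the names that cls (or an intermediate parent) redefines."""
    found = []
    for name in names:
        for klass in cls.__mro__:
            if klass is base:
                break  # reached the base class without finding an override
            if name in vars(klass):
                found.append(name)
                break
    return found


class Good(BaseArray):
    def __len__(self):
        return 2


class Bad(BaseArray):
    def __len__(self):
        return 2

    @property
    def size(self):
        return 2
```

A test would then assert that `overridden(SomeArray)` is empty.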

jorisvandenbossche · 2019-08-21T17:55:26Z

I think the restrictions around the definitions of shape, len, size seem OK (assuming that's the only actual impact on existing EAs). So from that point of view, this seems a good approach.

That said, I am personally not a huge fan of a metaclass that so profoundly alters the behaviour of a class. This is rather un-transparent to the developer of an ExtensionArray, IMO (also in pandas).

TomAugspurger · 2019-08-23T11:43:36Z

Yeah, I'm growing less fond of this approach as I start to appreciate how invasive it is.

jbrockmendel · 2019-09-05T01:43:53Z

@jreback I'd like to draw your attention back here before long. Recap:

The "base" PR is pandas-dev#27142. This is Tom's proof of concept to do the same thing using a metaclass approach. We eventually figured out that the metaclass approach has two problems:

1) downstream authors can define __len__ and shape such that the metaclass induces RecursionErrors. AFAICT there is no way around this.
2) downstream authors will be unable to use their own metaclasses
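
The second problem can be reproduced in plain Python. In this sketch (all class names hypothetical) an unrelated downstream metaclass conflicts with the base class's metaclass; the usual workaround is a combined metaclass, which forces the downstream author to know about and depend on the base's metaclass.

```python
class ArrayMeta(type):
    """Stand-in for a metaclass used by the base array class."""


class Base(metaclass=ArrayMeta):
    pass


class RegistryMeta(type):
    """Stand-in for an unrelated metaclass a downstream author already uses."""


try:
    class Downstream(Base, metaclass=RegistryMeta):
        pass
except TypeError as err:
    conflict = str(err)
else:
    conflict = None


# Workaround: a metaclass deriving from both.
class CombinedMeta(RegistryMeta, ArrayMeta):
    pass


class Works(Base, metaclass=CombinedMeta):
    pass
```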

Problem 1) here is going to apply to pretty much any live-patching approach we try to take. I think we need to change the EA interface to:

- remove __len__
- require authors to implement size
  - require that size not rely on __len__ or shape

Yah it would be nice if this were unnecessary, but this is a "rip the band-aid off" situation.
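
A minimal sketch of what the proposed interface could look like, assuming 1-D arrays only. `SizeBasedArray` and `Ints` are hypothetical names, not the pandas API. The base class derives `shape` and `__len__` from `size`, so no cycle is possible as long as `size` is computed from the underlying storage.

```python
class SizeBasedArray:
    """Hypothetical base: authors implement ``size``; the rest is derived."""

    @property
    def size(self):
        raise NotImplementedError("subclasses must implement size")

    @property
    def shape(self):
        return (self.size,)

    def __len__(self):
        return self.shape[0]


class Ints(SizeBasedArray):
    def __init__(self, data):
        self._data = list(data)

    @property
    def size(self):
        # Computed from the storage directly, not via len(self) or self.shape.
        return len(self._data)


ints = Ints([10, 20])
```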

jreback · 2019-09-08T19:56:11Z

yeah i briefly reviewed this. I am on-board generally with this approach. As commented on the original PR, this would be more easily grok-able if the override methods were a bit more modularly defined (as free functions), and then simply called (as opposed to being implemented at the call site).
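
As an illustration of the "free functions" suggestion, the overrides could live at module level and simply be attached where needed. This is a sketch under assumed names (`shape_from_len`, `size_from_len`, `add_reshaping_defaults`, `Demo`), not the layout used in either PR.

```python
def shape_from_len(self):
    """Free function: 1-D shape derived from len()."""
    return (len(self),)


def size_from_len(self):
    """Free function: size derived from len()."""
    return len(self)


def add_reshaping_defaults(cls):
    # The patching site only wires up the named functions.
    cls.shape = property(shape_from_len)
    cls.size = property(size_from_len)
    return cls


@add_reshaping_defaults
class Demo:
    def __len__(self):
        return 5


demo = Demo()
```

Because the functions are ordinary callables, they can be read and tested on their own, for example `shape_from_len([1, 2, 3])`.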

jbrockmendel · 2019-09-08T20:22:03Z

@jreback the relevant question ATM is not how to patch/override methods, but how to avoid potential circularity: #3 (comment)

jorisvandenbossche · 2019-09-10T20:46:44Z

I don't think the __len__ / shape issue you mention is a big problem.

But it's not really clear to me if you still want to pursue the metaclass approach? For me the bigger issue with this is what I mentioned above in #3 (comment) about complexity / non-transparency of what is happening with your EA.

And if I understand Jeff's comment on pandas-dev#27142 (comment) correctly, that is for a non-metaclass approach, with which you seem to agree?

github-actions · 2021-03-31T02:27:49Z

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

Metaclass approach (WIP)

da0dc4f

This is an alternative take to get metaclasses mostly working. We continue to use ExtensionArray.__len__ as what EA authors must implement. We define size and shape in terms of it. We have some limited handling of respecting previously defined shape and size.

TomAugspurger added 2 commits August 14, 2019 14:27

revert move

a4493fc

Add deprecation warning

a73bd77

TomAugspurger changed the title ~~Metaclass approach (WIP)~~ Metaclass approach Aug 14, 2019

jbrockmendel reviewed Aug 15, 2019

jbrockmendel mentioned this pull request Sep 8, 2019

EA: support basic 2D operations pandas-dev/pandas#27142

Closed

jbrockmendel mentioned this pull request Sep 11, 2019

EA: require size instead of __len__ pandas-dev/pandas#28389

Closed

jbrockmendel pushed a commit that referenced this pull request Aug 10, 2020

BUG: Fix assert_equal when check_exact=True for non-numeric dtypes #3… (

9a8152c

pandas-dev#35522)

github-actions bot added the Stale label Mar 31, 2021

jbrockmendel deleted the branch jbrockmendel:arrcompat November 20, 2021 23:23

jbrockmendel closed this Nov 20, 2021
