Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metaclass approach #3

Conversation

TomAugspurger
Copy link

This is an alternative take to get metaclasses mostly working.

We continue to use ExtensionArray.len as what EA authors must
implement. We define size and shape in terms of it. We have some limited
handling of respecting previously defined shape and size. I'm working a bit
to make this more robust.

This removed ExtensionArray._shape in favor of a (name mangled) attribute defining
which dimension is "expanded" (None, 0, or 1). That seemed a bit safer than a ._shape
attribute.

cc @jbrockmendel.

This is an alternative take to get metaclasses mostly working.

We continue to use ExtensionArray.__len__ as what EA authors must
impelment. We define size and shape in terms of it. We have some limited
handling of respecting previously defined shape and size.
@TomAugspurger TomAugspurger changed the title Metaclass approach (WIP) Metaclass approach Aug 14, 2019
@TomAugspurger
Copy link
Author

Added a deprecation warning for subclasses defining shape or size.

Thoughts on how to proceed? Does this seem reasonable enough to merge into your PR branch?

@jbrockmendel
Copy link
Owner

I'll review this later today. Tentatively looks reasonable.

elif self._ExtensionArray__expanded_dim == 0:
result = length
else:
result = 1
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im not sure i follow. is expanded_dim an indicator or is it a patched ndim or something else? Is 1 hard-coded here because we support only (N, 1) and (1, N)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expanded_dim is an indicator.

If array._expanded_dim == 1, that means array.shape is (1, N) and len(array) is 1.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(i see this is clarified below)

@jbrockmendel
Copy link
Owner

The metaclass part of this makes sense. The changes to the size/len/shape interface I'm not sure about. I've pulled it locally and will poke at it a bit, see if I can break it by writing a really circular EA subclass.

It looks like this approach involves telling downstream authors "don't define size or shape unless you implement full-2D support". Am I reading this correctly?

My inclination is to tell authors "implement size directly, not in terms of __len__ or shape". I think that leads to the cleanest code, but may be harder to communicate/enforce?

If I were to merge this branch, does this reach the zero goal for affect on downstream authors?

@TomAugspurger
Copy link
Author

It looks like this approach involves telling downstream authors "don't define size or shape unless you implement full-2D support". Am I reading this correctly?

Yes.

My inclination is to tell authors "implement size directly, not in terms of len or shape". I think that leads to the cleanest code, but may be harder to communicate/enforce?

Do you think they'll need to implement size at all? Previously, it wasn't part of the EA interface.

If I were to merge this branch, does this reach the zero goal for affect on downstream authors?

In theory, yes. They may have a (non-visible) DepreciationWarning about removing custom .size and .shape methods. And there's of course the potential for bugs.

@TomAugspurger
Copy link
Author

TomAugspurger commented Aug 15, 2019

The changes to the size/len/shape interface

Just to be clear on this point

  1. On master (and in this PR) we require that subclasses specify __len__
  2. size wasn't previously part of the interface

@jbrockmendel
Copy link
Owner

I've pulled it locally and will poke at it a bit, see if I can break it by writing a really circular EA subclass.

If in IntervalArray I define:

@property
def shape(self):
    return (len(self.left),)

def __len__(self):
    return self.shape[0]

I get a bunch of recursion errors in extension/array tests. Is there anything in the interface spec that would prevent a downstream author from doing this?

@TomAugspurger
Copy link
Author

Hmm, no I don't think so... I'll think about this a bit.

@jbrockmendel
Copy link
Owner

Do you think they'll need to implement size at all? Previously, it wasn't part of the EA interface.

The reason I landed on size as what we ask authors to implement is because that's the one I can't imagine needing to override.

@TomAugspurger
Copy link
Author

TomAugspurger commented Aug 15, 2019 via email

@jbrockmendel
Copy link
Owner

Yah I don't think the problematic cases are going to be common, but I'm not comfortable ruling them out completely.

@TomAugspurger
Copy link
Author

cc @jorisvandenbossche & @jreback

Quick summary: This PR to Brock's PR adds a metaclass. Downstream EA authors don't have to do anything assuming that they define __len__ and they don't define a shape or size that internally calls __len__.


So I think we can't be 100% sure we'll not break downstream EA authors with this. It's not hard to come up with cases where shape, size, and len can be defined in terms of each other. But is this a problem in practice?

  • Fletcher (len, size): OK (size isn't defined in terms of len)
  • Cyberpandas (len, shape): OK (shape isn't defined in terms of len)
  • Geopandas (len, size): OK (size isn't defined in terms of len)
  • pint-pandas (len): OK

Are there any others we're aware of?

If this isn't actually a problem in practice, then I'm OK with ignoring the downsides. I would like to

  1. Add a DeprecationWarning (and extension tests) to 0.25.1 ensuring that subclasses don't override size and shape
  2. Proceed with this metaclass approach (where we still have EA authors define __len__) in 1.0. It's a bit heavy-handed to not allow subclasses to define shape and size, but I think it's reasonable.

WDYT?

@jorisvandenbossche
Copy link

I think the restrictions around the definitions of shape, len, size seem OK (assuming that's the only actual impact on existing EAs). So from that point of view, this seems a good approach.

That said, I am personally not a huge fan of a metaclass that so profoundly alters the behaviour of a class. This is rather un-transparent to the developer of an ExtensionArray, IMO (also in pandas).

@TomAugspurger
Copy link
Author

Yeah, I'm growing less fond of this approach as I start to appreciate how invasive it is.

@jbrockmendel
Copy link
Owner

@jreback I'd like to draw your attention back here before long. Recap:

The "base" PR is pandas-dev#27142. This is Tom's proof of concept to do the same thing using a metaclass approach. We eventually figured out that the metaclass approach has two problems:

  1. downstream authors can define __len__ and shape such that the metaclass induces RecursionErrors. AFAICT there is no way around this.
  2. downstream authors will be unable to use their own metaclasses

Problem 1) here is going to apply to pretty much any live-patching approach we try to take. I think we need to change the EA interface to:

  • remove __len__
  • require authors to implement size
    • require that size not rely on __len__ or shape

Yah it would be nice if this were unnecessary, but this is a "rip the band-aid off" situation.

@jreback
Copy link

jreback commented Sep 8, 2019

yeah i briefly reviewed this. I am on-board generally with this approach. As commented on the original PR, this would be more easily grok-able if the override methods were a bit more modularly defined (as free functions), and then simply called (as opposed to being implemented at the call site).

@jbrockmendel
Copy link
Owner

@jreback the relevant question ATM is not how to patch/override methods, but how to avoid potential circularity: #3 (comment)

@jorisvandenbossche
Copy link

I don't think the __len__ / shape issue you mention is a big problem.

But its not really clear to me if you still want to pursue the metaclass approach? For me the bigger issue with this is what I mentioned above in #3 (comment) about complexity / non-transparency of what is happening with your EA.

And if I understand Jeff's comment on pandas-dev#27142 (comment) correctly, that is for a non-metaclass approach, with which you see to agree?

@github-actions
Copy link

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Mar 31, 2021
@jbrockmendel jbrockmendel deleted the branch jbrockmendel:arrcompat November 20, 2021 23:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants