Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ArrayManager] Add SingleArrayManager to back a Series #40152

Merged
merged 21 commits into from
Mar 5, 2021

Conversation

jorisvandenbossche
Copy link
Member

xref #39146 (comment)

This implements a SingleArrayManager class for backing a Series, consistent with how we have a SingleBlockManager.

cc @jreback @jbrockmendel

@jorisvandenbossche jorisvandenbossche added Refactor Internal refactoring of code Internals Related to non-user accessible pandas implementation labels Mar 1, 2021

def astype_array(values, dtype, copy):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This added code here (this function and the one below) is only temporary until #40141 is merged (that PR is moving the Block.astype implementation so it can be reused here)

@@ -98,3 +98,9 @@ def equals(self, other: object) -> bool:
return False

return self._equal_values(other)


class SingleManager(DataManager):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially added this class to have a common subclass for SingleBlockManager and SingleArrayManager.

But in the end, the only way that it is actually being used are a few places that do isinstance(mgr, SingleManager), while I could also replace those with isinstance(mgr, (SingleBlockManager, SingleArrayManager)) instead of having this additional class injected in the class hierarchy.

Comment on lines +310 to +314
manager = get_option("mode.data_manager")
if manager == "block":
data = SingleBlockManager.from_array(data, index)
elif manager == "array":
data = SingleArrayManager.from_array(data, index)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something I have below another time, so could move it to a helper function to deduplicate

Comment on lines 680 to 683
arr = self._mgr.array
if isinstance(arr, np.ndarray):
arr = PandasArray(arr)
return arr
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also move this to SingleArrayManager and add a array_values() method there (for SingleBlockManager this lives on the Block, not the Manager)

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a way to just have a SingleManager class that works for block / array. a SingleBlockManager after all is a single array.

if not take_split_path and self.obj._mgr.blocks and self.ndim > 1:
if (
not take_split_path
and getattr(self.obj._mgr, "blocks", False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally shouldn't reach into the mgr here (followon)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I am planning to do some PRs to move some of the block-custom logic from indexing.py to the internals (but it's quite complex code ..)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I am planning to do some PRs to move some of the block-custom logic from indexing.py to the internals (but it's quite complex code ..)

ive been looking at this for a while; it (at least my preferred approach) is blocked by #39510 and #39163

@jorisvandenbossche
Copy link
Member Author

is there a way to just have a SingleManager class that works for block / array. a SingleBlockManager after all is a single array.

I don't think so, AFAIK. A Block and an array have a quite different API (also, the SingleArrayManager is inheriting quite a bit from ArrayManager, so it's more different from SingleBlockManager than just the code added in the SingleArrayManager class here).

There certainly might be some trivial properties that could be shared (like .index, and .dtype if if we add a .array to SingleBlockManager that is a shortcut to the Block's values) to reduce the size of the SingleArrayManager a bit. But note that right now it's not even 150 lines of code.

slice(len(vslider.buf)))
if self.has_block:
object.__setattr__(cached_typ._mgr._block, 'values', vslider.buf)
object.__setattr__(cached_typ._mgr._block, 'mgr_locs',
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is being done separately in #40199

@jorisvandenbossche
Copy link
Member Author

@jreback are you OK with this approach (see my response at #40152 (comment))

@jreback jreback added this to the 1.3 milestone Mar 5, 2021
@@ -98,3 +98,7 @@ def equals(self, other: object) -> bool:
return False

return self._equal_values(other)


class SingleDataManager(DataManager):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny!

@jreback jreback merged commit 529adea into pandas-dev:master Mar 5, 2021
@jreback
Copy link
Contributor

jreback commented Mar 5, 2021

thanks @jorisvandenbossche

ok generally ok with adding code, but more code == more complexity, so ultimately we want to simplify as much as possible. of course we are in the middle of new development on array manager so ok for now. but we want to consolidate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Internals Related to non-user accessible pandas implementation Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants