Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix memory_usage when calculating nested list column #16193

Merged
merged 2 commits into from
Jul 4, 2024

Conversation

mroeschke
Copy link
Contributor

Description

The offset column of a nested empty list column may be empty as discussed in #16164. ListColumn.memory_usage assumed that this column was non-empty

Unblocks rapidsai/cuspatial#1400

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke mroeschke added bug Something isn't working non-breaking Non-breaking change cuDF (Python) labels Jul 3, 2024
@mroeschke mroeschke requested a review from a team as a code owner July 3, 2024 17:47
@mroeschke mroeschke requested review from wence- and isVoid July 3, 2024 17:47
@github-actions github-actions bot added the Python Affects Python cuDF API. label Jul 3, 2024
@@ -926,3 +927,29 @@ def test_list_iterate_error():
def test_list_struct_list_memory_usage():
df = cudf.DataFrame({"a": [[{"b": [1]}]]})
assert df.memory_usage().sum() == 16


def test_empty_nested_list_uninitialized_offsets_memory_usage():
Copy link
Contributor

@galipremsagar galipremsagar Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to re-write this test using a public API rather than accessing column constructors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried finding a public API (constructor) that would create an empty offset column, but I think the cudf Python layer is good at consistently ensuring the offset column has at least one entry.

There are operations that go through libcudf that will return an empty offset column (e.g. .iloc[0:0]), but I wanted to ensure we test something that has the least chance to change if a public API implementation changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

@wence-
Copy link
Contributor

wence- commented Jul 4, 2024

/merge

@rapids-bot rapids-bot bot merged commit c1c62f1 into rapidsai:branch-24.08 Jul 4, 2024
79 checks passed
@mroeschke mroeschke deleted the bug/memory_usage/list branch July 5, 2024 17:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants