Avoid moving DictionaryChunks #2999

Merged 1 commit, Mar 7, 2024

benjaminwinger (Collaborator) commented Mar 6, 2024

There was a regression in #2994 which wasn't caught by our test suite. On some datasets it was causing a segfault when copying rel tables.

DictionaryChunk now stores pointers in its internal cache. Moving the chunk invalidates those pointers, and they can't easily be updated afterwards. That limitation is understandable: changing the hash and equality functions of an unordered_set would usually be expected to break things, even though it would be safe in this case.
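The failure mode described above can be sketched in isolation. This is a minimal illustration, not the project's actual code: `ToyDictionary`, `IndexHash`, and `IndexEq` are hypothetical names, and the sketch assumes the cache is an `unordered_set` of string indices whose hash and equality functors hold a pointer back to the owning object. It only compares pointer values, so the stale pointer is never dereferenced.

```cpp
#include <cassert>  // used by the checks in the usage note
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

struct ToyDictionary;  // hypothetical stand-in for a dictionary chunk

// Functors that look strings up through a pointer to the owning dictionary.
struct IndexHash {
    const ToyDictionary* owner;
    std::size_t operator()(std::uint32_t idx) const;
};
struct IndexEq {
    const ToyDictionary* owner;
    bool operator()(std::uint32_t a, std::uint32_t b) const;
};

struct ToyDictionary {
    std::vector<std::string> strings;
    // Cache of indices into `strings`, hashed and compared by string value.
    std::unordered_set<std::uint32_t, IndexHash, IndexEq> index;

    ToyDictionary() : index(0, IndexHash{this}, IndexEq{this}) {}

    // Returns the index of `s`, appending it only if it is not present yet.
    std::uint32_t add(const std::string& s) {
        strings.push_back(s);
        auto candidate = static_cast<std::uint32_t>(strings.size() - 1);
        auto [it, inserted] = index.insert(candidate);
        if (!inserted) {
            strings.pop_back();  // duplicate: drop the tentative copy
        }
        return *it;
    }
};

inline std::size_t IndexHash::operator()(std::uint32_t idx) const {
    return std::hash<std::string>{}(owner->strings[idx]);
}
inline bool IndexEq::operator()(std::uint32_t a, std::uint32_t b) const {
    return owner->strings[a] == owner->strings[b];
}

// Moving the object by value carries the functors along unchanged, so they
// still name the old address. The dictionary is moved while empty so that no
// functor is ever invoked through the stale pointer.
bool functorsFollowMove() {
    ToyDictionary a;
    ToyDictionary b = std::move(a);
    return b.index.hash_function().owner == &b;
}

// Holding the object behind a unique_ptr moves only the pointer; the object's
// address, and therefore the functors' back-pointer, stays valid.
bool heapHeldStaysValid() {
    auto a = std::make_unique<ToyDictionary>();
    const ToyDictionary* before = a.get();
    bool deduplicated = a->add("x") == 0 && a->add("x") == 0;
    std::unique_ptr<ToyDictionary> b = std::move(a);
    bool usableAfterMove = b->add("y") == 1 && b->add("x") == 0;
    return deduplicated && usableAfterMove && b.get() == before &&
           b->index.hash_function().owner == b.get();
}
```

With these definitions, `assert(!functorsFollowMove())` and `assert(heapHeldStaysValid())` both hold. Keeping the object at a fixed heap address is one way to avoid the move, and it is consistent with the getter in the diff returning `*dictionaryChunk`.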


codecov bot commented Mar 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.31%. Comparing base (9e23995) to head (ad24bf7).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2999   +/-   ##
=======================================
  Coverage   93.30%   93.31%           
=======================================
  Files        1124     1124           
  Lines       42913    42912    -1     
=======================================
+ Hits        40041    40043    +2     
+ Misses       2872     2869    -3     


ray6080 self-requested a review March 7, 2024 03:12
 }

-inline DictionaryChunk& getDictionaryChunk() { return dictionaryChunk; }
-inline const DictionaryChunk& getDictionaryChunk() const { return dictionaryChunk; }
+inline DictionaryChunk& getDictionaryChunk() { return *dictionaryChunk; }
Review comment (Contributor):

I'm thinking of establishing a convention of naming functions that return a non-const reference with the suffix UnSafe (or another suffix that is more explicit). What do you think? Is this necessary?
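A rough sketch of what that convention could look like, assuming the chunk is held behind a `unique_ptr` as in the diff. `ToyStringChunk` and `ToyDictionaryChunk` are hypothetical stand-ins, and `getDictionaryChunkUnSafe` simply follows the suffix proposed in the comment; none of these names come from the codebase.

```cpp
#include <cassert>  // used by the check in the usage note
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the real dictionary chunk.
struct ToyDictionaryChunk {
    std::vector<std::string> values;
};

class ToyStringChunk {
public:
    ToyStringChunk() : dictionaryChunk{std::make_unique<ToyDictionaryChunk>()} {}

    // Read-only access keeps the plain name.
    const ToyDictionaryChunk& getDictionaryChunk() const { return *dictionaryChunk; }

    // Mutable access carries the suffix so call sites that can modify the
    // chunk stand out in review.
    ToyDictionaryChunk& getDictionaryChunkUnSafe() { return *dictionaryChunk; }

private:
    std::unique_ptr<ToyDictionaryChunk> dictionaryChunk;
};
```

A caller would write through `chunk.getDictionaryChunkUnSafe().values.push_back("a")` and read through a const reference, for example `assert(view.getDictionaryChunk().values.size() == 1)`. The trade-off is a second name per accessor instead of a const and non-const overload pair.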

ray6080 (Contributor) commented Mar 7, 2024

Do we have a minimal example that reproduces this, so we can add it to our test suite?

ray6080 (Contributor) commented Mar 7, 2024

@benjaminwinger I'm merging it so we can ship it in today's nightly build.

ray6080 merged commit 74c2f80 into master Mar 7, 2024
15 checks passed
ray6080 deleted the dictionary-memory-fix branch March 7, 2024 04:56
benjaminwinger (Collaborator, Author) commented:

I'll see if I can come up with an example. It's caused by accessing strings after StringColumnChunk::finalize (where the move happens), which I think only occurs if we're copying multiple node groups, so maybe we don't have any tests where we copy multiple node groups of strings.
