Clean up index in-mem overflow file #2564

Merged
merged 1 commit into from
Dec 12, 2023

@ray6080 ray6080 commented Dec 10, 2023

This PR removes unused functions and design elements, such as the null mask, from InMemOverflowFile and InMemFile. They were originally used by Column and List, but those have since been refactored into the NodeGroup-based implementation.

It also merges the two classes into InMemFile, since there is no longer any need to keep InMemFile and InMemOverflowFile separate. HashIndexBuilder is now the only user, and it does not require thread safety, so I simplified the code to be non-thread-safe.

codecov bot commented Dec 10, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (8b7cffe) 92.90% compared to head (9e93a3d) 93.16%.
Report is 8 commits behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| src/storage/storage_structure/in_mem_file.cpp | 84.61% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2564      +/-   ##
==========================================
+ Coverage   92.90%   93.16%   +0.25%     
==========================================
  Files        1026     1027       +1     
  Lines       38579    38480      -99     
==========================================
+ Hits        35842    35849       +7     
+ Misses       2737     2631     -106     

@@ -59,147 +35,40 @@ ku_string_t InMemOverflowFile::appendString(const char* rawString) {
    if (length > BufferPoolConstants::PAGE_4KB_SIZE) {
        throw CopyException(ExceptionMessage::overLargeStringPKValueException(length));
    }
    std::unique_lock lck{lock};
    // Allocate a new page if necessary.
    if (nextOffsetInPageToAppend + length >= BufferPoolConstants::PAGE_4KB_SIZE) {
        addANewPage();
        nextOffsetInPageToAppend = 0;
        nextPageIdxToAppend++;
    }
Contributor

How are we getting away with removing this lock? This seems like a race since addANewPage doesn't lock either.

Contributor

Is the argument that InMemOverflowFile is always accessed while holding a lock further up anyway?

Contributor Author

Right. Only HashIndex uses this, and we no longer do parallel insertions into the overflow file. I should add a comment that this is no longer thread-safe.
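
For readers following the thread, the page-rollover logic in the hunk above, with the lock removed, could be sketched roughly as below. This is only an illustration under stated assumptions: `SimpleInMemFile`, `StringLocation`, and `kPageSize` are hypothetical names, not the classes in this PR, and the real `appendString` returns a `ku_string_t` rather than a plain location struct.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for BufferPoolConstants::PAGE_4KB_SIZE.
constexpr std::size_t kPageSize = 4096;

// Hypothetical handle describing where an appended string lives.
struct StringLocation {
    std::uint32_t pageIdx;
    std::uint32_t offsetInPage;
    std::uint32_t length;
};

// NOT thread-safe: callers must serialize every call themselves.
class SimpleInMemFile {
public:
    StringLocation appendString(const std::string& s) {
        if (s.size() > kPageSize) {
            throw std::length_error("string does not fit in a single page");
        }
        // Allocate a new page if there is none yet or the string would not fit
        // in the current one (same >= comparison as the hunk above).
        if (pages.empty() || nextOffset + s.size() >= kPageSize) {
            pages.push_back(std::make_unique<std::uint8_t[]>(kPageSize));
            nextOffset = 0;
        }
        std::memcpy(pages.back().get() + nextOffset, s.data(), s.size());
        StringLocation loc{static_cast<std::uint32_t>(pages.size() - 1),
                           static_cast<std::uint32_t>(nextOffset),
                           static_cast<std::uint32_t>(s.size())};
        nextOffset += s.size();
        return loc;
    }

    std::string readString(const StringLocation& loc) const {
        assert(loc.pageIdx < pages.size());
        assert(loc.offsetInPage + loc.length <= kPageSize);
        const auto* start = pages[loc.pageIdx].get() + loc.offsetInPage;
        return std::string(reinterpret_cast<const char*>(start), loc.length);
    }

    std::size_t numPages() const { return pages.size(); }

private:
    std::vector<std::unique_ptr<std::uint8_t[]>> pages;
    std::size_t nextOffset = 0; // next free byte in the last page
};
```

Because nothing in the class synchronizes access to `pages` or `nextOffset`, two concurrent `appendString` calls could both allocate a page or write to the same offset, which is why the single-writer assumption matters.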

@Riolku Riolku Dec 11, 2023

Right. We might do parallel insertions in the future though, at which point we would lock inside the index instead of the overflow file?

Contributor Author

We would refactor all of this into something different by then. I'm not sure which level should take the lock, but it should become clear by that time. For now, let's just clean up the legacy code first.
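
As a reference point for the question of which level should take the lock, one possible shape is for the index to own the mutex while the underlying store stays unsynchronized. The sketch below is purely illustrative and not the design this PR commits to: `UnsafeStringStore`, `LockedIndexBuilder`, and `insertInParallel` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical non-thread-safe store, standing in for the merged in-memory file.
class UnsafeStringStore {
public:
    std::size_t append(const std::string& s) {
        strings.push_back(s);
        return strings.size() - 1;
    }
    std::size_t size() const { return strings.size(); }

private:
    std::vector<std::string> strings;
};

// Hypothetical index builder that takes the lock itself, so the store
// underneath never needs its own synchronization.
class LockedIndexBuilder {
public:
    std::size_t insert(const std::string& key) {
        std::lock_guard<std::mutex> guard{mtx};
        return store.append(key);
    }
    std::size_t numKeys() {
        std::lock_guard<std::mutex> guard{mtx};
        return store.size();
    }

private:
    std::mutex mtx;
    UnsafeStringStore store;
};

// Inserts keysPerThread distinct keys from each of numThreads threads and
// returns the total number of keys held by the builder afterwards.
inline std::size_t insertInParallel(LockedIndexBuilder& builder, int numThreads,
                                    int keysPerThread) {
    assert(numThreads > 0 && keysPerThread >= 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&builder, t, keysPerThread] {
            for (int i = 0; i < keysPerThread; ++i) {
                builder.insert("t" + std::to_string(t) + "_" + std::to_string(i));
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return builder.numKeys();
}
```

A single coarse mutex like this serializes every insertion, so it only restores correctness under parallel callers; whether a finer-grained scheme is worthwhile would depend on the future refactor mentioned above.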

Riolku commented Dec 11, 2023

Also, can you add a summary in the commit message + PR description that explains the actual change, more in depth?

ray6080 commented Dec 11, 2023

> Also, can you add a summary in the commit message + PR description that explains the actual change, more in depth?

Sure

@Riolku Riolku left a comment

Any idea if this change has a performance benefit?

ray6080 commented Dec 11, 2023

> Any idea if this change has a performance benefit?

It might have some benefit for string insertions, since a lock was removed. But I cannot observe it through COPY NODE, which runs serially anyway, so there is no contention on the removed lock there.

@ray6080 ray6080 merged commit de0302e into master Dec 12, 2023
14 checks passed
@ray6080 ray6080 deleted the hash-index branch December 12, 2023 01:08