Refactor: unify many_one and many_many storage #2912
Merged
Remove the special storage layout for `MANY/ONE_ONE` rel tables. Before this PR, we stored columns in the fwd direction of a `MANY/ONE_ONE` table, or the bwd direction of a `ONE/ONE_MANY` table, the same way as columns in a node table. That is, the data in those columns was not stored in the CSR layout.

This was introduced for two reasons: 1) as a special performance optimization: because the max degree is only 1, we do not factorize the scan output of these columns and go for vectorization instead; 2) as a usability feature to enforce the degree constraint.
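To make the difference between the two layouts concrete, here is a minimal sketch in Python. It is a simplified model, not the actual storage code: the class names `SingleNbrColumn` and `CSR` and their methods are hypothetical, and real storage is paged and columnar rather than built from in-memory lists.

```python
class SingleNbrColumn:
    """Column layout (hypothetical name): one slot per source node,
    like a node-table column. Only works when the max degree is 1."""

    def __init__(self, num_nodes):
        self.nbr = [None] * num_nodes  # None marks "no neighbour"

    def insert(self, src, dst):
        # The layout itself enforces the degree constraint.
        if self.nbr[src] is not None:
            raise ValueError(f"node {src} already has a neighbour")
        self.nbr[src] = dst

    def scan(self, src):
        dst = self.nbr[src]
        return [] if dst is None else [dst]


class CSR:
    """CSR layout (hypothetical name): an offsets array plus one packed
    neighbour array. Works for any degree."""

    def __init__(self, num_nodes, edges):
        degree = [0] * num_nodes
        for src, _ in edges:
            degree[src] += 1
        # offsets[i] .. offsets[i + 1] delimits the neighbours of node i.
        self.offsets = [0] * (num_nodes + 1)
        for i in range(num_nodes):
            self.offsets[i + 1] = self.offsets[i] + degree[i]
        self.nbrs = [0] * len(edges)
        cursor = self.offsets[:-1]  # next free slot per node (slice copies)
        for src, dst in edges:
            self.nbrs[cursor[src]] = dst
            cursor[src] += 1

    def scan(self, src):
        return self.nbrs[self.offsets[src]:self.offsets[src + 1]]
```

In this model both layouts return the same neighbours for a degree-1 table; the column layout simply skips the offsets indirection and rejects a second edge on insert, which CSR would need a separate check for.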
The downside of this special optimization is that it complicates the code path in storage and in our scan operators. It's good to keep the degree constraint as a usability feature, and we should revisit the layout to generalize the performance optimization to all rel tables, regardless of their rel multiplicity, since scans on small-degree `MANY_MANY` tables can also suffer from unnecessary factorization when we should switch to full vectorization.

Note: this PR introduces performance regressions on `MANY/ONE_ONE` tables. See #2988. A temporary fix, which adds a special path for scanning `MANY/ONE_ONE` rel tables in CSR format, could be introduced if the performance issue is critical, though we'd prefer not to do it now, as we should solve it alongside #2988.
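The factorization concern above can be illustrated with a rough sketch. This is a toy model under stated assumptions, not the actual scan operators: `scan_factorized` and `scan_flat` are hypothetical names, and the adjacency data is a plain dict.

```python
def scan_factorized(adj, srcs):
    """Factorized output: one (src, [dsts]) group per source node.
    The neighbour list is kept as an unflattened group."""
    return [(s, adj[s]) for s in srcs if adj.get(s)]


def scan_flat(adj, srcs):
    """Vectorized output: one batch of aligned src/dst vectors
    covering many source nodes at once."""
    src_vec, dst_vec = [], []
    for s in srcs:
        for d in adj.get(s, []):
            src_vec.append(s)
            dst_vec.append(d)
    return src_vec, dst_vec


# With degree-1 data, factorization yields many groups of size one,
# while the flat scan packs the same edges into a single batch.
adj = {0: [5], 1: [6], 2: [7]}
groups = scan_factorized(adj, [0, 1, 2])
src_vec, dst_vec = scan_flat(adj, [0, 1, 2])
```

When every group holds a single neighbour, downstream operators in this model process one tiny group per source node, which is the overhead that a fully vectorized path for small-degree tables would avoid.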