Node group based rel table #2246
@@ -104,8 +101,6 @@ void Catalog::addRelProperty(
initCatalogContentForWriteTrxIfNecessary();
catalogContentForWriteTrx->getTableSchema(tableID)->addRelProperty(
propertyName, std::move(dataType));
wal->logAddPropertyRecord(
I see we log dropProperty in the catalog, but we don't log addProperty there?
Codecov Report
@@            Coverage Diff             @@
##           master    #2246      +/-   ##
==========================================
- Coverage   89.82%   88.61%   -1.21%
==========================================
  Files        1022     1002      -20
  Lines       35452    32893    -2559
==========================================
- Hits        31845    29149    -2696
- Misses       3607     3744     +137
The bulk of the warnings issued are because of unused includes, but some are because of dead files leftover from #2246. Also, `subplans_table.h` had a `using` directive, which caused symbol pollution.
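To illustrate the symbol pollution mentioned above, here is a minimal single-file sketch. It does not reproduce `subplans_table.h`; the namespaces `planner_sketch` and `storage_sketch` and the `Schema` types are hypothetical stand-ins chosen only to show how a `using` directive at namespace scope in a header leaks names into every file that includes it.

```cpp
#include <cassert>
#include <string>

// Stand-in for a header (hypothetical names, not the real subplans_table.h).
namespace planner_sketch {
struct Schema {
    std::string name;
};
} // namespace planner_sketch
// The problematic line: every file that includes this "header" inherits it.
using namespace planner_sketch;

// Stand-in for an unrelated file that includes the header above.
namespace storage_sketch {
struct Schema {
    int id;
};
// Inside storage_sketch, the local Schema hides the leaked name.
inline int schemaID(const Schema& s) {
    return s.id;
}
} // namespace storage_sketch

// At global scope, unqualified `Schema` silently resolves to
// planner_sketch::Schema because of the leaked using directive.
inline std::string leakedName() {
    Schema s{"leaked"};
    return s.name;
}

// Without the directive, the same code has to state what it means.
inline std::string qualifiedName() {
    planner_sketch::Schema s{"qualified"};
    return s.name;
}
```

Dropping the directive from the header and qualifying names at the use site, as in `qualifiedName()`, keeps each includer's name lookup independent of what the header happens to pull in.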
This is quite a bulky PR, sorry for that :).
Mainly this PR introduces several major changes:
`StorageStructure` and a bunch of util functions for column and list files.
Note that this PR breaks updates, creations, and deletions for rel tables for a short period of time.
We discussed this within the team: although breaking master is not good practice, we decided to merge early to save everyone the effort of syncing with the very large changes this PR brings.
Rel Table Storage
A rel table consists of both forward and backward `RelTableData`, which consists of multiple `Columns`.
Each `Column` can be stored in two formats: `REGULAR` and `CSR`. Regular columns are used by both NodeTable and RelTable when the multiplicity is `ONE/MANY_TO_ONE`; thus, each node offset corresponds to one value in the column. CSR columns are used only by RelTable when the multiplicity is `ONE/MANY_TO_MANY`, where each node offset corresponds to a CSR list in the column.
For a node group of `RelTableData` stored in CSR format, it contains the following column chunks:
When reading data from a node group, we always need to first read out its CSR offsets into `RelDataReadState`, then read the actual data columns based on the offsets.
`nbrColumn` later.
COPY
COPY pipelines for rel tables are as follows:
The main part is executed as three pipelines.
The rightmost `PARTITIONER` pipeline gets executed first. It partitions all tuples based on their src and dst node offsets. For now, we copy tuples into the partitioning buffers and organize them into different partitions (i.e., node groups). This can lead to large memory consumption when the copied dataset is large. A follow-up solution is to allow spilling intermediate partitioning results to disk as tmp files, which introduces I/O overhead, but the I/O pattern should always be sequential reads and writes.
The other two pipelines are both `CopyRel`, one for each direction, which consume the partitioned buffers from the `PARTITIONER` pipeline. During execution, each thread fetches one partition (i.e., node group) at a time, constructs the necessary column chunks for the node group, and appends them to the table data file.
Note that this design does not handle very skewed datasets well. When one node group is extremely large, the copy rel pipeline could end up with only one thread working most of the time. We should perhaps address this by further partitioning the extremely large node group and letting multiple threads work on it concurrently.
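The flow described above (partition tuples by node group, build a CSR node group per partition, then read offsets before data) can be sketched roughly as follows. This is an illustration only, not the PR's code: `RelTuple`, `CSRNodeGroup`, `partitionBySrc`, `buildCSR`, `readNeighbours`, and the tiny `NODE_GROUP_SIZE` are all hypothetical, only the forward direction (partitioned by src) is shown, and everything stays in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, deliberately tiny node group size for illustration.
constexpr uint64_t NODE_GROUP_SIZE = 4;

struct RelTuple {
    uint64_t src;
    uint64_t dst;
};

// One node group in CSR form: the neighbours of local node i are
// nbrs[offsets[i] .. offsets[i + 1]).
struct CSRNodeGroup {
    std::vector<uint64_t> offsets; // NODE_GROUP_SIZE + 1 entries
    std::vector<uint64_t> nbrs;
};

// PARTITIONER-like step: bucket tuples by the node group of their src offset.
std::vector<std::vector<RelTuple>> partitionBySrc(
    const std::vector<RelTuple>& tuples, uint64_t numNodes) {
    uint64_t numGroups = (numNodes + NODE_GROUP_SIZE - 1) / NODE_GROUP_SIZE;
    std::vector<std::vector<RelTuple>> partitions(numGroups);
    for (const auto& t : tuples) {
        assert(t.src < numNodes);
        partitions[t.src / NODE_GROUP_SIZE].push_back(t);
    }
    return partitions;
}

// CopyRel-like step: build the CSR chunks of one partition (counting sort).
CSRNodeGroup buildCSR(const std::vector<RelTuple>& partition) {
    CSRNodeGroup group;
    group.offsets.assign(NODE_GROUP_SIZE + 1, 0);
    for (const auto& t : partition) {
        group.offsets[t.src % NODE_GROUP_SIZE + 1]++; // per-node list length
    }
    for (uint64_t i = 0; i < NODE_GROUP_SIZE; i++) {
        group.offsets[i + 1] += group.offsets[i]; // prefix sum -> start offsets
    }
    group.nbrs.resize(partition.size());
    std::vector<uint64_t> cursor(group.offsets.begin(), group.offsets.end() - 1);
    for (const auto& t : partition) {
        group.nbrs[cursor[t.src % NODE_GROUP_SIZE]++] = t.dst;
    }
    return group;
}

// Read path: look up the CSR offsets first, then slice the data column.
std::vector<uint64_t> readNeighbours(const CSRNodeGroup& group, uint64_t local) {
    assert(local < NODE_GROUP_SIZE);
    auto first = group.nbrs.begin() + static_cast<std::ptrdiff_t>(group.offsets[local]);
    auto last = group.nbrs.begin() + static_cast<std::ptrdiff_t>(group.offsets[local + 1]);
    return std::vector<uint64_t>(first, last);
}
```

Because each partition maps to exactly one node group, a worker thread could call `buildCSR` on its partition without coordinating with other threads. The same property explains the skew problem noted above: one oversized partition means one long-running `buildCSR` call.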
TODOs
Tests
`STRUCT`, `MAP`.
`UNION`.
`CopyRDFTest` and `RdfoxExample.Basic`.
`CopyRelTableMultiplicityViolationTest` and `TinySnbExceptionTest.DeleteNodeWithEdgeErrorTest`.
`TinySnbCreateNodeTest`, `TinySnbCreateReadRelTest`, `TinySnbDeleteRelTest`, `TinySnbMergeRelTest`, `TinySnbSetReadRelTest`, `UpdateRelTest`, `DeleteRelTest`, `CreateRelTest`, `TCK`, `DemoDBCreateTest`, `DemoDBDeleteTest`.
Refactoring
`RelDataDirection` from `enum` to `enum class`.
`leftNumRows` inside `Reader::readNextDataChunk()`.
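For the `RelDataDirection` refactoring item, a scoped enum could look like the sketch below. The enumerator names `FWD` and `BWD`, the `uint8_t` underlying type, and the helper functions are assumptions for illustration; the text above only names the type.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumed enumerators and underlying type; only the type name is from the PR.
enum class RelDataDirection : uint8_t { FWD = 0, BWD = 1 };

// Hypothetical helper: enumerators must be written with their scope.
inline std::string directionName(RelDataDirection direction) {
    switch (direction) {
    case RelDataDirection::FWD:
        return "fwd";
    case RelDataDirection::BWD:
        return "bwd";
    }
    return "unknown";
}

// Hypothetical helper: flip between the two directions.
inline RelDataDirection reverseDirection(RelDataDirection direction) {
    return direction == RelDataDirection::FWD ? RelDataDirection::BWD :
                                                RelDataDirection::FWD;
}

// Converting to an integer now has to be explicit:
//   int i = RelDataDirection::FWD;             // does not compile
//   auto i = static_cast<uint8_t>(RelDataDirection::FWD); // fine
```

Compared with a plain `enum`, the scoped form keeps `FWD` and `BWD` out of the enclosing namespace and rejects accidental use of a direction as an integer index.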