Serial fix node group #1863

aziz-mu · 2023-07-28T14:23:01Z

Fixing serial in the node-groups branch. Also added a test/dataset for serial copy that requires more than one node group.

ray6080

Just took a quick look. I think there are still a few things missing in this PR before I dive into a more detailed review:

You only handled the CSV case. Can you also add the logic for Parquet and NPY?
It seems CombineChunksToBatch does not perform zero-copy internally (found something here; double-check their source code?). The concatenation can be costly, and we should avoid it. Do you have any other solutions in mind? If not, can you explore my previous suggestion of using ChunkedArray? For that, you need to change ValueVector so it can hold a ChunkedArray instead of an Array.
There is no distinction between forcing order preservation or not (has SERIAL or not) when reading from CSV files. This might be fine for CSV, since it has to be read serially anyway, but when you replicate the changes to Parquet and NPY, we do need to consider that we can read concurrently (which is critical) if there is no SERIAL. So it is better to differentiate the reading logic of the two cases based on whether order must be preserved.
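To make the ChunkedArray suggestion concrete, here is a minimal sketch using only the C++ standard library. It is not Arrow's or this project's API: `Chunk`, `ChunkedView`, and `concatenate` are hypothetical names chosen for illustration. The view keeps pointers to the original chunks and locates a value through recorded start offsets, while `concatenate` shows the copying alternative the review wants to avoid.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for one contiguous array of values.
using Chunk = std::vector<int64_t>;

// Hypothetical chunked view: shares the chunks instead of copying their values.
class ChunkedView {
public:
    // Records where the chunk starts in the logical sequence; no values are copied.
    void append(std::shared_ptr<const Chunk> chunk) {
        startOffsets.push_back(numValues);
        numValues += chunk->size();
        chunks.push_back(std::move(chunk));
    }

    // Returns the value at logical position `pos`. Precondition: pos < size().
    // Binary search picks the last chunk whose start offset is <= pos.
    int64_t at(std::size_t pos) const {
        auto it = std::upper_bound(startOffsets.begin(), startOffsets.end(), pos);
        auto chunkIdx = static_cast<std::size_t>(it - startOffsets.begin()) - 1;
        return (*chunks[chunkIdx])[pos - startOffsets[chunkIdx]];
    }

    std::size_t size() const { return numValues; }
    const Chunk* chunkData(std::size_t idx) const { return chunks[idx].get(); }

private:
    std::vector<std::shared_ptr<const Chunk>> chunks;
    std::vector<std::size_t> startOffsets;
    std::size_t numValues = 0;
};

// The copying alternative, for comparison: every value moves into one new buffer.
Chunk concatenate(const std::vector<std::shared_ptr<const Chunk>>& chunks) {
    Chunk result;
    for (const auto& chunk : chunks) {
        result.insert(result.end(), chunk->begin(), chunk->end());
    }
    return result;
}
```

The trade-off in this sketch is a logarithmic lookup per random access in exchange for skipping the full copy; a sequential scan can walk the chunks one by one and avoid the lookup altogether.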

ray6080 · 2023-07-28T14:28:27Z

dataset/large-serial/copy.cypher

@@ -0,0 +1,2 @@
+CALL threads=2;


Usually we don't manually configure the number of threads for copy. Is there any special reason here?

I wanted this test to always fail if this feature wasn't working. Since it's based on a race condition, I found that running it with 2 threads made it a lot more likely to fail than running it with default parameters. Testing with default parameters, this test failed only 5/15 times, and with 2 threads it failed 15/15 times, which is what I wanted.

Not exactly sure why using more threads doesn't make it more likely to fail.

Also, I know that there is a PARALLELISM command for the tests, but I'm not sure if that affects the actual copy command, which is the command where the issue occurs.
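As an illustration of what a test like this checks, the following is a minimal sketch of order-preserving parallel reading. It is not the code in this PR: `Morsel`, `OrderedMorselSource`, and `copyWithSerial` are hypothetical names. The assumption is that each unit of work receives its starting row under a lock when it is handed out, so the SERIAL value of a row depends on its position in the input and not on which thread finishes first.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical unit of work: a contiguous range of input rows.
struct Morsel {
    std::size_t startRow;
    std::size_t numRows;
};

// Hypothetical shared state that hands out morsels in input order.
// Precondition: morselSize > 0.
class OrderedMorselSource {
public:
    OrderedMorselSource(std::size_t totalRows, std::size_t morselSize)
        : totalRows{totalRows}, morselSize{morselSize} {}

    // The range and its starting row are fixed together under one lock.
    std::optional<Morsel> next() {
        std::lock_guard<std::mutex> lock{mtx};
        if (nextRow >= totalRows) {
            return std::nullopt;
        }
        Morsel morsel{nextRow, std::min(morselSize, totalRows - nextRow)};
        nextRow += morsel.numRows;
        return morsel;
    }

private:
    std::mutex mtx;
    std::size_t totalRows;
    std::size_t morselSize;
    std::size_t nextRow = 0;
};

// Fills a SERIAL-like column from several threads. Each thread writes only the
// rows of the morsels it was handed, so the writes never overlap.
std::vector<int64_t> copyWithSerial(
    std::size_t totalRows, std::size_t morselSize, unsigned numThreads) {
    std::vector<int64_t> serialColumn(totalRows, -1);
    OrderedMorselSource source{totalRows, morselSize};
    auto worker = [&] {
        while (auto morsel = source.next()) {
            for (std::size_t i = 0; i < morsel->numRows; i++) {
                std::size_t row = morsel->startRow + i;
                serialColumn[row] = static_cast<int64_t>(row);
            }
        }
    };
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < numThreads; t++) {
        threads.emplace_back(worker);
    }
    for (auto& thread : threads) {
        thread.join();
    }
    return serialColumn;
}
```

Calling `copyWithSerial(1000, 64, 2)` yields the same column on every run in this sketch, because the row offsets are decided at hand-out time. A variant that assigned offsets when a thread finished its morsel would depend on scheduling, which is the kind of nondeterminism a two-thread test can expose.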

src/include/planner/logical_plan/logical_operator/logical_copy.h

src/include/processor/operator/copy/copy_node.h

ray6080 and others added 23 commits July 20, 2023 16:59

node group-based node table storage

f26db56

basic copy and scan

9bade60

fix blob, ldbc tests, create/set for fixed-sized values; clean up use…

9a58598

…less storage and wal utils

merge

61653f9

changed NPY Reader to read multiple files at a time

751feed

removed unnecessary lines

62fc4e2

fixed bugs and re-enabled serial tests

cadc67f

fix bug

4a5297f

add new test

bc735ce

fix merge issues and formatting

fe6e08b

node group-based node table storage

c44255c

format

0477f05

format; comment out tests

d95568b

try to fix var sized column chunk templation

4da94e5

fix asan

fda7d69

cleanup

1d0d9de

clean up

1d01d75

remove rowIdx and filePath dataPos

864923b

fix serial

397b324

merge node group

d2bc2f3

fix improper merge

74c57cd

more merge fixes

803bfaf

bug fix

dd2637f

aziz-mu marked this pull request as ready for review July 28, 2023 15:47

ray6080 requested changes Jul 28, 2023

rename vars

2a7fabb

ray6080 force-pushed the node-group branch from fdd5e20 to b7691b2 Compare July 31, 2023 17:02

node group-based node table storage

b7352e4

ray6080 force-pushed the node-group branch from df1bb2c to b7352e4 Compare August 1, 2023 07:05

try fix tests

bdfe0bd

ray6080 and others added 4 commits August 1, 2023 15:52

fix nodejs test

0816067

fix java test

cd99fab

move metadataFH to storage manager

d98882b

add serial reading for Parquet and NPY

955842f

aziz-mu force-pushed the serial-fix-node-group branch from 2e12233 to 955842f Compare August 1, 2023 17:38

aziz-mu added 2 commits August 1, 2023 16:41

merge

2c6d432

delete files

4f03fd7

ray6080 force-pushed the node-group branch 3 times, most recently from 00eacad to d03b06a Compare August 2, 2023 08:23

Base automatically changed from node-group to master August 2, 2023 11:18

aziz-mu added 3 commits August 2, 2023 10:25

fix some merge conflicts

a6d6885

format

e1612bf

merge master

4713b25

aziz-mu mentioned this pull request Aug 3, 2023

Fix Serial for node-groups #1886

Merged

aziz-mu closed this Aug 3, 2023

andyfengHKU deleted the serial-fix-node-group branch November 6, 2023 08:37
