-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arrow rel copier #1154
Arrow rel copier #1154
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good start. I have a few medium-level suggestions, which you should address. Let's also discuss the one about the "list delimiter".
Also: why are there no tests using parquet and arrow in this PR? I only see tests using csv. Those should also be added.
src/include/storage/in_mem_csv_copier/in_mem_arrow_rel_copier.h
Outdated
Show resolved
Hide resolved
0225694
to
b2ca065
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a small list of comments but one of them (the possible bug about unique_pointer. I want to see that fixed and also look at the code that fixes that bug. Other than that, there are a few minor comments.
const string filePath; | ||
const CSVReaderConfig csvReaderConfig; | ||
shared_ptr<ReaderConfig> readerConfig; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a shared_ptr? I think it should be a unique_ptr and you can share the raw pointer if you have to. I see you say this:
Using unique_ptr leads to some errors, which indicates that it relates to calling the destructor of CopyDescription, I think it's because of using std::move on CopyDescription in the constructor of some other class when initializing. Using shared_ptr avoids the error.
Whatever it is, we need to first understand why unique_ptr does not work. It could be a bug. We can't fix a problem without understanding it. So whatever the problem, first let's understand it and find a fix (I don't think there is any good reason for this to be shared_ptr.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not seem to be a bug. There are a few places in the code base that copies the CopyDescription object, which leads to copying the unique_ptr. Since unique_ptr does not allow having multiple pointers pointing to the same object, it does not have a copy constructor. Therefore, an error about calling an implicitly-deleted copy constructor is triggered. Using shared_ptr avoids this issue.
void InMemArrowRelCopier::putPropsOfLineIntoLists(InMemArrowRelCopier* copier, | ||
vector<PageByteCursor>& inMemOverflowFileCursors, const vector<shared_ptr<T>>& batchColumns, | ||
const vector<nodeID_t>& nodeIDs, const vector<uint64_t>& reversePos, int64_t blockOffset, | ||
int64_t& colIndex, char delimiter) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me clarify what I was referring to when saying it was a hack. In initCSVReader there is this line:
arrowParse.delimiter = copyDescription.readerConfig->tokenSeparator;
Why is delimter being set to tokenSeparator? This looks like a hack. tokenSeparator is a CSV-specific option we are giving. It has nothing to do with the List data type. So again, let's discuss this.
Update: I think I now understand why I was confused. The reason is ReaderConfig::tokenSeparator is a confusing name because it should be called ReaderConfig::delimiter. I left a comment on this in csv_reader.h.
e535966
to
ef12216
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job.
CopyDescription::CopyDescription(const CopyDescription& copyDescription) | ||
: filePath{copyDescription.filePath}, readerConfig{nullptr}, fileType{ | ||
copyDescription.fileType} { | ||
if (fileType == FileType::CSV) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check for this null or not condition directly on the readerConfig field. So something like:
if (copyDescription.readerConfig) {
this->readerConfig = make_unique<CSVReaderConfig>(*copyDescription.readerConfig);
}
20f39f1
to
7167f06
Compare
Implemented arrow rel copier
Refactored code of both arrow rel copier and arrow node copier
Deleted old rel copier
Changed some test files to accommodate CSV standard