Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Windows Support #1573

Merged
merged 1 commit into from
May 27, 2023
Merged

Add Windows Support #1573

merged 1 commit into from
May 27, 2023

Conversation

benjaminwinger
Copy link
Collaborator

A few tests seem to have regressed with the most recent changes from master. I haven't gotten around to looking into those yet, but I thought I should get this PR open for review in the meantime.

Mostly fixes #1504 (note omissions below) and supersedes #1544.

Notable changes

I added a KUZU_ prefix to the type enum in the C API. Unfortunately, windows headers pollute the import namespace making some of the more common names (e.g. BOOL) ambiguous.

I removed the VectorOperations class and moved {Unary,Binary,Ternary,Const}ExecFunction into the enclosing kuzu::function namespace (they could get their own child namespace if that's preferable). MSVC doesn't appear to allow function pointers to templated member functions (see this minimal example).

Omissions

  • The shell is not being built on windows as linenoise isn't yet working (it looks like the simplest option may be to try to get the current version of linenoise being used working with windows via some patches which exist for older versions of linenoise).
  • The python build is working, but a CI job for creating wheels has not been set up.
  • Bulding Debug or RelWithDebInfo on windows is tricky due to how the arrow build is set up and the fact that debug and release builds can't be linked together on windows, so it needs to match the main build. I've added an ARROW_INSTALL option which can point to the installed arrow prefix and can be used to switch between versions of arrow manually. Setting something up in the Makefile to do it automatically should be possible, but would probably be messy.

Also note that performance is rather poor. It seems to largely be related to file i/o.

@@ -6,7 +6,39 @@
namespace kuzu {
namespace function {

class VectorBooleanOperations : public VectorOperations {
template<typename FUNC>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can u try to put these back into VectorBooleanOperations. This probably won't cause compilation problem in MSVC.

*params[0], *params[1], result);
}

template<typename A_TYPE, typename B_TYPE, typename C_TYPE, typename RESULT_TYPE, typename FUNC>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto. try put back into VectorListOperations

@@ -6,7 +6,23 @@
namespace kuzu {
namespace function {

class VectorNullOperations : public VectorOperations {
template<typename FUNC>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto try put back into VectorNullOperations

@@ -97,15 +97,6 @@ TEST_F(DemoDBTest, DeleteRelTest) {
ASSERT_EQ(TestHelper::convertResultToString(*result), groundTruth);
}

TEST_F(DemoDBTest, DeleteWithExceptionTest) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this removed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, that was something @acquamarin did in this commit. @acquamarin?

src/main/connection.cpp Show resolved Hide resolved
result += "\t" + tableIDSchema.second->tableName + "\n";
relTableNames.push_back(tableIDSchema.second->tableName);
}
std::sort(relTableNames.begin(), relTableNames.end());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -9,6 +9,13 @@
namespace kuzu {
namespace processor {

template<typename type>
bool compareEntryWithKeys(const uint8_t* keyValue, const uint8_t* entry) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto try put back. We have many places that are written in a similar way. I don't know why some of them can be compiled while others we need to change

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you give an example of another place which didn't require changing? That might provide some insight into other ways the issue can be worked around.

It's worth noting that it seems to work fine as long as the function pointer isn't returned from another function (that's the point where the function resolution fails).

E.g. this works, but this does not.

Copy link
Contributor

@andyfengHKU andyfengHKU May 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. This is a good point. Then I wonder if we also pass functor reference rather than return a functor. Because expose static function in header files is not a good coding style

For example in vector_boolean_operation we can do

void VectorBooleanOperations::bindExecFunction(
    ExpressionType expressionType, const binder::expression_vector& children, scalar_exec_func& func) {
    if (isExpressionBinary(expressionType)) {
        func = bindBinaryExecFunction(expressionType, children);
    } else {
        assert(isExpressionUnary(expressionType));
        func = bindUnaryExecFunction(expressionType, children);
    }
}

I think @acquamarin, @ray6080 and @benjaminwinger should all comment on this. If so, @benjaminwinger you can ignore my other comment and do this in a separate PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a reasonable solution to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. This is a good point. Then I wonder if we also pass functor reference rather than return a functor. Because expose static function in header files is not a good coding style

For example in vector_boolean_operation we can do

void VectorBooleanOperations::bindExecFunction(
    ExpressionType expressionType, const binder::expression_vector& children, scalar_exec_func& func) {
    if (isExpressionBinary(expressionType)) {
        func = bindBinaryExecFunction(expressionType, children);
    } else {
        assert(isExpressionUnary(expressionType));
        func = bindUnaryExecFunction(expressionType, children);
    }
}

I think @acquamarin, @ray6080 and @benjaminwinger should all comment on this. If so, @benjaminwinger you can ignore my other comment and do this in a separate PR.

Sounds good to me.

@@ -138,5 +133,9 @@ class OrderByKeyEncoder {
std::vector<encode_function_t> encodeFunctions;
};

template<typename type>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

}
};

// Note: Previously part of a VectorOperations class, but MSVC is unable to resolve references to the function within subclasses.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still try to wrap them in a class. It's not a good idea to expose static function directly.

Copy link
Contributor

@ray6080 ray6080 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some comments. But I kind of want to take a look after rebase with master. As for now, it is not very clear some changes are already in master or made in this PR.

@@ -2,13 +2,19 @@ include(ExternalProject)

set(CMAKE_POSITION_INDEPENDENT_CODE ON)

if(WIN32)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can always keep arrow build type same as cmake build type regardless platforms here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, though it would trigger an unnecessary rebuild of arrow when switching on other platforms, and building arrow isn't very fast.

@@ -26,6 +26,13 @@ void NodeTable::initializeData(NodeTableSchema* nodeTableSchema) {
}
}

void NodeTable::resetColumns(catalog::NodeTableSchema* nodeTableSchema) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is used to close the files for nodeTable, if we don't close the files, we can't do the rename

@@ -91,6 +91,7 @@ void TransactionManager::stopNewTransactionsAndWaitUntilAllReadTransactionsLeave
numTimesWaited++;
if (numTimesWaited * THREAD_SLEEP_TIME_WHEN_WAITING_IN_MICROS >
checkPointWaitTimeoutForTransactionsToLeaveInMicros) {
mtxForStartingNewTransactions.unlock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this a bug?

@@ -73,6 +73,21 @@ void DirectedRelTableData::initializeLists(RelTableSchema* tableSchema, WAL* wal
}
}

void DirectedRelTableData::resetColumnsAndLists(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here. Is this introduced in this PR?

status.dwLength = sizeof(status);
GlobalMemoryStatusEx(&status);
auto systemMemSize = (std::uint64_t)status.ullTotalPhys;
bufferPoolSize = (uint64_t)(BufferPoolConstants::DEFAULT_PHY_MEM_SIZE_RATIO_FOR_BM *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this line bufferPoolSize = (uint64_t)... out of the #if #else ?

@benjaminwinger benjaminwinger force-pushed the windows3 branch 2 times, most recently from 0637b5c to cbac99a Compare May 26, 2023 17:32
std::unique_ptr<FileInfo> FileUtils::openFile(const std::string& path, int flags) {
#if defined(_WIN32)
// Not providing GENERIC_READ seems to cause problems.
int dwDesiredAccess = GENERIC_READ;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our convension is to use auto instead of the actual type

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't seem to be mentioned in the style guide. Is the style guide out of date?

int dwShareMode = FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE;
int dwFlagsAndAttributes = FILE_ATTRIBUTE_NORMAL;
if (flags & O_CREAT)
dwCreationDisposition = OPEN_ALWAYS;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can rewrite this as auto dwCreationDisposition = (flags & O_CREAT) ? OPEN_ALWAYS : OPEN_EXISTING

int dwFlagsAndAttributes = FILE_ATTRIBUTE_NORMAL;
if (flags & O_CREAT)
dwCreationDisposition = OPEN_ALWAYS;
if (flags & (O_CREAT | O_WRONLY | O_RDWR))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our convension is to always add brackets to if statements

@@ -125,5 +211,32 @@ std::vector<std::string> FileUtils::globFilePath(const std::string& path) {
return result;
}

void FileUtils::truncateFileToSize(FileInfo* fileInfo, uint64_t size) {
#if defined(_WIN32)
long offsetHigh = size >> 32;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the size of long is dependent on the platform. Better to use auto offsetHigh = (uint32_t)(size >> 32)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is platform-specific code though. I was following the API for SetFilePointer, which says the arguments are a LONG, which according to this is just typedef long LONG;.

But I'll still change it; it's unintuitive.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though that actually would require casting the pointer back to a long*/LONG* when passing the argument to SetFilePointer, which seems a little redundant. Maybe it's better to just actually use LONG to indicate it's a microsoft-specified type for a microsoft-specified interface.

src/main/connection.cpp Show resolved Hide resolved
@@ -26,6 +26,13 @@ void NodeTable::initializeData(NodeTableSchema* nodeTableSchema) {
}
}

void NodeTable::resetColumns(catalog::NodeTableSchema* nodeTableSchema) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is used to close the files for nodeTable, if we don't close the files, we can't do the rename

@codecov
Copy link

codecov bot commented May 26, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -0.01 ⚠️

Comparison is base (dba7110) 91.95% compared to head (dcbe4b4) 91.95%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1573      +/-   ##
==========================================
- Coverage   91.95%   91.95%   -0.01%     
==========================================
  Files         701      701              
  Lines       25207    25322     +115     
==========================================
+ Hits        23179    23284     +105     
- Misses       2028     2038      +10     
Impacted Files Coverage Δ
src/binder/bind/bind_query.cpp 100.00% <ø> (ø)
src/binder/bind/bind_reading_clause.cpp 92.30% <ø> (ø)
src/function/vector_null_operations.cpp 62.50% <ø> (+5.35%) ⬆️
...function/comparison/vector_comparison_operations.h 81.42% <ø> (+10.97%) ⬆️
...ude/function/interval/vector_interval_operations.h 95.45% <ø> (ø)
.../include/function/schema/vector_label_operations.h 100.00% <ø> (ø)
...include/function/schema/vector_offset_operations.h 87.50% <ø> (ø)
...include/function/string/vector_string_operations.h 96.00% <ø> (ø)
...include/function/struct/vector_struct_operations.h 100.00% <ø> (ø)
src/include/planner/subplans_table.h 100.00% <ø> (ø)
... and 43 more

... and 4 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.


int64_t FileInfo::getFileSize() {
#ifdef _WIN32
return GetFileSize((HANDLE)handle, nullptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use GetFileSizeEx? GetFileSize seems only return 4 bytes as the result, which is not large enough.

std::unique_ptr<FileInfo> FileUtils::openFile(const std::string& path, int flags) {
#if defined(_WIN32)
// Not providing GENERIC_READ seems to cause problems.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -9,6 +9,13 @@
namespace kuzu {
namespace processor {

template<typename type>
bool compareEntryWithKeys(const uint8_t* keyValue, const uint8_t* entry) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. This is a good point. Then I wonder if we also pass functor reference rather than return a functor. Because expose static function in header files is not a good coding style

For example in vector_boolean_operation we can do

void VectorBooleanOperations::bindExecFunction(
    ExpressionType expressionType, const binder::expression_vector& children, scalar_exec_func& func) {
    if (isExpressionBinary(expressionType)) {
        func = bindBinaryExecFunction(expressionType, children);
    } else {
        assert(isExpressionUnary(expressionType));
        func = bindUnaryExecFunction(expressionType, children);
    }
}

I think @acquamarin, @ray6080 and @benjaminwinger should all comment on this. If so, @benjaminwinger you can ignore my other comment and do this in a separate PR.

Sounds good to me.

@andyfengHKU andyfengHKU merged commit 25fd9f0 into kuzudb:master May 27, 2023
7 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Windows support
4 participants