Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate shadow pages from wal records and rework wal to use serializer #3204

Merged
merged 3 commits into from
Apr 9, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
cmake_minimum_required(VERSION 3.15)

project(Kuzu VERSION 0.3.2.5 LANGUAGES CXX C)
project(Kuzu VERSION 0.3.2.6 LANGUAGES CXX C)

find_package(Threads REQUIRED)

Expand Down
19 changes: 16 additions & 3 deletions src/common/serializer/buffered_file.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,8 @@

#include <cstring>

#include "common/assert.h"
#include "common/exception/runtime.h"
#include "common/file_system/file_info.h"

namespace kuzu {
Expand All @@ -15,7 +17,9 @@ BufferedFileWriter::BufferedFileWriter(std::unique_ptr<FileInfo> fileInfo)
: buffer(std::make_unique<uint8_t[]>(BUFFER_SIZE)), fileOffset(0), bufferOffset(0),
fileInfo(std::move(fileInfo)) {}

// TODO: write assumes size is always less than BUFFER_SIZE here. should change this when needed.
void BufferedFileWriter::write(const uint8_t* data, uint64_t size) {
KU_ASSERT(size <= BUFFER_SIZE);
if (bufferOffset + size <= BUFFER_SIZE) {
memcpy(&buffer[bufferOffset], data, size);
bufferOffset += size;
Expand All @@ -39,7 +43,8 @@ void BufferedFileWriter::flush() {

BufferedFileReader::BufferedFileReader(std::unique_ptr<FileInfo> fileInfo)
: buffer(std::make_unique<uint8_t[]>(BUFFER_SIZE)), fileOffset(0), bufferOffset(0),
fileInfo(std::move(fileInfo)) {
fileInfo(std::move(fileInfo)), bufferSize{0} {
fileSize = this->fileInfo->getFileSize();
readNextPage();
}

Expand All @@ -58,9 +63,17 @@ void BufferedFileReader::read(uint8_t* data, uint64_t size) {
}
}

bool BufferedFileReader::finished() {
return bufferOffset >= bufferSize && fileSize <= fileOffset;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this always going to be correct? What if the last page in the file is only partially filled?
From where it's being used before it seems to me that it will continue reading the empty data at the end of the page, which will mess up the check for whether or not the last record is a WAL.
I think there should be some way of tracking how many records are in the WAL, or alternatively have a record type of 0 be used to mark invalid records, and then as long as the buffer gets reset to zero (which the bufferedfilewriter is doing) then it can break after it reaches a record with a type of 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bufferSize = std::min(fileSize - fileOffset, BUFFER_SIZE); bufferSize is bounded by the actual fileSize. so it won't read empty data at the end of the last page if it's partially filled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensuring wal is not corrupted is another thing we need to handle. I feel we might need to do checksum for records.

}

void BufferedFileReader::readNextPage() {
fileInfo->readFromFile(buffer.get(), BUFFER_SIZE, fileOffset);
fileOffset += BUFFER_SIZE;
bufferSize = std::min(fileSize - fileOffset, BUFFER_SIZE);
if (bufferSize == 0) {
throw RuntimeException(stringFormat("Reading past the end of the file {}", fileInfo->path));
}
fileInfo->readFromFile(buffer.get(), bufferSize, fileOffset);
fileOffset += bufferSize;
bufferOffset = 0;
}

Expand Down
14 changes: 10 additions & 4 deletions src/include/common/cast.h
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
#pragma once

#include <typeinfo>

#include "common/assert.h"

namespace kuzu {
Expand All @@ -8,11 +10,15 @@ namespace common {
template<typename FROM, typename TO>
TO ku_dynamic_cast(FROM old) {
#if defined(KUZU_RUNTIME_CHECKS) || !defined(NDEBUG)
TO newVal = dynamic_cast<TO>(old);
if constexpr (std::is_pointer<FROM>()) {
KU_ASSERT(newVal != nullptr);
try {
TO newVal = dynamic_cast<TO>(old);
if constexpr (std::is_pointer<FROM>()) {
KU_ASSERT(newVal != nullptr);
}
return newVal;
} catch (std::bad_cast& e) {
KU_ASSERT(false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benefit of the assert here? It doesn't seem like it would be any clearer of an error than the exception.
Is is just so that it gets caught if we catch kuzu::common::Exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just that it's easier to debug with ku_assert than catching the std::bad_cast exception manually.

}
return newVal;
#else
return reinterpret_cast<TO>(old);
#endif
Expand Down
1 change: 1 addition & 0 deletions src/include/common/constants.h
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,7 @@ struct BufferPoolConstants {
struct StorageConstants {
static constexpr char OVERFLOW_FILE_SUFFIX[] = ".ovf";
static constexpr char WAL_FILE_SUFFIX[] = ".wal";
static constexpr char SHADOWING_SUFFIX[] = ".shadow";
static constexpr char INDEX_FILE_SUFFIX[] = ".hindex";
static constexpr char NODES_STATISTICS_AND_DELETED_IDS_FILE_NAME[] =
"nodes.statistics_and_deleted.ids";
Expand Down
1 change: 1 addition & 0 deletions src/include/common/file_system/file_info.h
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ struct KUZU_API FileInfo {

virtual ~FileInfo() = default;

// TODO: This function should be marked as const.
uint64_t getFileSize();

void readFromFile(void* buffer, uint64_t numBytes, uint64_t position);
Expand Down
22 changes: 18 additions & 4 deletions src/include/common/serializer/buffered_file.h
Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
#pragma once

#include <cstdint>
#include <memory>

#include "common/constants.h"
Expand All @@ -19,13 +18,22 @@ class BufferedFileWriter final : public Writer {

void write(const uint8_t* data, uint64_t size) override;

void flush();

void resetOffsets() {
fileOffset = 0;
bufferOffset = 0;
}

uint64_t getFileOffset() const { return fileOffset; }

FileInfo& getFileInfo() { return *fileInfo; }

protected:
std::unique_ptr<uint8_t[]> buffer;
uint64_t fileOffset, bufferOffset;
std::unique_ptr<FileInfo> fileInfo;
static const uint64_t BUFFER_SIZE = BufferPoolConstants::PAGE_4KB_SIZE;

void flush();
};

class BufferedFileReader final : public Reader {
Expand All @@ -34,12 +42,18 @@ class BufferedFileReader final : public Reader {

void read(uint8_t* data, uint64_t size) override;

bool finished() override;

private:
static constexpr uint64_t BUFFER_SIZE = BufferPoolConstants::PAGE_4KB_SIZE;

std::unique_ptr<uint8_t[]> buffer;
uint64_t fileOffset, bufferOffset;
std::unique_ptr<FileInfo> fileInfo;
static const uint64_t BUFFER_SIZE = BufferPoolConstants::PAGE_4KB_SIZE;
uint64_t fileSize;
uint64_t bufferSize;

private:
void readNextPage();
};

Expand Down
3 changes: 2 additions & 1 deletion src/include/common/serializer/deserializer.h
Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
#pragma once

#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
Expand All @@ -16,6 +15,8 @@ class Deserializer {
public:
explicit Deserializer(std::unique_ptr<Reader> reader) : reader(std::move(reader)) {}

bool finished() { return reader->finished(); }

template<typename T>
requires std::is_trivially_destructible<T>::value || std::is_same<std::string, T>::value
void deserializeValue(T& value) {
Expand Down
2 changes: 2 additions & 0 deletions src/include/common/serializer/reader.h
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,8 @@ class Reader {
public:
virtual void read(uint8_t* data, uint64_t size) = 0;
virtual ~Reader() = default;

virtual bool finished() = 0;
};

} // namespace common
Expand Down
3 changes: 0 additions & 3 deletions src/include/main/client_context.h
Original file line number Diff line number Diff line change
@@ -1,17 +1,14 @@
#pragma once

#include <atomic>
#include <cstdint>
#include <memory>
#include <mutex>

#include "client_config.h"
#include "common/task_system/progress_bar.h"
#include "common/timer.h"
#include "common/types/value/value.h"
#include "function/function.h"
#include "function/table/scan_replacement.h"
#include "main/kuzu_fwd.h"
#include "parser/statement.h"
#include "prepared_statement.h"
#include "query_result.h"
Expand Down
1 change: 1 addition & 0 deletions src/include/storage/stats/table_statistics_collection.h
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@ class TablesStatistics {

inline void checkpointInMemoryIfNecessary() {
std::unique_lock lck{mtx};
KU_ASSERT(readWriteVersion);
readOnlyVersion = std::move(readWriteVersion);
resetToNotUpdated();
}
Expand Down
13 changes: 8 additions & 5 deletions src/include/storage/storage_structure/db_file_utils.h
Original file line number Diff line number Diff line change
@@ -1,17 +1,20 @@
#pragma once

#include <cstdint>
#include <functional>

#include "common/types/types.h"
#include "storage/buffer_manager/bm_file_handle.h"
#include "storage/buffer_manager/buffer_manager.h"
#include "storage/wal/wal.h"
#include "transaction/transaction.h"

namespace kuzu {
namespace transaction {
enum class TransactionType : uint8_t;
} // namespace transaction

namespace storage {

struct DBFileID;
class BMFileHandle;
class BufferManager;
class WAL;
class DBFileUtils {
public:
constexpr static common::page_idx_t NULL_PAGE_IDX = common::INVALID_PAGE_IDX;
Expand Down
Loading