Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto-parse end2end tests #1507

Merged
merged 3 commits into from
May 8, 2023
Merged

Auto-parse end2end tests #1507

merged 3 commits into from
May 8, 2023

Conversation

rfdavid
Copy link
Collaborator

@rfdavid rfdavid commented May 3, 2023

An initial proposal to scan end-to-end files inside the e2e directory and dynamically run the tests.

This PR is related to the first task in the proposal TODO list:

  • Set up the framework to auto-parse and run end2end tests without manually writing cpp code.
  • Rewrite all read-only tests with new auto-parsing and running framework.

How to add new end-to-end tests

For each e2e test, create a file named "test.group" containing the following fields:

-GROUP [group name]
-TEST [test name]
-DATASET [dataset]
-CHECK_ORDER true - use runTestAndCheckOrder() instead of runTest().

Implementation details

  • e2e_read_test recursively scan for .test files inside test/test_files
  • If the directory contains the configuration file test.group, parse the file and extract the necessary information to create the tests dynamically
  • Although READ_ONLY is in the test_files, it's been used only for informative purposes

@rfdavid rfdavid requested a review from ray6080 May 3, 2023 17:35
@codecov
Copy link

codecov bot commented May 3, 2023

Codecov Report

Patch coverage: 88.88% and project coverage change: -0.01 ⚠️

Comparison is base (fc358a3) 91.93% compared to head (818b8f7) 91.92%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1507      +/-   ##
==========================================
- Coverage   91.93%   91.92%   -0.01%     
==========================================
  Files         680      680              
  Lines       24524    24533       +9     
==========================================
+ Hits        22545    22553       +8     
- Misses       1979     1980       +1     
Impacted Files Coverage Δ
src/common/file_utils.cpp 74.35% <85.71%> (+1.11%) ⬆️
src/include/common/file_utils.h 100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@ray6080 ray6080 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice start!

I have one design change that I hope to discuss with you.
Currently in this PR, we rely on the directory naming to figure out the test suite name and dataset path, which is fine, but limits the organization of test files, and coupling the directory name with test semantic is not a clean design in my mind. I think it's better to just have one file under the root level directory (tinysnb, npy_1d, etc.) that documents test suite name and dataset path.
You can see the first example in my proposal document. Let me know what do you think, we can discuss what's a better way.

test/runner/e2e_read_test.cpp Outdated Show resolved Hide resolved
test/runner/e2e_read_test.cpp Outdated Show resolved Hide resolved
test/runner/e2e_read_test.cpp Outdated Show resolved Hide resolved
test/runner/e2e_read_test.cpp Outdated Show resolved Hide resolved
test/runner/e2e_read_test.cpp Outdated Show resolved Hide resolved
test/runner/e2e_read_test.cpp Outdated Show resolved Hide resolved
test/runner/e2e_read_test.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@ray6080 ray6080 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I noticed one thing, when I actually run tests, I see message like this "TinysnbTest.Agg # GetParam() = Agg". Do you have any ideas why we have # GetParam() = Agg?

@rfdavid
Copy link
Collaborator Author

rfdavid commented May 4, 2023

Thank you @ray6080 for the valuable feedback!
I totally agree, I'll work on parsing the .group file to extract the test suite properties (group, dataset, dbpath, others). I already have an idea in mind, I'll share it once I figure out some details.

@rfdavid
Copy link
Collaborator Author

rfdavid commented May 5, 2023

Also, I noticed one thing, when I actually run tests, I see message like this "TinysnbTest.Agg # GetParam() = Agg". Do you have any ideas why we have # GetParam() = Agg?

I realize I'm setting value_param inside testing::RegisterTest even though we're not using params. I'll remove it.

Copy link
Contributor

@ray6080 ray6080 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please squash before you merge. Thanks!

src/include/common/file_utils.h Outdated Show resolved Hide resolved
test/runner/e2e_read_test.cpp Outdated Show resolved Hide resolved
std::string line;
while (getline(ifs, line)) {
if (line.starts_with("-GROUP")) {
result.testGroup = line.substr(7, line.length());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you trim whitespaces after calling substr? I think it's better to guard against whitespaces in group names.
Same for the following if cases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added setConfigValue to solve this.
Once I start working on the test file parser, I was thinking of improving this parser by separating it into other classes and having better validation. I'll share more specifics soon.

TestConfig testConfig;
std::vector<TestConfig> tests;
std::string previousDirectory;
for (const auto& entry : std::filesystem::recursive_directory_iterator(path)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with recursive_directory_iterator. Just one question here: I guess recursive_directory_iterator have the iteration order guarantee? It seems you rely on that as you are checking FileUtils::getParentPathStem(entry) != previousDirectory. Specifically, I just want to make sure that the following case is covered in this function.

test_files/test.group
test_files/t1.test
test_files/group1/test.group
test_files/group1/g1.test
test_files/group2/case1/test.group
test_files/group2/case1/g2c1.test

Alternatively, I think something like the following also makes sense, if you don't want to rely on the recursive_directory_iterator:

void scanTestFiles(const std::string& path, std::vector<TestConfig>& configs) {
    for (entry in path) {
        if (test group file exists) { 
            auto config = parseTestGroup(path); 
            if (config.isValid()) {
                configs.push_back(config);
            }
        }
        if (entry is directory) { scanTestFiles(entry.path, configs); }
    }
}

TestConfig parseTestGroup(const std::string& path) {
    auto testConfig = TestHelper::parseGroupFile(testGroupFile);
    for (entry in path) {
        if (entry is a file && entry's extension is ".test") {
            testConfig.files.push_back(entry.path().string());
        }
    }
    return testConfig;
}

Your call here. I don't have preferences over these two solutions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's a good point.
From my tests, I haven't seen something like this (which will result in failure):

test_files/group2/case1/test.group
test_files/group1/g1.test
test_files/group2/case1/g2c1.test

However, looking at the documentation, it seems the order is not guaranteed.
Your solution is more reliable and clear, I'll implement that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially implemented your code, the only necessary change was to add a bool to check if the current directory was processed and not create multiple configs. It worked, but I slightly changed it to check the directories first and then parse them. I believe it's cleaner, the functions are very small. I hope you're ok with that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! LGTM!

Initial proposal to scan end to end files inside e2e directory
and dynamically run the tests.
@rfdavid rfdavid merged commit 0e07276 into kuzudb:master May 8, 2023
6 of 7 checks passed
@rfdavid rfdavid deleted the e2e_read_test branch May 9, 2023 10:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants