Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test performance drops quickly after running LSQB test #1651

Closed
andyfengHKU opened this issue Jun 8, 2023 · 2 comments · Fixed by #1654
Closed

Test performance drops quickly after running LSQB test #1651

andyfengHKU opened this issue Jun 8, 2023 · 2 comments · Fixed by #1654
Assignees

Comments

@andyfengHKU
Copy link
Contributor

andyfengHKU commented Jun 8, 2023

I notice that before LSQB test, our average test cases takes around 0.15s. While after running LSQB test, all subsequent test cases take longer, average 0.6s on my machine. I wonder if this is because LSQB test bump BM size up and we don't reset BM size to a smaller size afterwards?

@rfdavid
Copy link
Collaborator

rfdavid commented Jun 8, 2023

The new feature recently deployed, CSV_TO_PARQUET, converts CSV files into PARQUET files. The idea is to temporarily create the parquet files while executing the tests and then remove the files after all tests run.
However, ctest executes the binary e2e_test multiple times because it separates in multiple processes. So it's removing and creating the parquet_temp multiple times instead of re-using the generated parquet files.
It's not possible to set up a custom environment with a teardown callback directly in the script, so I'm checking other alternatives to solve this issue.
A temporary workaround is either comment e2e_test.cpp:FileUtils::removeDir(parquetDatasetTempDir); to keep the files, or rename interactive-short-parquet.test to interactive-short-parquet.disabled and also lsqb_queries_parquet.test to lsqb_queries_parquet.disabled

@rfdavid
Copy link
Collaborator

rfdavid commented Jun 9, 2023

Update:

ctest command calls the binary e2e_test for each test. We currently have 222 tests, so for each test ctest runs somethling like:

Test command: /Users/rfdavid/Devel/waterloo/kuzu/build/release/test/runner/e2e_test "--gtest_filter=BinderErrorTest.AddPropertyDuplicateName" "--gtest_also_run_disabled_tests"

e2e_test recursively scans test_files directory, parses all .test files, convert to parquet the necessary CSV, registers all the tests, and then it executes only the test passed in the argument. This is executed 222 times, but only now we noticed the performance due to the csv -> parquet conversion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants