Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixing issue 1404 #1460

Merged
merged 1 commit into from
Apr 12, 2023
Merged

fixing issue 1404 #1460

merged 1 commit into from
Apr 12, 2023

Conversation

semihsalihoglu-uw
Copy link
Contributor

This PR partially fixes #1404. In particular it changes the main task in node_copier.cpp, which is to populate the columns, to issues a single task that is "morselized". Previously, we would issue one task for each part of the input csv or parquet file.

There are 3 other things I squeezed into this PR.

  1. Timeout logic to node copier: that is COPY statements that load node tables now time out.
  2. Any task scheduled through TaskScheduler::scheduleTaskAndWaitOrError() now also stops early when there is an exception: This is a two line change to TaskScheduler. We had a mechanism to stop queries early if a query times out. The mechanism was: The main thread that issues a pipeline periodically wakes up and checks if there is a time out. If there is, it interrupts the task. The code looked like:
    while (!task->isCompleted()) {
        if (context != nullptr && context->clientContext->isTimeOutEnabled()) {
            interruptTaskIfTimeOutNoLock(context);
        } 
        std::this_thread::sleep_for(
            std::chrono::microseconds(THREAD_SLEEP_TIME_WHEN_WAITING_IN_MICROS));

This would let then any thread working on this pipeline to stop in the next morsel they get (the code is inside physical_operator.h).
This PR changes this to also stop early similarly if the task has errored by simply adding an else to the if as follows:

 else if (task->hasException()) {
            // Interrupt tasks that errored, so other threads can stop working on them early.
            context->clientContext->interrupt();
        }

The PR also makes the node_copier benefit from this by adding a similar code in physical_operator.h to node_copier's batchPopulateColumnsTaskNew function, which is where threads get their morsels.

I did not write a unit test for this but I tested this manually by checking that indeed that large tables are not copied from CSV files till the end if there is an error in the beginning of CSVs.

  1. Change virtual memory size from 32TB to 8TB.

@codecov
Copy link

codecov bot commented Apr 11, 2023

Codecov Report

Patch coverage: 95.95% and project coverage change: +0.02 🎉

Comparison is base (06181ac) 92.58% compared to head (3b3f2a7) 92.61%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1460      +/-   ##
==========================================
+ Coverage   92.58%   92.61%   +0.02%     
==========================================
  Files         655      655              
  Lines       22900    22914      +14     
==========================================
+ Hits        21202    21221      +19     
+ Misses       1698     1693       -5     
Impacted Files Coverage Δ
src/include/main/client_context.h 100.00% <ø> (ø)
src/include/storage/copier/npy_node_copier.h 100.00% <ø> (ø)
src/include/storage/copier/table_copier.h 100.00% <ø> (ø)
src/storage/copier/node_copier.cpp 91.50% <93.44%> (-2.42%) ⬇️
src/common/task_system/task_scheduler.cpp 89.41% <100.00%> (+0.25%) ⬆️
src/include/storage/copier/copy_task.h 100.00% <100.00%> (ø)
src/include/storage/copier/node_copier.h 100.00% <100.00%> (ø)
src/main/database.cpp 99.15% <100.00%> (ø)
src/processor/operator/copy/copy_node.cpp 100.00% <100.00%> (ø)
src/processor/operator/copy/copy_rel.cpp 100.00% <100.00%> (ø)
... and 3 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

src/main/database.cpp Outdated Show resolved Hide resolved
src/include/storage/copier/copy_task.h Outdated Show resolved Hide resolved
src/include/storage/copier/node_copier.h Outdated Show resolved Hide resolved
src/include/storage/copier/node_copier.h Outdated Show resolved Hide resolved
src/include/storage/copier/node_copier.h Outdated Show resolved Hide resolved
src/storage/copier/node_copier.cpp Outdated Show resolved Hide resolved
src/storage/copier/node_copier.cpp Outdated Show resolved Hide resolved
src/include/storage/copier/node_copier.h Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CopyCSV taskscheduler improvement
2 participants