Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bugs in Connection::interrupt #1828

Merged
merged 1 commit into from
Jul 24, 2023
Merged

Fix bugs in Connection::interrupt #1828

merged 1 commit into from
Jul 24, 2023

Conversation

benjaminwinger
Copy link
Collaborator

@benjaminwinger benjaminwinger commented Jul 17, 2023

This is related to one of the failing tests in #1825, CApiConnectionTest.Interrupt. But this still doesn't seem to fix it as I'm still getting a hung process when running it with asan locally. (Fixed) However it was originally also sometimes segfaulting when I tested it locally, and that doesn't happen any more.

Previously, interrupting before a query has executed would mean the activeQuery would be null when interrupt tries to set its interrupted flag.
Additionally, the detached thread in the test might not call the interrupt function until after the connection has been destroyed.

I removed the activeQuery entirely since assigning to a unique_ptr is not atomic, so attempting to interrupt when a query starts might cause it to write to the old memory location after it has been deleted and replaced with a new activeQuery object (unlikely, but I think is possible if a context switch occurs at the wrong time). And it only seemed to be used to initialize the interrupted flag and the timer, which could be done from a simple function instead without the possibility of memory errors. (re-added following feedback on slack)

Note that it's still possible for there to be a data race if you interrupt at around the same time the interrupted field gets reset when the query starts. That could probably be handled by providing a return value indicating if the interruption occurred when the query is actually running.

@codecov
Copy link

codecov bot commented Jul 17, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (1e9d4fd) 91.19% compared to head (883134a) 91.20%.

❗ Current head 883134a differs from pull request most recent head 50cd096. Consider uploading reports for the commit 50cd096 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1828   +/-   ##
=======================================
  Coverage   91.19%   91.20%           
=======================================
  Files         779      779           
  Lines       28606    28605    -1     
=======================================
  Hits        26088    26088           
+ Misses       2518     2517    -1     
Files Changed Coverage Δ
src/main/connection.cpp 93.20% <ø> (-0.03%) ⬇️
src/include/main/client_context.h 100.00% <100.00%> (ø)
src/main/client_context.cpp 92.30% <100.00%> (ø)

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@benjaminwinger
Copy link
Collaborator Author

I think this functionally makes CI worse since it now always times out instead of crashing and ending quickly.

@benjaminwinger
Copy link
Collaborator Author

With asan I think the 100ms delay before interruption is too little. But any delay could be too little with a high enough load on the machine, so I set it to interrupt repeatedly until the query exits.

@benjaminwinger benjaminwinger marked this pull request as ready for review July 18, 2023 21:22
@acquamarin
Copy link
Collaborator

With asan I think the 100ms delay before interruption is too little. But any delay could be too little with a high enough load on the machine, so I set it to interrupt repeatedly until the query exits.

I am a little confused. If the user issues an interrupt to a connection, the interruptted flag will be set to true. When the query is running, the Sink::getNextTuple() will constantly check whether the interrupt flag is true or not. So i don't think the interrupt will fail if we interrupt the query too early.

@benjaminwinger
Copy link
Collaborator Author

benjaminwinger commented Jul 19, 2023

The interrupt function will fail to have any effect if it interrupts before the interrupt flag gets reset to false when the query starts. That's as far as I can tell why the interrupt test kept hanging with asan, as between debug mode and asan it takes longer to do everything (before this patch it was crashing, since in the same situation it was dereferencing a null pointer).

Maybe it would make more sense to reset the interrupt flag at the end of the query instead of the beginning. Then the only possible data race would be if you interrupt between when the query has finished and when it resets the interrupt flag, in which case it shouldn't really matter since there's nothing to interrupt (at least, unless you're trying to interrupt the next query). But I assume that in real world use-cases you would only try to interrupt once the query has started and is obviously hung or taking a long time, so it's probably not something that will come up outside of tests.

@acquamarin
Copy link
Collaborator

acquamarin commented Jul 19, 2023

I think for the c interrupt test, we should swap the code for interrupt with the code that executes the query.
I mean should we place the kuzu_connection_query before the interrupt code?

/ This may happen too early, so try again until the query function finishes.
    std::thread t([&connection, &finished]() {
        do {
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
            kuzu_connection_interrupt(connection);
        } while (!finished);
    });
    t.detach();
    auto result = kuzu_connection_query(
        connection, "MATCH (a:person)-[:knows*1..28]->(b:person) RETURN COUNT(*);");
    finished = true;

@@ -15,6 +15,11 @@ struct ActiveQuery {
explicit ActiveQuery();
std::atomic<bool> interrupted;
common::Timer timer;

inline void reset() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to cpp file, we want to keep our include files simple and clean

Interrupting before a query has executed would mean the activeQuery was null, leading to undefined behaviour
@benjaminwinger
Copy link
Collaborator Author

I think for the c interrupt test, we should swap the code for interrupt with the code that executes the query.
I mean should we place the kuzu_connection_query before the interrupt code?

kuzu_connection_query blocks, and the query seems to be designed to run for an extremely long time (hence the need for a successful interrupt). It needs to run in a separate thread, and be started beforehand, so that it can run while the query is executing.

@benjaminwinger benjaminwinger merged commit e625434 into master Jul 24, 2023
8 checks passed
@benjaminwinger benjaminwinger deleted the interrupt-fix branch July 24, 2023 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants