Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abort the previous bake when attempting the next autobake #1786

Merged
merged 3 commits into from
Apr 15, 2024

Conversation

zb3
Copy link
Contributor

@zb3 zb3 commented Apr 6, 2024

The current autobake design is that if the bake has already started, input updates do nothing.. This has proven to be counterintuitive, causing the issue #1467 to be reported (I think). Personally I'd also expect the autobake option to cause the previous bake to be aborted and restart with the new input..

Here's my attempt (at least there was an attempt...) at implementing this. The code is messy since I initially wanted to preserve the loader screen when changing input (canceling just the workers), but if the waiter has more than 1 output then I found that I need to call cancelBake anyway..

I was also able to test this behaviour thanks to the Sleep operation which makes reproducing the issue deterministic.

@zb3
Copy link
Contributor Author

zb3 commented Apr 6, 2024

Oh lord, another race condition in UI tests...

@zb3
Copy link
Contributor Author

zb3 commented Apr 7, 2024

I didn't see "Play Media" failing previously, but since ID3 tests were recently disabled, there's a possibility that conditions which'd make ID3 test fail now make the "Play Media" test fail, but this wasn't previously seen due to the test aborting earlier

@a3957273
Copy link
Member

a3957273 commented Apr 7, 2024

I wonder if it's the type of test. Both the ID3 and Play Media tests are file based. I'm... definitely suspicious that I've never seen the 'Play Media' test fail when the ID3 test was around. Something about the first time it runs?

@zb3
Copy link
Contributor Author

zb3 commented Apr 7, 2024

@a3957273, it makes sense and I was able to reproduce this by changing the pause time from 100 to 1 here:

function testOpFile(browser, opName, filename, cssSelector, output, args) {
browser.perform(function() {
console.log(`Current test: ${opName}`);
});
utils.loadRecipe(browser, opName, "", args);
utils.uploadFile(browser, filename);
browser.pause(100).waitForElementVisible("#stale-indicator", 5000);
utils.bake(browser);

I think my suspicions were confirmed - after temporarily making the setInput silent inside fileLoaded here:

fileLoaded(inputNum) {
this.manager.tabs.updateTabProgress(inputNum, 100, 100, "input");
const activeTab = this.manager.tabs.getActiveTab("input");
if (activeTab !== inputNum) return;
this.inputWorker.postMessage({
action: "setInput",
data: {
inputNum: inputNum,
silent: false
}
});
this.updateFileProgress(inputNum, 100);
}

the problem has disappeared (but I suppose that one can't be silent...)

So it appears (to me) that after we start baking, the "debounced" update marks the output as stale so it's still stale even after the bake result arrives..

The 100ms pause normally makes the bake begin only after it's all done, but as we can see in practice this is not reliable enough..

To be honest, the whole "debounce" thing here makes my head explode.. maybe THIS PR could help making it obsolete? Or maybe at least be used for user events (like keystrokes) only?

this.baking = false;
}

if (this.autoBake_) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why this variable alone ends in a _.

@@ -160,7 +160,12 @@ class App {
// has completed.
if (this.autoBakePause) return false;

if (this.autoBake_ && !this.baking) {
if (this.baking) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do intend to review this properly in time, I just think this PR likely needs an extensive amount of testing to make sure it doesn't break any other aspects of the service.

In particular, I'm struggling to work out why the original implementation explicitly excluded autobaking when an existing value was being computed.

As well as that, I'm worried there's a race case somewhere here. Cancelling a bake in a worker looks asynchronous, as far as I can tell.

I should have more time tomorrow night to look into this, although I generally have more time for this project over the weekends.

@a3957273
Copy link
Member

Okay, tested for half an hour and I haven't managed to break / run into stale / race cases. I think the logic here is sound. Thanks for fixing a particularly long-running and painful bug! ❤️

@a3957273 a3957273 merged commit 42ad9a4 into gchq:master Apr 15, 2024
4 checks passed
@zb3
Copy link
Contributor Author

zb3 commented Apr 15, 2024

Ouch, looks like the test is now failing in the new place:
https://github.com/gchq/CyberChef/actions/runs/8681798208/job/23805004761

and I'm not able to reproduce it locally even when running all tests up to that one, adjusting delays doesn't help :(

Generally this PR wasn't meant to solve race conditions inside tests, but helps to reconsider the use of debounce in two places. #1784 could solve the failing url test though.

EDIT: I've also accidentally commited loading a file inside the "Loading from URL" test.. I've used this in development to be able to reproduce it failing. OTOH, tests "should" be able to run independently..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants