Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Support for asynchronous operations #79

Closed
tlwr opened this issue Feb 11, 2017 · 24 comments
Closed

Feature request: Support for asynchronous operations #79

tlwr opened this issue Feb 11, 2017 · 24 comments
Assignees

Comments

@tlwr
Copy link
Contributor

tlwr commented Feb 11, 2017

Overview

Right now (as far as I know) operations are limited to synchronous functions, for example

myOperation: function(input, args) {
  return doSomething();
},

However it would be useful to be able to support promises too, for example

myOperation: function(input, args) {
  return new Promise(function(resolve, reject) {
    doAsyncThing()
    .then(function(data) {
      resolve(data); // Equivalent to `return data`
    })
    .catch(function(error) {
      reject(error);  // Equivalent to `throw error`
    })
  });
},

On the surface it looks a lot uglier than the former, but it would allow us to support more operations.

Example

I was working on implementing #6 using OpenPGP.js but the encrypt and decrypt operations (among others) all use promises, which currently make it impossible to integrate in the current system (AFAIK).

Thoughts

I think it can be done without forcing any changes on how operations are configured. I think the main bulk of the changes would happen in the code that runs the recipe, and in how flow control operations currently work.

@n1474335
Copy link
Member

#81 closed, work to continue on feature-async-ops branch. Continue discussions here.

@n1474335 n1474335 changed the title Feature request: accept promises as return values Feature request: Support for asynchronous operations Feb 28, 2017
@tlwr
Copy link
Contributor Author

tlwr commented Feb 28, 2017

Replying to your comments from #81

There are some bugs surrounding some of the Flow Control operations, particularly 'Conditional Jump'. I want to make sure this is fully tested before we roll it out and that will be a lot easier once the test runner has been merged, so we'll get that sorted out first.

Completely agree.

Using promises means that we'll loose support for a lot of browsers. CyberChef is fairly forward leaning so I'm not massively concerned about support for very old browsers, but this is a pretty drastic step up. We should think about how to improve backwards compatibility (perhaps by using something like Babel) before we go ahead and merge this into the production version.

In #84 3a90244 I added a polyfill for promises (so we could use PhantomJS: src/js/lib/es6-promise.auto.js ). If we use Babel or equivalent we will still need the polyfills (for the features we want to use) (source)

More generally it is probably worth having a larger discussion around:

  • Transpilers: Babel, TypeScript, Flow, etc
  • Bundlers: Webpack, grunt concat, etc
  • Module systems: CommonJS, ES6, etc

Given the size and scope of using some of the above tools I think it is worth making a new issue (or using #2 or #5) and discussing it there.

It would be nice to make better use of this functionality, perhaps by adding the ability to cancel bakes if they are taking too long (at least those that are asynchronous). This could require quite a bit more work and I think we should thrash that out in the new branch and then perhaps up the major version number when we eventually merge it into master.

I'm fairly certain bake canceling is doable on the recipe level (i.e. canceling the baking process after an operation has finished) but I'm not sure how one would go about canceling during an operation.

One such solution would be to move the baking process into a web worker and then using worker.terminate() to completely cancel baking. It would also make the UI smoother during high compute operations.

I agree that we should have a new branch (after this is merged).

@tlwr
Copy link
Contributor Author

tlwr commented Mar 1, 2017

While looking into bugs associated with Conditional Jump + async ops, I noticed that the description for the third argument is "Maximum jumps (if jumping backwards)" but there is no corresponding backwards jump checks in the code.

Should line 182 of src/js/core/FlowControl.js:

if (state.numJumps >= maxJumps) {

be

if (state.numJumps < 0 && state.numJumps >= maxJumps) {

or is this done elsewhere?

If there is no backwards jump check then the following test fails, which seems unexpected.

    {
        name: "Conditional Jump: Skips 0",
        input: [
            "match",
            "should be changed 1",
            "should be changed 2",
        ].join("\n"),
        expectedOutput: [
            "match",
            "should be changed 1 was changed",
            "should be changed 2 was changed"
        ].join("\n"),
        recipeConfig: [
            {
                op: "Conditional Jump",
                args: ["match", 0, 0],
            },
            {
                op: "Find / Replace",
                args: [
                    {
                        "option": "Regex",
                        "string": "should be changed 1"
                    },
                    "should be changed 1 was changed",
                    true,
                    true,
                    true,
                ],
            },
            {
                op: "Find / Replace",
                args: [
                    {
                        "option": "Regex",
                        "string": "should be changed 2"
                    },
                    "should be changed 2 was changed",
                    true,
                    true,
                    true,
                ],
            },
        ],
    },

@n1474335
Copy link
Member

n1474335 commented Mar 1, 2017

Each jump, whether forward or backward, increments state.numJumps by one. This is done here. state.progress is effectively the instruction pointer and it is this which is manipulated to point to the next operation to execute.

There is a bug causing your test to fail, however it's actually the line which increments state.progress once maxJumps has been reached. I'll be rolling out the fix soon, along with the test you have posted. Good spot.

@tlwr
Copy link
Contributor Author

tlwr commented Mar 1, 2017

Thanks for the clarification and the additional bug spot.

Is there still need for a check (using the correct variable jumpNum) or am I misunderstanding the purpose of the parameter description "Maximum jumps (if jumping backwards)"?

I have a few tests for "Jump" and "Conditional Jump" so when you merge the test runner and roll out the fix into the feature-async-opsbranch I'll make a PR with those tests.

While writing those tests I made the silly mistake (an embarrassing number of times) of using the following operation config:

{
  op: "Fork",
  args: [-1],
}

I indended to run the operation before Fork which is Fork [-2]

which will cause an infinite loop (it just re-runs the Jump or Conditional Jump operation). I cannot think of a case where the ability to do this is desirable. Is it worth having the following code in Jump and Conditional Jump:

if (jumpNum == -1) {
    jumpNum = -2;
}

i.e. we default to what the user probably means, alternatively we could throw an error "Do not use -1 for Jump or Conditional Jump".

@n1474335
Copy link
Member

n1474335 commented Mar 1, 2017

I think there is some confusion here on one of our parts. The argument "Maximum jumps (if jumping backwards)" (maxJumps) is meant to specify the maximum number of times the state.progress index is modified - it's not related to how much it is changed by. The intention is to prevent infinite loops.

For example, you could jump backwards 5 operations, 10 times by setting jumpNum to -5 and maxJumps to 10. Each time you jump, state.numJumps is incremented by 1 and state.progress is reduced by 5. You're going to execute a further 50 operations, but none of the accumulators actually record that, they just make sure that you only divert execution 10 times. Does that explain things or am I misunderstanding you?

I don't believe you can enter an infinite loop by just setting jumpNum to -1. Can you provide an example where this happens? It will just run 'Jump' (or 'Conditional Jump') over and over until it reaches maxJumps and will then continue execution as normal. I understand why a value of -1 is potentially confusing and pointless though. Should we just treat all negative values as one less than entered? Would that make more sense?

I have merged master back into feature-async-ops so the test runner is available on that branch now.

@tlwr
Copy link
Contributor Author

tlwr commented Mar 2, 2017

Please ignore the infinite loop remarks (it was a bug on my end ignoring state.numJumps and was oversight on my part).

I think the your solution of "massaging" the negative values such that:

  • Jump of 0 is a no-op
  • Jump of 1 is skip past the next operation (as per usual)
  • Jump of -1 is skip to the previous operation (equivalent of -2 at present).

In #86 I have added some tests for Jump and Conditional Jump.

Also your test "Fork, Conditional Jump, Encodings" fails for me reproducibly

Input

git clone https://github.com/gchq/CyberChef.git
cd CyberChef
git checkout -b feature-async-ops --track origin/feature-async-ops
npm install
grunt test

Output

Running "execute:test" (execute) task
-> executing /home/toby/CyberChef/test/NodeRunner.js
✔️️ To Base58 (Bitcoin): nothing
✔️️ To Base58 (Ripple): nothing
✔️️ To Base58 (Bitcoin): 'hello world'
✔️️ To Base58 (Ripple): 'hello world'
✔️️ From Base58 (Bitcoin): 'StV1DL6CwTryKyV'
✔️️ From Base58 (Ripple): 'StVrDLaUATiyKyV'
✔️️ Fork: nothing
✔️️ Fork, Merge: nothing
✔️️ Fork, (expect) Error, Merge
🔥 Fork, Conditional Jump, Encodings
	Fork - Conditional Jump - Return - undefined is not an object (evaluating 'op.inputType')
✔️️ Conditional Jump: Skips 0
✔️️ To Morse Code: 'SOS'
✔️️ From Morse Code '... --- ...'
✔️️ Regex, non-HTML op


TOTAL 14
PASSING 13
ERRORING 1

Not all tests are passing
Warning: -> error 1 /home/toby/CyberChef/test/NodeRunner.js (348ms) Use --force to continue.

@n1474335
Copy link
Member

n1474335 commented Mar 4, 2017

14e9ea6 adds the jumpNum massaging.

I've merged #86.

The "Fork, Conditional Jump, Encodings" test was added specifically because I knew it caused errors in this branch ;)

@tlwr
Copy link
Contributor Author

tlwr commented Mar 4, 2017

I wasn't quite sure whether the test was supposed to be 🔥 or ✔️️ 😄 , thanks for clarifying

Just made PR #88 and I wanted to elaborate (in this thread) on my words:

This bug could instead have been fixed by changing finish condition from currentStep === recipe.opList.length to currentStep >= recipe.opList.length to prevent this class of bugs in the future.

Although changing the check to >= would have prevented (arguably just hidden) this bug, I still think it was erroneous behaviour in the operation of "Return" (under the new rules) that caused the error, and not the code in Recipe.execute.

I think the mental model we should have about Recipe.execute is that currentStep should fall within the bounds: 0 <= currentStep <= recipe.opList.length (where currentStep === recipe.opList.length indicates a complete execution), and anything out of those bounds is an indication of an error.

As such, the === checking in Recipe.execute helped us to find a bug (the bug in "Return" being caused by us changing the rules).

@n1474335
Copy link
Member

n1474335 commented Mar 6, 2017

I agree with your logic here, but in the interests of "defensive programming", I think it's best to use >=. You are correct that the bug was in the "Return" operation though so it's right to fix that as well.

@n1474335
Copy link
Member

n1474335 commented Mar 7, 2017

Quick update on the progress for merging this.

I'm working on a fairly comprehensive bit of work sorting out the lib dependencies. This involves importing as many libraries as possible through npm and then bundling them all using webpack instead of grunt. This work should also encompass the introduction of a proper structure for dealing with polyfills, which will allow us to use Promises and several other newer JavaScript features in a well defined way.

It would still be nice to include a way of cancelling operations while they are running, however that should be handled in a separate branch. I've looked into using web workers in the past but they rely on referencing an external file, which doesn't fit with the compiled model CyberChef uses. The only solution seems to be to create a Blob and load that dynamically into a new worker, which seems a little convoluted and messy. If anyone has any better ideas, I'd love to hear them.

Once the new module and packing structure has been introduced we'll look to get this branch integrated. Feel free to add any more tests that you think would be relevant and if there are operations that rely on promises that you want to work on, they can be submitted to this branch as well.

@tlwr
Copy link
Contributor Author

tlwr commented Mar 7, 2017

You are right about the blob method of creating web workers, however I think it can be tidier than you expect.

Instead of the current process of concatenating all JS files into one, we concatenate all main window JS into one file and all worker JS into one worker JS file. The worker JS file is has type="text/worker" (and so is not executed by the browser) and the main window has a small amount of code for creating the web worker.

So we could have something like this (relies heavily on this MDN entry):

<html>
  <body>
    <script type="text/worker">
      // the built worker
      window.onmessage = function(message) {
        postMessage("echo: " + message);
      };
    </script>
    <script type="text/javascript">
      // the built JS file
      var workerNode = document.querySelector("script[type='text\/js-worker']");
      var workerContent = workerNode.textContent;
      var workerBlob = new Blob(workerContent, {type: 'text/javascript'});
      var worker = new Worker(window.URL.createObjectURL(workerBlob));
      worker.onmessage = function(message) {
        console.log("From worker: " + message);
      };
      worker.postMessage("hello");
    </script>
  </body>
</html>

All of this is quite unnecessary in the short-term though, we can just implement a simple "Stop baking" button that stops baking after the current operation is finished. However I believe this solution will only work for asynchronous tasks.

The new dependency bundling system looks to be a great improvement, thanks for taking the time to work on it! Perhaps it is worth creating a branch (and issue) for it to outsource any grunt work (no pun intended) you find yourself too busy to tackle.

@n1474335
Copy link
Member

n1474335 commented Mar 8, 2017

You're right, that doesn't look quite as ugly as I feared. I still wish that there was proper JS support for threads based on objects instead of script files - a rather outdated concept these days. However if this is what's available, it's what we'll have to use.

The newly modified structure I'm building should further decouple the core and operations from the HTML view and the browser environment in general. The aim is that the actual Chef part (including all the operations) should be able to run in Node if you so wish, giving us the option to run it on a server and open up a RESTful API to it. We could then allow the user of the web app to specify whether they want their recipe to be run locally or remotely, in case it's process intensive or requires access to a module that cannot be (or is difficult to) run in a browser (shellcode disassembly for example). It may then also be possible to support operations written in different languages like Python or C/C++ etc.

There are various hurdles to overcome before we'll be in a position to actually offer this, and it may never be something that we can offer ourselves, but it would be nice to at least have the capability to allow people to set up their own.

Once I've got the restructuring to a stage that actually compiles, I'll push it to GitHub and open it up for review. I won't give any timescales yet as this is still something I'm working on in my spare time, but it shouldn't be too far off.

@tlwr
Copy link
Contributor Author

tlwr commented Mar 9, 2017

Being able to have pluggable front-ends and back-end extensions sounds very appealing:

Big datasets

For files above ~100MB CyberChef doesn't seem to tackle very well; a "path/to/file" input with a remote back-end (+ an actual Buffer datatype, file IO, and even FFI) would go a long way to solving the problems currently faced when running larger datasets.

Pluggable import/export

I was also mulling around the idea of being able to get data from more sources than just text/files (e.g. SQL databases, S3/Blob storage, etc), however due to browser limitations it seems this is currently impossible. However with the system you describe this seems easy to envision as possible. Especially with modules (install aws-cyberchef to get a connector to push/pull data to/from S3, Redshift, RDS, etc).

Don't break "linkability"

An often taken for granted value I get from CyberChef is due to the "linkability" of recipes. I can just email/slack a link with the confidence that the recipient will get the same results. I think it would be unwise to take any steps that prevents the continuation of this "feature".

However I don't think "linkability" and the new structure are mutually exclusive: I can easily imagine parallel run-times (e.g. CyberChef.html, cyberchefd + browser, and even a CyberChef.app [OSX]). Jupyter notebooks seem to be in a similar vein (though without a browser run-time IIRC).

@n1474335
Copy link
Member

n1474335 commented Mar 9, 2017

Yes, the ability to work with large files is one of the main reasons I want to create an API. Browsers really aren't the best place to work on large datasets, however much we might like it. Building connectors to other services is also an area that I want us to move into. Getting the core and operations set up in a really robust, packaged format is key to this.

"Linkability" is one of the most important features of CyberChef. It's very important that we preserve this as much as possible and I'm really glad you've identified it. Perhaps some of the new features won't allow for the input to be included in these links, but the recipe shouldn't be a problem, perhaps also with a parameter to specify where it be run. Don't forget that the recipe can be copied out of the "Save recipe" box as a raw JSON string. This should work across any platform that CyberChef is running on, whether that's Node, a browser, an API, a CLI, an app...

@graingert
Copy link
Contributor

graingert commented Apr 3, 2017

myOperation: function(input, args) {
  return new Promise(function(resolve, reject) {
    doAsyncThing()
    .then(function(data) {
      resolve(data); // Equivalent to `return data`
    })
    .catch(function(error) {
      reject(error);  // Equivalent to `throw error`
    })
  });
},

is really bad Promise manipulation, it can be re-factored to:

myOperation: function(input, args) {
    return doAsyncThing()
},

new Promise should only be used to convert callback based code to promises:

function makeRequest(url) {
    function executor(resolve, reject) {
        function cb(err, result) {
            err ? reject(err) : resolve(result);
        }
        makeRequestCb(url, cb);
    }

    return new Promise(executor);
}

...

await makeRequest('http://google.com');

@graingert
Copy link
Contributor

also to rebuild all the forking and conditions in an Asyncronous way, it would be a shame not to reuse RxJS Observables: soon to be built into JS

@graingert
Copy link
Contributor

additionally, now we're using babel we can use async/await:

async function myOperation(input, args) {
    const v = await someProcess(input);
    const w = await process(v);
    return v + w;
}

and even better, async generators: https://github.com/tc39/proposal-async-iteration

@tlwr
Copy link
Contributor Author

tlwr commented Apr 3, 2017

I agree with you that in a perfect world the above async code is not the most desirable. However, for the (many) cases where CyberChef is just calling existing library code I think it is sometimes the most pragmatic option.

In the case where there are either no possible errors to be dealt with or the error messages provided by the library are acceptable in "un-massaged" form then we can just write the following:

myOperation: function(input, args) {
  return someAsyncFoo(input);
},

Thanks for pointing out the async/await functionality. I think a lot of both the Chef/HTMLAPP code and the (async using) operation code could be simplified by async/await. I think it is worth prototyping on the PGP operations (#89) (potentially a future PR)

@graingert
Copy link
Contributor

I think it is sometimes the most pragmatic option.

It's never pragmatic to convert a promise into another promise with new Promise

@tlwr
Copy link
Contributor Author

tlwr commented Apr 3, 2017

I haven't had the most experience working with promises with may explain my ignorance but refactoring code similar to the following does leave me scratching my head:

myOperation: function(input, args) {
  return new Promise(function(resolve, reject) {
    fnThatNeedsMassaging(input)
    .then(function(result) {
      resolve(result.thingWeNeed);
    })
    .catch(function(particularlyUnfriendlyError) {
      if (particularlyUnfriendlyError.type == 1) {
        reject("This is a much more friendly error message for error type 1, you can resolve it by doing X.");
      } else if (particularlyUnfriendlyError.type == 2) {
        reject("This is a much more friendly error message for error type 2, you can resolve it by doing Y.");
      } else {
        reject("Unknown error" + particularlyUnfriendlyError.message);
      } 
    });
  });
},

@graingert
Copy link
Contributor

graingert commented Apr 3, 2017

Like this:

function processError({ type, message }) {
    switch(type) {
        case 1: throw 'This is a much more friendly error message for error type 1, you can resolve it by doing X.';
        case 2: throw 'This is a much more friendly error message for error type 2, you can resolve it by doing Y.';
        default: throw `Unknown error${message}`;
    }
}

myOperation: function(input, args) {
  return fnThatNeedsMassaging(input)
    .then(result => result.thingWeNeed)
    .catch(processError);
}

@tlwr
Copy link
Contributor Author

tlwr commented Apr 3, 2017

I can see how the above is a big rookie error, thanks.

@tlwr
Copy link
Contributor Author

tlwr commented May 6, 2017

Closed by #117

@tlwr tlwr closed this as completed May 6, 2017
BRAVO68WEB pushed a commit to BRAVO68WEB/CyberChef that referenced this issue May 29, 2022
BRAVO68WEB pushed a commit to BRAVO68WEB/CyberChef that referenced this issue May 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants