'Magic' operation - automatically detect and run operations #239

n1474335 · 2018-01-23T00:47:26Z

Summary

The 'Magic' operation attempts to automatically detect what format the input data is in and which operations can be used to make more sense of it. It does this through a variety of methods:

Using magic bytes to detect file types.
Using regular expressions to detect data encoded in a specific format (e.g. Base64, hex or URL encoding).
Using byte frequency analysis to detect how closely the data matches various natural languages.

Once a possible encoding has been detected, the 'Magic' operation performs that operation and carries out the same process again. This can continue for several levels, controlled by the 'Depth' argument.
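The detect, decode and repeat loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the `detectors` table, the `speculate` function and the loose hex pattern are hypothetical names and choices made for the example, and only two decoders are included.

```javascript
// Hypothetical detector table: each entry pairs a strict pattern with a decoder.
// Names and patterns are illustrative, not CyberChef internals.
const detectors = [
    {
        op: "From Hex",
        pattern: /^(?:[0-9a-f]{2})+$/i,
        decode: s => s.match(/../g).map(h => String.fromCharCode(parseInt(h, 16))).join("")
    },
    {
        op: "From Base64",
        pattern: /^(?:[A-Z\d+/]{4})+(?:[A-Z\d+/]{2}==|[A-Z\d+/]{3}=)?$/i,
        decode: s => atob(s)
    }
];

// Record the current state, then try every matching decoder and repeat the
// process on each result until the depth budget is used up.
function speculate(data, depth, recipe = []) {
    const results = [{ recipe, data }];
    if (depth === 0) return results;
    for (const d of detectors) {
        if (!d.pattern.test(data)) continue;
        let decoded;
        try {
            decoded = d.decode(data);
        } catch (e) {
            continue; // a failed decode simply ends this branch
        }
        results.push(...speculate(decoded, depth - 1, [...recipe, d.op]));
    }
    return results;
}
```

For example, `speculate(btoa("48656c6c6f"), 2)` yields three candidates: the raw input, the output of 'From Base64', and the output of 'From Base64' followed by 'From Hex'. Ranking those candidates is a separate step, which this sketch leaves out.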

Examples

This example shows the 'Magic' operation detecting three levels of encoding. The results are listed in order of likelihood. The first row shows that the three operations 'From Base64', 'Gunzip' and 'From Hex' will result in an output that looks quite likely to be written in English. The second row shows that just running 'From Base64' results in an output that looks like a gzip file. The third row shows that the raw data without any operations applied doesn't look very much like any language, although it is closest to Portuguese.

This example shows a PNG image which has been URL and Base32 encoded. The 'Magic' operation has correctly detected these encodings and has also discovered that the 'Render Image' operation can be used to further improve the recipe.

This example shows the 'Magic' operation correctly discovering Hindi text underneath three levels of encoding and compression.

Details

The three detection methods mentioned above are explained here in further detail.

Magic bytes

This detection method was already available in CyberChef in the form of the 'Detect file type' operation. It has been incorporated into this operation to provide further metadata to make decisions from.
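As a rough illustration of what magic byte detection involves, the sketch below compares the start of the input against a handful of well-known file signatures. It is not the 'Detect file type' implementation: the `signatures` table is a small illustrative subset and `detectFileType` is a hypothetical helper name.

```javascript
// A few well-known file signatures (magic bytes). Illustrative subset only.
const signatures = [
    { type: "PNG image", bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
    { type: "Gzip", bytes: [0x1f, 0x8b] },
    { type: "Bzip2", bytes: [0x42, 0x5a, 0x68] },
    { type: "Zip", bytes: [0x50, 0x4b, 0x03, 0x04] }
];

// Return the type of the first signature that prefixes the input, or null.
// `data` is expected to be a Uint8Array or an array of byte values.
function detectFileType(data) {
    for (const sig of signatures) {
        if (sig.bytes.every((b, i) => data[i] === b)) return sig.type;
    }
    return null;
}
```

A result such as "Gzip" is the kind of metadata that could then suggest running 'Gunzip' as the next speculative step.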

Regular expressions to detect encodings

Patterns have been added to all relevant operations in the OperationConfig.js file. These patterns specify as strictly as possible what the data should look like if it is to match the operation. For example, the following configuration is used for the 'From Base64' operation:

{
    match: "^(?:[A-Z\\d+/]{4})+(?:[A-Z\\d+/]{2}==|[A-Z\\d+/]{3}=)?$",
    flags: "i",
    args: ["A-Za-z0-9+/=", false]
}

Alternative patterns can be added for use with different arguments, for example Base64 encoding using the BinHex alphabet is specified like so:

{
    match: "^[!\"#$%&'()*+,\\-0-689@A-NP-VX-Z[`a-fh-mp-r]{20,}$",
    flags: "",
    args: ["!-,-0-689@A-NP-VX-Z[`a-fh-mp-r", false]
}
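To show how entries like these might be consumed, here is a small sketch that compiles each `match` string with its `flags` and returns the `args` of every pattern the input satisfies. The two entries are copied from the configuration above; the `patterns` array and the `matchingArgs` helper are hypothetical and only stand in for however the operation actually looks patterns up.

```javascript
// The two pattern entries quoted above, copied as-is.
const patterns = [
    {
        match: "^(?:[A-Z\\d+/]{4})+(?:[A-Z\\d+/]{2}==|[A-Z\\d+/]{3}=)?$",
        flags: "i",
        args: ["A-Za-z0-9+/=", false]
    },
    {
        match: "^[!\"#$%&'()*+,\\-0-689@A-NP-VX-Z[`a-fh-mp-r]{20,}$",
        flags: "",
        args: ["!-,-0-689@A-NP-VX-Z[`a-fh-mp-r", false]
    }
];

// Hypothetical helper: return the args of every pattern the input matches.
function matchingArgs(input) {
    return patterns
        .filter(p => new RegExp(p.match, p.flags).test(input))
        .map(p => p.args);
}
```

Because the patterns are anchored and strict, ordinary text is unlikely to match either of them, while a well-formed Base64 string such as `SGVsbG8gV29ybGQ=` matches only the first entry. The `{20,}` minimum length in the BinHex pattern is what keeps short strings from matching it by accident.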

Byte frequency analysis

Using Pearson's Chi-Squared test, we can determine how closely a given set of data matches the byte frequency of a certain language. To generate the truth data, I downloaded dumps of Wikipedia in 284 different languages, stripped out the wiki formatting, then measured the frequency of every byte. This gave me a set of data, unique to each language, which shows how common each byte is when the characters are encoded in UTF-8.
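The scoring step can be sketched as below. This is an illustration of Pearson's chi-squared statistic applied to byte counts, assuming each language's truth data is stored as 256 percentages (one per byte value). The storage format, the `chiSquared` name and the decision to skip byte values with zero expected frequency are all assumptions made for this example.

```javascript
// Pearson's chi-squared statistic between the observed byte counts of `data`
// and an expected frequency table. `expectedPercent` holds 256 percentages,
// one per byte value (assumed format). Lower scores mean a closer match.
function chiSquared(data, expectedPercent) {
    const observed = new Array(256).fill(0);
    for (const b of data) observed[b]++;

    let score = 0;
    for (let i = 0; i < 256; i++) {
        const expected = (expectedPercent[i] / 100) * data.length;
        // Simplification for this sketch: ignore byte values the table never
        // saw, which avoids dividing by zero but does not penalise them.
        if (expected === 0) continue;
        score += (observed[i] - expected) ** 2 / expected;
    }
    return score;
}
```

With a toy table in which the bytes for `a` and `b` each account for 50%, the input `abab` scores 0 and `aaab` scores 1, so the first is the closer match. Running the same calculation against each language's table and sorting by score would give a ranking of the most likely languages.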

Future improvements

Show which operations the data matches even if the 'Depth' argument does not allow running them
Add entropy calculations for each branch, over both the entire input and a sliding window, to generate a structural map.
Run speculative XOR, ROT, Bit shift and Rotate brute forcing. This should be optional as it will drastically increase the running time.
Attempt to convert data from various character encodings at each stage and check whether the result matches UTF-8.
Add support for more languages
Allow the user to enter a crib pattern
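As a sketch of the entropy idea in the list above, the following computes Shannon entropy over a whole input and over a sliding window. It is one possible approach rather than a description of any implementation: `shannonEntropy` and `slidingEntropy` are hypothetical names, and the input is assumed to be a `Uint8Array`.

```javascript
// Shannon entropy in bits per byte: 0 for constant data, up to 8 for
// uniformly distributed bytes.
function shannonEntropy(data) {
    if (data.length === 0) return 0;
    const counts = new Array(256).fill(0);
    for (const b of data) counts[b]++;

    let entropy = 0;
    for (const c of counts) {
        if (c === 0) continue;
        const p = c / data.length;
        entropy -= p * Math.log2(p);
    }
    return entropy;
}

// Entropy of each window of `size` bytes, advancing `step` bytes at a time.
// `data` must be a Uint8Array so that subarray() is available.
function slidingEntropy(data, size, step = size) {
    const out = [];
    for (let i = 0; i + size <= data.length; i += step) {
        out.push(shannonEntropy(data.subarray(i, i + size)));
    }
    return out;
}
```

Plotting the sliding-window values along the input would give the structural map: low-entropy stretches tend to be padding or plain text, while stretches approaching 8 bits per byte suggest compressed or encrypted data.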

n1474335 · 2018-02-14T16:59:07Z

An example of the new extensive language support and intensive brute-forcing capabilities:

n1474335 added 8 commits January 14, 2018 16:07

Added Magic operation with the ability to detect language, file type and some encoding types.

fc2828f

Added detection patterns for non-standard Base64 alphabets, Base58 and Base32.

a1624a9

Added detection patterns for Octal, Binary, Decimal, Hexdumps, HTML Entities, URL encoding, escaped Unicode, and Quoted Printable encoding.

48f8ca6

Added detection patterns for UNIX timestamps, Zlib deflate, Gzip, Zip and Bzip2.

615a020

Added detection patterns for X.509 certs, Morse Code, Tar, images and BCD.

b035f6c

Merge branch 'master' into feature-magic

57314b7

Added speculative execution of recipes to determine the most likely decodings.

28abd00

Magic operation now displays an ordered table of the most likely decodings.

6947d2a

n1474335 added the operation label Jan 23, 2018

n1474335 added 6 commits January 23, 2018 01:15

Magic operation tidying

865ee6a

Magic operation now detects UTF8 and gives a probability score for each language

6624f25

Magic operation now shows matching ops even if they are not run.

23bdfd0

Added support for 238 languages to the Magic operation.

2bc563b

The Magic operation now only checks the most commonly used Internet languages by default, to lower false positives and improve performance.

544d78f

Added 'Intensive mode' to the Magic operation, where it brute-forces various simple encodings like XOR or bit rotates.

99ade42

n1474335 added 5 commits February 14, 2018 17:00

Recipe errors are now ignored in the Magic operation

1760ab2

Magic operation now recognises useful operations such as 'Render Image' even though their output cannot be analysed

27ec4aa

Magic operation now brute forces character encodings. Linted.

b3c52a8

Fixed a few small bugs

559741f

Magic operation now calculates the entropy of each option and displays tooltips explaining the properties.

56d33ea

n1474335 added this to the v8.0.0 milestone May 14, 2018

Merged esm branch into feature-magic. Ported FileType ops.

ee519c7

n1474335 changed the base branch from master to esm May 20, 2018 15:50

n1474335 merged commit ee519c7 into esm May 20, 2018

n1474335 deleted the feature-magic branch November 13, 2018 17:05

This pull request was closed.
