Support for files up to 500MB #224

Merged on Dec 28, 2017 (16 commits)

n1474335 (Member) commented Dec 27, 2017

This PR introduces support for files up to around 500MB in size (depending on your browser).

  • Files are loaded into the browser in chunks via a web worker, meaning that the UI remains usable during the process. This process is very fast, with a 500MB file taking less than 10 seconds to load on my system (Linux running in a VM with limited CPU and memory).
  • A new ArrayBuffer data type has been added to the Dish so that large files can be held in their native format without the need to cast to a byteArray or string unless required by a specific operation.
  • Any output larger than 1MB is treated as a file and not displayed in its entirety. Instead, an overlay card is shown with options to either download the file, or view a slice of it in the output. This 1MB threshold can be changed in the options pane.
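
The chunked loading described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names `chunkRanges` and `assembleChunks` are hypothetical, and the sketch covers only the range arithmetic and reassembly. In a browser, each range would typically be read inside the worker (for example with `Blob.slice` and a `FileReader`) and posted back to the main thread.

```typescript
// Hypothetical helpers for illustration; not CyberChef's actual API.

/** Byte ranges [start, end) that together cover a file of `totalSize` bytes. */
function chunkRanges(totalSize: number, chunkSize: number): Array<[number, number]> {
    if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
    const ranges: Array<[number, number]> = [];
    for (let start = 0; start < totalSize; start += chunkSize) {
        // The final chunk is clamped so it never runs past the end of the file.
        ranges.push([start, Math.min(start + chunkSize, totalSize)]);
    }
    return ranges;
}

/** Copy the chunks, in order, into one contiguous ArrayBuffer. */
function assembleChunks(chunks: ArrayBuffer[]): ArrayBuffer {
    const total = chunks.reduce((n, c) => n + c.byteLength, 0);
    const out = new ArrayBuffer(total);
    const view = new Uint8Array(out);
    let offset = 0;
    for (const chunk of chunks) {
        view.set(new Uint8Array(chunk), offset);
        offset += chunk.byteLength;
    }
    return out;
}
```

Reading one range at a time keeps each unit of work small, which is what allows progress to be reported and the UI to stay responsive while a large file loads.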

These changes vastly improve the stability of the web app when dealing with large amounts of data. The main issues previously revolved around having to render huge amounts of text in the DOM, which browsers aren't particularly good at. This solution does not render the actual content unless it is specifically asked for.
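
As a rough sketch of the "don't render unless asked" decision, assuming the threshold is compared against the output's byte length: the names `chooseOutputView` and `DEFAULT_THRESHOLD_BYTES` are hypothetical, and treating "1MB" as 1024 × 1024 bytes is an assumption rather than something stated in this PR.

```typescript
// Assumption: "1MB" is taken to mean 1024 * 1024 bytes; the real default may differ.
const DEFAULT_THRESHOLD_BYTES = 1024 * 1024;

// Either the data is small enough to render inline, or only its size is exposed
// so the UI can show a download/slice overlay instead of the content itself.
type OutputView =
    | { kind: "inline"; data: ArrayBuffer }
    | { kind: "file"; byteLength: number };

/** Decide how an output should be presented (hypothetical helper). */
function chooseOutputView(
    data: ArrayBuffer,
    thresholdBytes: number = DEFAULT_THRESHOLD_BYTES
): OutputView {
    return data.byteLength > thresholdBytes
        ? { kind: "file", byteLength: data.byteLength }
        : { kind: "inline", data };
}
```

The point of the design is that the expensive step (writing a large string into the DOM) happens only on the inline branch.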

The introduction of ArrayBuffer as a Dish type allows operations to treat the input as a typed array. Over time, we will migrate several operations to support ArrayBuffers instead of byteArrays where this makes sense. In this PR, only the 'Detect File Type' and 'Scan for Embedded Files' operations support the ArrayBuffer Dish type. This means they can run over files without having to cast the data at all. During testing, 'Detect File Type' was run over a 500MB file in less than a millisecond. Other operations may take longer to run over large files, although the MD5 operation executes in a reasonable time on files up to about 100MB.
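
To illustrate why a magic-byte check can be this fast on an ArrayBuffer, here is a minimal sketch. It is not the operation's real implementation: `detectFileType` and the two-entry `SIGNATURES` table are hypothetical stand-ins, although the JPEG and PNG signatures themselves are the standard ones. A `Uint8Array` constructed over an existing buffer is a view rather than a copy, and only the first few bytes are inspected, so the cost does not depend on file size.

```typescript
// Illustrative signature table only; a real detector would know many more formats.
const SIGNATURES: Array<{ name: string; magic: number[] }> = [
    { name: "JPEG", magic: [0xff, 0xd8, 0xff] },
    { name: "PNG", magic: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
];

/** Return the name of the first matching signature, or null if none match. */
function detectFileType(data: ArrayBuffer): string | null {
    // A typed-array view over the buffer: no bytes are copied or cast.
    const bytes = new Uint8Array(data);
    for (const sig of SIGNATURES) {
        const matches =
            bytes.length >= sig.magic.length &&
            sig.magic.every((b, i) => bytes[i] === b);
        if (matches) return sig.name;
    }
    return null;
}
```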

A 500MB file loaded, ingested and ready to be downloaded again
[Screenshot: 2017-12-27 16-11-30]

Running the 'Detect File Type' operation over a 500MB file full of random data (hence no detected magic bytes). Note the processing time of 0ms.
[Screenshot: 2017-12-27 16-11-54]

Running the 'Detect File Type' operation on a JPG file
[Screenshot: 2017-12-27 16-20-15]

Viewing the first 1024 bytes of a JPG file. The icon in the top right of the output area allows you to pull up the file overlay again and select a different slice of the file for viewing.
[Screenshot: 2017-12-27 16-13-16]
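
Viewing a slice of a large output can be sketched as below. This is an illustration only: `sliceToHex` is a hypothetical helper, and rendering the slice as hex is an assumption made for the example, not necessarily how the output pane displays it. `ArrayBuffer.prototype.slice` copies just the requested range and clamps out-of-range indices, so the rest of the file is never touched.

```typescript
/** Render bytes [from, to) of `data` as space-separated hex (hypothetical helper). */
function sliceToHex(data: ArrayBuffer, from: number = 0, to: number = 1024): string {
    // slice() copies only the requested range; indices past the end are clamped.
    const part = new Uint8Array(data.slice(from, to));
    return Array.from(part, (b) => (b < 16 ? "0" : "") + b.toString(16)).join(" ");
}
```

Calling `sliceToHex(data)` with the defaults corresponds to the "first 1024 bytes" view; passing other bounds selects a different slice.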

In addition to these changes, this PR also removes CryptoJS as a dependency of Utils.js. The much smaller and better maintained utf8 library is used instead. CryptoJS is still used for the cipher operations, but removing it from Utils.js means it does not have to be loaded unless the Cipher module is requested.

n1474335 merged commit 75a554e into master Dec 28, 2017
n1474335 deleted the feature-files branch December 28, 2017 16:08