Optimize UFDR evidences opening time through some networks #1694
Is the observed slowness every time a new item is loaded, or just when the first item / case is opened? I just made 4 cases available through a network share (1 to 3 UFDRs in each), and they seem fine. We are still using our "old" server/storage, so some slowness is expected, but these particular cases seem pretty responsive.
Well, it is kind of obvious, but the reported slowness only happens with UFDRs, right? The UFDRs I tested here are not that large (the largest one has ~200 GB and contains ~370K files).
Yes, he said HDD cases open much faster than UFDR ones. I asked @arisjr to answer the other questions directly here.
"If" we decide to switch back to Zip4j, it seems they implemented optimizations similar to the ones I made in my patch, in this PR:
I process them on Linux, but the analysts work on a Windows client (RDS server), using the processed evidence from a Samba share over the network.
It only happens when the first item's content is shown. The case opens well, you can browse and search for items, all very fast, but when you click the desired item to see its content, the unresponsiveness begins.
We need far smaller UFDRs than 200 GB to see this scenario: 50 GB UFDRs are enough to trigger this "first item content" problem. The IPED interface freezes while it struggles to show the first item's content. Once the first item is shown, everything goes back to normal, with high responsiveness for all other items. Generally speaking, these 50 GB are copied quickly over the network to the local machine (the Remote Desktop), in about 1/4 of the time it takes to open the first item's content. UPDATE: this last part is just an unmeasured perception; I will measure it properly tomorrow.
I see. In the cases I mentioned, the first item to be shown does take longer, but it is something around 5-10 seconds.
Your Samba share is on top of a Ceph distributed FS, right? Have you already measured the random-seek I/O performance of your environment?
@arisjr if you could take a few thread dumps while the app is hanging (jstack -l pid), it may help us discover what is causing the slowness in your environment.
That's correct.
Not that I can recall. @lfcnassif Can you remind me which was the last version with the earlier implementation? I will do some testing here. I can also open my infrastructure to you two to test some scenarios.
For sure! Will do that early in the morning.
It was 3.18.15 according to the release notes: https://github.com/sepinf-inc/IPED/blob/master/ReleaseNotes.txt#L283
@lfcnassif I sent them to you over a private channel, because dragging and dropping files into comments is not working right now.
Thanks. The slowness really comes from Apache Commons Compress when opening the ZIP (reading its central directory). I tried 4 different approaches to speed it up. The last one is promising and gave up to a 3x speedup on my local network (where opening speed was already good). Basically, it passes a new SeekableByteChannel implementation over a RandomAccessFile to the Commons Compress ZipFile constructor, and uses the BufferedRandomAccessFile implementation from the Apache PDFBox/FontBox project. I sent the patch to @arisjr for testing.
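A minimal sketch of the idea, assuming a single in-memory window is enough: a read-only SeekableByteChannel over a RandomAccessFile that refills a fixed-size byte window only when a read falls outside it, so the many small, nearby reads made while parsing a ZIP central directory are served from RAM. This is not the actual patch and does not use PDFBox's BufferedRandomAccessFile; the class name BufferedReadOnlyChannel and its underlyingReads() counter are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.NonWritableChannelException;
import java.nio.channels.SeekableByteChannel;

/** Hypothetical read-only channel that caches one window of the file in memory. */
final class BufferedReadOnlyChannel implements SeekableByteChannel {
    private final RandomAccessFile raf;
    private final byte[] window;
    private final long size;          // cached once: the file is treated as immutable
    private long windowStart = 0;     // file offset of window[0]
    private int windowLen = 0;        // number of valid bytes in window
    private long position = 0;        // logical channel position
    private boolean open = true;
    private long underlyingReads = 0; // RandomAccessFile.read calls, for illustration

    BufferedReadOnlyChannel(RandomAccessFile raf, int bufferSize) throws IOException {
        if (bufferSize <= 0) throw new IllegalArgumentException("bufferSize must be > 0");
        this.raf = raf;
        this.window = new byte[bufferSize];
        this.size = raf.length();
    }

    @Override
    public int read(ByteBuffer dst) throws IOException {
        ensureOpen();
        if (!dst.hasRemaining()) return 0;
        if (position >= size) return -1;
        int total = 0;
        while (dst.hasRemaining() && position < size) {
            // Hit the file only when the position falls outside the cached window.
            if (position < windowStart || position >= windowStart + windowLen) {
                raf.seek(position);
                int n = raf.read(window, 0, window.length);
                underlyingReads++;
                if (n <= 0) break;
                windowStart = position;
                windowLen = n;
            }
            int offset = (int) (position - windowStart);
            int count = Math.min(windowLen - offset, dst.remaining());
            dst.put(window, offset, count);
            position += count;
            total += count;
        }
        return total == 0 ? -1 : total;
    }

    @Override
    public int write(ByteBuffer src) { throw new NonWritableChannelException(); }

    @Override
    public long position() throws IOException { ensureOpen(); return position; }

    @Override
    public SeekableByteChannel position(long newPosition) throws IOException {
        ensureOpen();
        if (newPosition < 0) throw new IllegalArgumentException("negative position");
        position = newPosition; // no I/O here; data is fetched lazily on read
        return this;
    }

    @Override
    public long size() throws IOException { ensureOpen(); return size; }

    @Override
    public SeekableByteChannel truncate(long newSize) { throw new NonWritableChannelException(); }

    @Override
    public boolean isOpen() { return open; }

    @Override
    public void close() throws IOException { open = false; raf.close(); }

    long underlyingReads() { return underlyingReads; }

    private void ensureOpen() throws ClosedChannelException {
        if (!open) throw new ClosedChannelException();
    }
}
```

With Commons Compress, a channel like this could be passed to the ZipFile(SeekableByteChannel) constructor, e.g. `new ZipFile(new BufferedReadOnlyChannel(new RandomAccessFile(path, "r"), 64 * 1024))`, where 64 KB is the buffer size @arisjr tested with. Like the patch, it wraps a single file, so it would only cover single-segment UFDRs.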
PS: It will only work with single-segment UFDR files.
@arisjr reported privately that the enhancement worked for him: the first item's opening time decreased from 17 min to 5 s! A 64 KB buffer was used. So I'll submit a PR with the proposal. @tc-wleite I'd really appreciate it if you could take a quick look at the upcoming PR, because it changes UFDR reading at a low level and must be bug-free; otherwise we may get very bad side effects...
Awesome!
Sure!
PS: I'm really surprised the Java 11 FileChannel implementation doesn't buffer data internally. It seems to delegate buffering to the OS, and maybe the distributed storage used by @arisjr is affecting the client OS's buffering strategy.
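To illustrate the FileChannel point, a minimal sketch that uses a local temporary file as a stand-in for the network share: it counts how many read() calls actually reach a FileChannel when a parser consumes 4-byte records, first directly and then through a 64 KB BufferedInputStream. It only counts Java-level calls and cannot show OS page-cache or Samba/Ceph behavior; ReadCounter and CountingChannel are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ReadCounter {
    /** Hypothetical wrapper that counts read() calls reaching the wrapped channel. */
    static final class CountingChannel implements ReadableByteChannel {
        private final ReadableByteChannel delegate;
        long reads = 0;

        CountingChannel(ReadableByteChannel delegate) { this.delegate = delegate; }

        @Override public int read(ByteBuffer dst) throws IOException {
            reads++;
            return delegate.read(dst);
        }
        @Override public boolean isOpen() { return delegate.isOpen(); }
        @Override public void close() throws IOException { delegate.close(); }
    }

    /** Reads the whole file as 4-byte records straight from the FileChannel. */
    static long unbufferedReads(Path file) throws IOException {
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            CountingChannel ch = new CountingChannel(fc);
            ByteBuffer rec = ByteBuffer.allocate(4);
            do {
                rec.clear();
            } while (ch.read(rec) >= 0); // every record is a separate FileChannel call
            return ch.reads;
        }
    }

    /** Same access pattern, but through a BufferedInputStream of the given size. */
    static long bufferedReads(Path file, int bufferSize) throws IOException {
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            CountingChannel ch = new CountingChannel(fc);
            InputStream in = new BufferedInputStream(Channels.newInputStream(ch), bufferSize);
            byte[] rec = new byte[4];
            while (in.read(rec, 0, rec.length) >= 0) {
                // records come from the in-memory buffer; the channel is hit only on refill
            }
            return ch.reads;
        }
    }
}
```

On a 40,000-byte file the direct loop issues one channel call per 4-byte record (about 10,000), while the buffered version needs only a handful; on a network share, each unbuffered call may become a round trip unless the client OS caches it.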
On #1068 we switched from Zip4j to Apache Commons Compress because Zip4j-1.x vulnerabilities were causing Dependabot to bother us. Furthermore, long ago I had to patch Zip4j to speed up ZIP opening/reading, which usually makes upgrading difficult, and I would like to get rid of my patch...
As far as I remember, Commons Compress provided higher processing throughput, but slower UFDR opening at analysis time, when the user clicks on the first item in the analysis UI.

Using the Commons Compress `ignoreLocalFileHeader = true` option in the ZipFile constructor made UFDR opening a lot faster and, in my opinion at the time, not significantly slower than the patched Zip4j-1.x, but I tested only with local UFDR files. However, a Linux user is complaining that UFDR evidence opening from the UI became very, very slow when the case is located on a network share, to the point that users are using another tool to analyze the cases. I didn't test the change with cases on network shares at the time. Is anyone else experiencing the same slowness?
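For context on what "opening" a UFDR costs: a ZIP reader starts from the End Of Central Directory (EOCD) record at the end of the file, then reads the central directory it points to, which holds one record per entry. As we understand the Commons Compress Javadoc, ZipFile by default also reads each entry's local file header, which for an archive with ~370K files means that many scattered seeks across the whole file; the ignoreLocalFileHeader option skips that pass. The sketch below only illustrates the first step with plain java.io. It is not how Commons Compress is implemented, it ignores ZIP64 (which multi-GB UFDRs would need), and EocdReader is a hypothetical name.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical helper: locates a ZIP's End Of Central Directory (EOCD) record. */
final class EocdReader {
    static final int EOCD_SIGNATURE = 0x06054b50; // bytes "PK\5\6", read little-endian
    static final int EOCD_FIXED_SIZE = 22;        // EOCD length without the archive comment
    static final int MAX_COMMENT = 0xFFFF;        // the comment may add up to 65535 bytes

    /** What the EOCD says about the central directory (classic ZIP, not ZIP64). */
    static final class Eocd {
        final int totalEntries;
        final long centralDirSize;
        final long centralDirOffset;

        Eocd(int totalEntries, long centralDirSize, long centralDirOffset) {
            this.totalEntries = totalEntries;
            this.centralDirSize = centralDirSize;
            this.centralDirOffset = centralDirOffset;
        }
    }

    static Eocd read(RandomAccessFile raf) throws IOException {
        long length = raf.length();
        int tailLen = (int) Math.min(length, EOCD_FIXED_SIZE + MAX_COMMENT);
        byte[] tail = new byte[tailLen];
        raf.seek(length - tailLen); // one seek near the end of the file...
        raf.readFully(tail);        // ...and one bulk read of the tail
        // Scan backwards: the record sits just before the optional comment.
        for (int i = tailLen - EOCD_FIXED_SIZE; i >= 0; i--) {
            if (le32(tail, i) == EOCD_SIGNATURE) {
                int entries = le16(tail, i + 10);                 // total entries
                long cdSize = le32(tail, i + 12) & 0xFFFFFFFFL;   // central directory size
                long cdOffset = le32(tail, i + 16) & 0xFFFFFFFFL; // central directory start
                return new Eocd(entries, cdSize, cdOffset);
            }
        }
        throw new IOException("EOCD record not found: not a ZIP file?");
    }

    private static int le16(byte[] b, int off) {
        return (b[off] & 0xFF) | (b[off + 1] & 0xFF) << 8;
    }

    private static int le32(byte[] b, int off) {
        return le16(b, off) | le16(b, off + 2) << 16;
    }
}
```

Given the EOCD, the central directory is one contiguous byte range starting at centralDirOffset and spanning centralDirSize bytes, which is exactly the kind of mostly sequential read that a large read buffer makes cheap; visiting every local file header instead spreads reads across the whole archive.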