Optimize UFDR evidences opening time through some networks #1694

Closed
lfcnassif opened this issue May 22, 2023 · 17 comments · Fixed by #1695

@lfcnassif
Member

lfcnassif commented May 22, 2023

In #1068 we switched from Zip4j to Apache Commons Compress because Zip4j-1.x vulnerabilities kept triggering Dependabot alerts. Furthermore, I had to patch Zip4j long ago to speed up ZIP opening/reading, which usually makes upgrading difficult, and I would like to get rid of my patch...

As far as I remember, Commons Compress provided higher processing throughput but slower UFDR opening at analysis time, when the user clicks on the first item in the analysis UI. Using the Commons Compress ignoreLocalFileHeader = true option in the ZipFile constructor made UFDR opening a lot faster and, in my opinion at that time, not significantly slower than the patched Zip4j-1.x. However, I tested only with local UFDR files.

However, a Linux user is complaining that opening UFDR evidence from the UI became very slow when the case is located on a network share, to the point that users are using another tool to analyze the cases. I didn't test the change with cases on network shares at that time. Is anyone else experiencing the same slowness?

@wladimirleite
Member

However, a Linux user is complaining that opening UFDR evidence from the UI became very slow when the case is located on a network share, to the point that users are using another tool to analyze the cases. I didn't test the change with cases on network shares at that time. Is anyone else experiencing the same slowness?

Is the observed slowness every time a new item is loaded, or just during the first item / case opening?

I just made 4 cases available through a network share (1~3 UFDRs in each), and it seems fine. We are still using our "old" server/storage, so some slowness is kind of expected, but these particular cases seem pretty responsive.

@wladimirleite
Member

wladimirleite commented May 22, 2023

Well, it is kind of obvious, but the reported slowness only happens with UFDRs, right?

The UFDRs I tested here are not that large (the largest one is ~200 GB and contains ~370K files).
Is the slowness happening with any UFDR or just with larger ones?

@lfcnassif
Member Author

Well, it is kind of obvious, but the reported slowness only happens with UFDRs, right?

Yes, he said HDD cases open much faster than UFDR ones. I asked @arisjr to answer the other questions directly here.

@lfcnassif
Member Author

lfcnassif commented May 22, 2023

"If" we decide to switch back to Zip4j, it seems they have implemented optimizations similar to the ones in my patch, in this PR:
srikanth-lingala/zip4j#457

@arisjr
Contributor

arisjr commented May 22, 2023

However, a Linux user is complaining that opening UFDR evidence from the UI became very slow when the case is located on a network share, to the point that users are using another tool to analyze the cases. I didn't test the change with cases on network shares at that time. Is anyone else experiencing the same slowness?

I do the processing on Linux, but the analysts are on a Windows client (RDS server), using the processed evidences on a samba share over the network.

Is the observed slowness every time a new item is loaded, or just during the first item / case opening?

It happens only when the first item's content is shown. The case opens well, and you can browse and search for items, all very fast, but when you click the desired item to see its contents, the unresponsiveness happens.

I just made 4 cases available through a network share (1~3 UFDRs in each), and it seems fine. We are still using our "old" server/storage, so some slowness is kind of expected, but these particular cases seem pretty responsive.

We need far less than 200 GB UFDRs to see this scenario. 50 GB UFDRs are enough to trigger this "first item content" problem. The IPED interface freezes while it struggles to show the first item's content. Once the first item is shown, everything goes back to normal, with high responsiveness for all other items.

Generally speaking, these 50 GB are copied rapidly over the network to the local machine (the Remote Desktop): the copy takes about 1/4 of the time needed to open the first item's content.

UPDATE: this last part is just an unmeasured perception; I will measure it properly tomorrow.

@wladimirleite
Member

I see. In the cases I mentioned, the first item to be shown does take longer, but it is something around 5-10 seconds.

@lfcnassif
Member Author

using the processed evidences on a samba share over the network.

Your samba share is on top of a Ceph distributed FS, right? Have you already measured random seek IO performance of your environment setup?
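
For anyone wanting a rough number for random seek I/O on a share, here is a minimal sketch using only the JDK. The class name `RandomSeekProbe` and its parameters are made up for illustration, and the result is only indicative: the OS page cache and the network stack can easily distort it, so a dedicated tool would give more trustworthy figures.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Random;

/** Hypothetical helper: times small reads at random offsets of a file. */
final class RandomSeekProbe {

    /** Returns the average time, in nanoseconds, of one seek plus one read of blockSize bytes. */
    static double averageReadNanos(String path, int reads, int blockSize, long seed) throws IOException {
        if (reads <= 0 || blockSize <= 0) {
            throw new IllegalArgumentException("reads and blockSize must be positive");
        }
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long size = raf.length();
            if (size < blockSize) {
                throw new IllegalArgumentException("file is smaller than one block");
            }
            Random rnd = new Random(seed);
            byte[] block = new byte[blockSize];
            long start = System.nanoTime();
            for (int i = 0; i < reads; i++) {
                // Pick an offset so that a whole block always fits before end of file.
                long offset = (long) (rnd.nextDouble() * (size - blockSize + 1));
                raf.seek(offset);
                raf.readFully(block);
            }
            return (System.nanoTime() - start) / (double) reads;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: RandomSeekProbe <file on the share>");
            return;
        }
        double nanos = averageReadNanos(args[0], 1000, 4096, 42L);
        System.out.printf("average random 4 KB read: %.1f us%n", nanos / 1000.0);
    }
}
```

Running it once against a large file on the share and once against a local copy should at least show whether small random reads are the expensive part.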

@lfcnassif
Member Author

@arisjr if you could take a few thread dumps while the app is hanging (jstack -l pid), it may help us discover what is causing the slowness in your environment.
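
When attaching jstack from outside is not convenient, a similar dump can be produced from inside the JVM with the standard `java.lang.management` API. This is only a sketch: the class name `ThreadDumpSketch` is hypothetical, and it prints less detail than `jstack -l` does.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: builds a plain-text dump of all live threads of the current JVM. */
final class ThreadDumpSketch {

    static String dumpAllThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Request lock information only when the JVM supports it, otherwise the call would throw.
        ThreadInfo[] infos = bean.dumpAllThreads(
                bean.isObjectMonitorUsageSupported(),
                bean.isSynchronizerUsageSupported());
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : infos) {
            sb.append('"').append(info.getThreadName()).append("\" state=").append(info.getThreadState());
            if (info.getLockName() != null) {
                sb.append(" waiting on ").append(info.getLockName());
            }
            sb.append('\n');
            // Print every frame ourselves; ThreadInfo.toString() truncates long stacks.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Taking a few of these some seconds apart while the UI is frozen shows which frames stay on top, which is the same idea as repeating jstack.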

@arisjr
Contributor

arisjr commented May 23, 2023

Your samba share is on top of a Ceph distributed FS, right?

That's correct.

Have you already measured random seek IO performance of your environment setup?

Not that I can recall.

@lfcnassif Can you remind me which was the last version with the earlier implementation? I will do some testing here. I can also open my infrastructure for you two to test some scenarios.

@arisjr
Contributor

arisjr commented May 23, 2023

@arisjr if you could take a few thread dumps while the app is hanging (jstack -l pid), it may help us discover what is causing the slowness in your environment.

For sure! Will do that early in the morning.

@lfcnassif
Member Author

@lfcnassif Can you remind me which was the last version with the earlier implementation?

It was 3.18.15, according to the release notes: https://github.com/sepinf-inc/IPED/blob/master/ReleaseNotes.txt#L283

@arisjr
Contributor

arisjr commented May 23, 2023

@arisjr if you could take a few thread dumps while the app is hanging (jstack -l pid), it may help us discover what is causing the slowness in your environment.

@lfcnassif I sent them to you over a private channel, because dragging and dropping files into messages is not working right now.

@lfcnassif
Member Author

lfcnassif commented May 24, 2023

@lfcnassif I sent them to you over a private channel, because dragging and dropping files into messages is not working right now.

Thanks. The slowness really comes from Apache Commons Compress when opening the ZIP (reading its central directory).

I tried 4 different approaches to speed it up. The last one is promising and brought up to a 3x speedup on my local network (where the opening speed was already good). Basically, it passes a new SeekableByteChannel implementation over a RandomAccessFile to the Commons Compress ZipFile constructor, and uses the BufferedRandomAccessFile implementation from the Apache PDFBox/FontBox project. I sent the patch to @arisjr for testing.
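
To illustrate the idea only, here is a minimal sketch of a read-only, buffered SeekableByteChannel over a RandomAccessFile. It is not the actual patch: the class name `BufferedReadChannel` and the `physicalReads` counter are hypothetical, and a plain byte array stands in for the PDFBox/FontBox BufferedRandomAccessFile so that the example depends only on the JDK.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.NonWritableChannelException;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: serves many small reads from one in-memory window of the file. */
final class BufferedReadChannel implements SeekableByteChannel {
    private final RandomAccessFile file;
    private final byte[] buffer;
    private long bufferStart;   // file offset of buffer[0]
    private int bufferLength;   // number of valid bytes in buffer
    private long position;      // logical position of the channel
    private boolean open = true;
    long physicalReads;         // illustration only: reads that actually reached the file

    BufferedReadChannel(RandomAccessFile file, int bufferSize) {
        if (bufferSize <= 0) throw new IllegalArgumentException("bufferSize must be positive");
        this.file = file;
        this.buffer = new byte[bufferSize];
    }

    @Override
    public int read(ByteBuffer dst) throws IOException {
        if (!open) throw new ClosedChannelException();
        if (!dst.hasRemaining()) return 0;
        int total = 0;
        while (dst.hasRemaining()) {
            boolean miss = position < bufferStart || position >= bufferStart + bufferLength;
            if (miss && !refill()) break; // end of file reached
            int offset = (int) (position - bufferStart);
            int n = Math.min(dst.remaining(), bufferLength - offset);
            dst.put(buffer, offset, n);
            position += n;
            total += n;
        }
        return total == 0 ? -1 : total;
    }

    /** Loads the window starting at the current position; returns false at end of file. */
    private boolean refill() throws IOException {
        file.seek(position);
        int n = file.read(buffer, 0, buffer.length);
        physicalReads++;
        bufferStart = position;
        bufferLength = Math.max(n, 0);
        return n > 0;
    }

    @Override public long position() { return position; }

    @Override
    public SeekableByteChannel position(long newPosition) {
        if (newPosition < 0) throw new IllegalArgumentException("negative position");
        position = newPosition; // the window is only reloaded lazily, on the next read
        return this;
    }

    @Override public long size() throws IOException { return file.length(); }
    @Override public boolean isOpen() { return open; }
    @Override public void close() throws IOException { open = false; file.close(); }
    @Override public int write(ByteBuffer src) { throw new NonWritableChannelException(); }
    @Override public SeekableByteChannel truncate(long size) { throw new NonWritableChannelException(); }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("buffered-channel-demo", ".bin");
        Files.write(tmp, new byte[1 << 20]);
        try (BufferedReadChannel ch =
                new BufferedReadChannel(new RandomAccessFile(tmp.toFile(), "r"), 64 * 1024)) {
            ByteBuffer small = ByteBuffer.allocate(4);
            long calls = 0;
            while (ch.read(small) != -1) { small.clear(); calls++; }
            System.out.println(calls + " channel reads served by " + ch.physicalReads + " file reads");
        } finally {
            Files.delete(tmp);
        }
    }
}
```

A channel like this could presumably be handed to the Commons Compress ZipFile constructor that accepts a SeekableByteChannel, so that the many small reads made while parsing the central directory hit the in-memory window instead of the network share.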

@lfcnassif
Member Author

PS: It will only work with single-segment UFDR files.

lfcnassif self-assigned this May 24, 2023

@lfcnassif
Member Author

lfcnassif commented May 24, 2023

@arisjr reported privately the enhancement worked for him: first item opening time decreased from 17min to 5s! A buffer of 64KB was used. So I'll submit a PR with the proposal. @tc-wleite I would appreciate it a lot if you could take a quick look at the upcoming PR, because it changes UFDR reading at a low level and must be free of bugs, otherwise we may have very bad side effects...

lfcnassif changed the title from "Re-evaluate the library used to open UFDR evidences" to "Optimize UFDR evidences opening time" May 24, 2023

@wladimirleite
Member

@arisjr reported privately the enhancement worked for him: first item opening time decreased from 17min to 5s!

Awesome!

So I'll submit a PR with the proposal. @tc-wleite I would appreciate it a lot if you could take a quick look at the upcoming PR, because it affects the UFDR reading at a low level and must be free of bugs, otherwise we may have bad side effects...

Sure!

@lfcnassif
Member Author

PS: I'm really surprised that the Java 11 FileChannel implementation doesn't buffer data internally. It seems to delegate buffering to the OS, and maybe the distributed storage used by @arisjr is affecting the client OS buffering strategy.
