
Fix: Resolves memory leak caused by using CRAFT detector with detect() or readtext(). #1278

Open
wants to merge 3 commits into master

Conversation

daniellovera

This fix enables garbage collection to work properly when

def test_net(canvas_size, mag_ratio, net, image, text_threshold, link_threshold, low_text, poly, device, estimate_num_chars=False):

returns, by deleting the objects we moved to the GPU once the forward-pass results have been copied back to the CPU.

See https://pytorch.org/blog/understanding-gpu-memory-2/#why-doesnt-automatic-garbage-collection-work for more detail.

Running torch.cuda.empty_cache() in test_net() before returning allows nvidia-smi to report GPU memory usage accurately.
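For clarity, here's a simplified sketch of the pattern (not the exact diff; run_detector() is a hypothetical stand-in for the relevant part of test_net()):

```python
import torch

def run_detector(net, image_tensor, device):
    # Sketch of the pattern in test_net(): push to the GPU, run the forward
    # pass, copy results to the CPU, then drop the GPU references.
    x = image_tensor.unsqueeze(0).to(device)   # input batch on the GPU

    with torch.no_grad():
        y, feature = net(x)                    # CRAFT returns (y, feature)

    # Copy what we need back to the CPU *before* dropping the GPU references.
    score_text = y[0, :, :, 0].cpu().numpy()
    score_link = y[0, :, :, 1].cpu().numpy()

    # Drop the Python references to the GPU tensors so the caching allocator
    # can actually free those blocks...
    del x, y, feature

    # ...and return the cached blocks to the driver so nvidia-smi is accurate.
    torch.cuda.empty_cache()

    return score_text, score_link
```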

Interestingly, nvidia-smi showed GPU memory usage per process at 204 MiB after reader initialization. It would increase to 234 MiB or 288 MiB after running easyocr.reader.detect(), but then not grow beyond that point, and in some cases it dropped back down to 234 MiB. I think this has something to do with how PyTorch's caching allocator reuses memory it has already reserved.

One note: I tested this on a single-GPU machine where I changed

net = torch.nn.DataParallel(net).to(device)
to net = net.to(device), removing DataParallel. There's no reason this shouldn't work on multi-GPU machines, but note that it wasn't tested on one.

I also only tested this on the CRAFT detector, not DBNet.

Relevant package versions:
easyocr 1.7.1
torch 2.2.1+cu121
torchvision 0.17.1+cu121

Hope this helps!

@daniellovera
Author

I should clarify: this resolves the GPU VRAM memory leak. It does not resolve any CPU RAM memory leaks.

@daniellovera
Author

Corrected to only call empty_cache() if the device in use is cuda.
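In other words, the call is now guarded by something along these lines (a sketch; the exact device check in the diff may differ):

```python
# Only touch the CUDA allocator when we are actually running on a CUDA device.
if str(device).startswith("cuda"):
    torch.cuda.empty_cache()
```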

@jonashaag

The del stuff can't possibly work. It just removes the Python variable from the scope (the function) but doesn't actually remove anything from the GPU/CPU.

@daniellovera
Author

The del stuff can't possibly work. It just removes the Python variable from the scope (the function) but doesn't actually remove anything from the GPU/CPU.

@jonashaag did you attempt to replicate my results? It'll take you less than 15 minutes to give it a whirl and see whether it works or not.

Because it did work for me, and the pytorch.org blog post I linked explains exactly why it works. I'll quote it here:

Why doesn’t automatic garbage collection work?
The automatic garbage collection works well when there is a lot of extra memory as is common on CPUs because it amortizes the expensive garbage collection by using Generational Garbage Collection. But to amortize the collection work, it defers some memory cleanup making the maximum memory usage higher, which is less suited to memory constrained environments. The Python runtime also has no insights into CUDA memory usage, so it cannot be triggered on high memory pressure either. It’s even more challenging as GPU training is almost always memory constrained because we will often raise the batch size to use any additional free memory.

The CPython’s garbage collection frees unreachable objects held in reference cycles via the mark-and-sweep. The garbage collection is automatically run when the number of objects exceeds certain thresholds. There are 3 generations of thresholds to help amortize the expensive costs of running garbage collection on every object. The later generations are less frequently run. This would explain why automatic collections will only clear several tensors on each peak, however there are still tensors that leak resulting in the CUDA OOM. Those tensors were held by reference cycles in later generations.

I'm not going to claim that I think it SHOULD work this way. But this isn't the first time weird garbage-collection and scoping behavior across CPU and GPU has caused problems.

Again, try it and let us all know whether it actually works for you.
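If you want to see the cycle/GC behaviour the blog post describes without touching EasyOCR, here's a tiny standalone toy example (my own repro, any CUDA machine should do):

```python
import gc
import torch

gc.disable()                             # make the demo deterministic: no automatic collections

def leak_one():
    t = torch.randn(1024, 1024, device="cuda")   # ~4 MiB on the GPU
    holder = []
    holder.append((t, holder))           # reference cycle: holder -> tuple -> holder
    # no explicit del: when the function returns, the cycle keeps `t` alive
    # until a cyclic garbage-collection pass runs

for _ in range(10):
    leak_one()

print(torch.cuda.memory_allocated())     # ~40 MiB still allocated: tensors held by cycles

gc.collect()                             # break the cycles; refcounting then frees the tensors
torch.cuda.empty_cache()                 # return the now-unused cached blocks to the driver
print(torch.cuda.memory_allocated())     # back down to ~0
```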

@jonashaag

Sorry, maybe I misunderstood the reason why del is used here. Is it so that the call to empty_cache() can remove the tensors x, y, feature from GPU memory? That might work unless there are other references to the tensors that those variables reference.
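Something along these lines is what I had in mind (a hypothetical standalone example, not EasyOCR code):

```python
import torch

t = torch.randn(1024, 1024, device="cuda")   # ~4 MiB on the GPU
alias = t                                    # a second reference to the same tensor

del t
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated())         # still ~4 MiB: `alias` keeps the tensor alive

del alias
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated())         # now 0: nothing references the tensor anymore
```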

@daniellovera
Author

Sorry, maybe I misunderstood the reason why del is used here. Is it so that the call to empty_cache() can remove the tensors x, y, feature from GPU memory? That might work unless there are other references to the tensors that those variables reference.

I don't think I understand it well enough to explain it better. I also call torch.cuda.empty_cache() and torch.cuda.reset_peak_memory_stats() after the function returns. It's possible the empty_cache() call inside the function isn't actually doing anything, since the GC doesn't run until the function goes out of scope. I probably should have double-checked that, but I was less concerned with nvidia-smi being accurate than with not getting CUDA OOM errors.
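For reference, my calling pattern looks roughly like this (my own wrapper code around EasyOCR, not part of this PR; image_paths is just my list of files):

```python
import easyocr
import torch

reader = easyocr.Reader(["en"], gpu=True)

for path in image_paths:                    # image_paths: my own list of image files
    horizontal_list, free_list = reader.detect(path)

    # Housekeeping between calls: release cached blocks and reset the peak
    # counters so each image's peak memory is easy to compare.
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()

    print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())
```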

I'm far from an expert, but I do know that these changes stopped the memory leaks I was having, and I haven't had a CUDA OOM error since.

My best suggestion: since action produces information, give it a whirl and let us know whether it works. If it doesn't work for you, it's valuable for me to know how your machine differs from mine, so I can make further changes to avoid these errors if I scale up or swap machines.

@daniellovera
Author

@jonashaag Hey, I'd love to know whether del worked, if you tried it.

@jonashaag

Sorry, I've switched to another engine (macOS Live Text) because it's better and much faster.

I feel a bit bad for having left such a smart-ass comment initially without contributing anything of substance here :-/

@daniellovera
Author

It's all good. Are you using Live Text natively on the devices, or can it be hosted in a way that lets it replace EasyOCR for serving a website that isn't running on an Apple device?

@jonashaag

Yes, we run a Mac mini in production (via Scaleway).

If you are interested, I can share some code.
