Memory leak issue when using SimpleSearcher.search #1449

Open
kwang2049 opened this issue Feb 19, 2023 · 0 comments
kwang2049 commented Feb 19, 2023

Related issue #431

I also found that using SimpleSearcher.search can lead to a memory leak. I ran a BM25 search experiment with 5K queries over msmarco-v1-passage. With all Pyserini search results (top-1000 per query) kept in RAM, the process ends up using 8.43GB, much more than it should (~1GB, I think). What makes this more serious is the leak itself: no matter what kind of memory cleanup I tried, around 5.5GB still remained in use.

I used psutil.Process(os.getpid()).memory_info()[0] / 2**30 (the resident set size, in GiB) to track RAM usage. More details can be found in
https://colab.research.google.com/drive/1MUxe9RCpm-Ax2wF1agnUusXe6FF88wt-?usp=sharing
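
For reference, the measurement can be wrapped in a small helper. This is only a minimal sketch: the function name `rss_gib` and the placeholder workload are made up for illustration, and `memory_info().rss` is the named form of `memory_info()[0]`.

```python
import os

import psutil


def rss_gib() -> float:
    """Return the resident set size (RSS) of the current process in GiB."""
    return psutil.Process(os.getpid()).memory_info().rss / 2**30


before = rss_gib()
payload = [0] * 1_000_000  # placeholder workload; the real runs would go here
after = rss_gib()
print(f"RSS before: {before:.2f} GiB, after: {after:.2f} GiB")
```

Since the JVM runs inside the Python process, this number should include the Java heap as well as Python objects, as far as I understand.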

Interestingly, I found a fix:

  • Convert each Anserini search result into a plain Python object as soon as the search finishes;
  • Limit Java's -Xmx to a small value (to keep the memory that Java reserves, and thus leaks, as small as possible).
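
A rough sketch of the first point, assuming each hit exposes `docid` and `score` attributes. The helpers `to_python_hits` and `search_all` and the stand-in `FakeSearcher` are hypothetical names used only for illustration, not part of Pyserini; a real SimpleSearcher would be passed in instead of the stand-in.

```python
from types import SimpleNamespace
from typing import Dict, List, Tuple


def to_python_hits(hits) -> List[Tuple[str, float]]:
    """Copy docid/score out of each hit into plain Python values."""
    return [(str(hit.docid), float(hit.score)) for hit in hits]


def search_all(searcher, queries: Dict[str, str], k: int = 1000) -> Dict[str, List[Tuple[str, float]]]:
    """Search every query and keep only plain Python copies of the results."""
    results = {}
    for qid, text in queries.items():
        hits = searcher.search(text, k=k)
        results[qid] = to_python_hits(hits)
        del hits  # drop the reference to the Java-backed objects right away
    return results


class FakeSearcher:
    """Hypothetical stand-in for SimpleSearcher so the sketch runs without an index."""

    def search(self, q: str, k: int = 10):
        return [SimpleNamespace(docid=f"doc{i}", score=1.0 / (i + 1)) for i in range(min(k, 3))]


# Assumption for the second point: Pyserini starts the JVM through pyjnius, so a
# heap cap would have to be set before the first pyserini import, for example
#   import jnius_config; jnius_config.add_options("-Xmx2g")
# where the 2g value is only illustrative.
results = search_all(FakeSearcher(), {"q1": "what is bm25"}, k=2)
print(results)
```

Only the tuples in `results` stay alive after each iteration, so the Java-side result objects become eligible for garbage collection once the loop moves on.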

Looking forward to your insights & ideas!
