[Bug]: Unable to delete the persistent directory from another program due to PermissionError #2446

Open
axiangcoding opened this issue Jul 3, 2024 · 7 comments
Labels
bug Something isn't working

Comments

axiangcoding commented Jul 3, 2024

What happened?

I have two programs: one built with FastAPI, which I'll call the server, and one for scheduled tasks, which I'll call the cronjob.
In the server I use chromadb to create data through the chromadb SDK (wrapped by langchain-chroma), and I run the cronjob to clean up the persist directory.

But after chromadb finishes executing normally in the server, deleting the persist directory from the cronjob fails with this error:

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: '/xxx/chromadb\\773758b4e4e888ab613ce67feb3329b4\\a1a91d6b-3f63-463d-950c-ee1638e306e1\\data_level0.bin'

The code in the server is as follows:

import chromadb

client = chromadb.PersistentClient(path=presist_path)
client.get_or_create_collection("my_collection")  # placeholder name; the original call omitted it
# ...do things here...

The code in the cronjob is as follows:

shutil.rmtree(presist_path)

This is very strange. It looks like some resources are not being released. Any idea where I should start debugging?
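
As a possible starting point for debugging, the sketch below checks how the current platform treats deleting a file that is still open. It is not taken from the thread: the function name `can_delete_while_open` is hypothetical, and the file is held open by the same process rather than by a separate server, which is only an approximation of the setup described above. On Windows, a file opened through Python's `open()` normally cannot be unlinked until the handle is closed, while POSIX systems allow it.

```python
import os
import tempfile


def can_delete_while_open() -> bool:
    """Return True if a file can be unlinked while this process holds it open."""
    tmp_dir = tempfile.mkdtemp()
    # The file name mirrors the one in the error message; any name would do.
    path = os.path.join(tmp_dir, "data_level0.bin")
    handle = open(path, "wb")
    try:
        try:
            os.unlink(path)
            deleted = True
        except PermissionError:
            # On Windows this is the WinError 32 sharing violation.
            deleted = False
    finally:
        handle.close()
        if os.path.exists(path):
            os.unlink(path)
        os.rmdir(tmp_dir)
    return deleted
```

If this returns False on the machine running the cronjob, the error is consistent with some process still holding `data_level0.bin` open, and the next step would be to find out which one.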

Versions

Both services have the same version of chromadb installed:

langchain-chroma = "^0.1.1"
chromadb = "^0.5.3"

Relevant log output

shutil.rmtree(folder_full_path)
    │      │      └ '/xxx/chromadb\\773758b4e4e888ab613ce67feb3329b4'
    │      └ <function rmtree at 0x0000014FFD6C8C20><module 'shutil' from 'D:\\Program Files\\python311\\Lib\\shutil.py'>

  File "D:\Program Files\python311\Lib\shutil.py", line 759, in rmtree
    return _rmtree_unsafe(path, onerror)
           │              │     └ <function rmtree.<locals>.onerror at 0x0000014F901F7060>
           │              └ '/xxx/chromadb\\773758b4e4e888ab613ce67feb3329b4'<function _rmtree_unsafe at 0x0000014FFD6C8AE0>
  File "D:\Program Files\python311\Lib\shutil.py", line 617, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
    │              │         └ <function rmtree.<locals>.onerror at 0x0000014F901F7060>
    │              └ '/xxx/chromadb\\773758b4e4e888ab613ce67feb3329b4\\a1a91d6b-3f63-463d-950c-ee1638e306e1'<function _rmtree_unsafe at 0x0000014FFD6C8AE0>
  File "D:\Program Files\python311\Lib\shutil.py", line 622, in _rmtree_unsafe
    onerror(os.unlink, fullname, sys.exc_info())
    │       │  │       │         │   └ <built-in function exc_info>
    │       │  │       │         └ <module 'sys' (built-in)>
    │       │  │       └ '/xxx/chromadb\\773758b4e4e888ab613ce67feb3329b4\\a1a91d6b-3f63-463d-950c-ee1638e306e1\\data_level0.bin'
    │       │  └ <built-in function unlink>
    │       └ <module 'os' (frozen)><function rmtree.<locals>.onerror at 0x0000014F901F7060>
  File "D:\Program Files\python311\Lib\shutil.py", line 620, in _rmtree_unsafe
    os.unlink(fullname)
    │  │      └ '/xxx/chromadb\\773758b4e4e888ab613ce67feb3329b4\\a1a91d6b-3f63-463d-950c-ee1638e306e1\\data_level0.bin'
    │  └ <built-in function unlink><module 'os' (frozen)>

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: '/xxx/chromadb\\773758b4e4e888ab613ce67feb3329b4\\a1a91d6b-3f63-463d-950c-ee1638e306e1\\data_level0.bin'

axiangcoding added the bug (Something isn't working) label on Jul 3, 2024
axiangcoding changed the title from "[Bug]: Unable to delete the persistent directory using another program" to "[Bug]: Unable to delete the persistent directory from another program" on Jul 3, 2024

axiangcoding (Author) commented Jul 3, 2024

It looks like this issue has been mentioned before in #1009 (comment) and #1152.
Could it be a Windows-only issue?

axiangcoding changed the title from "[Bug]: Unable to delete the persistent directory from another program" to "[Bug]: Unable to delete the persistent directory from another program due to PermissionError" on Jul 3, 2024

tazarov (Contributor) commented Jul 4, 2024

Hey @axiangcoding, you are hitting a Windows-related problem where (not necessarily your case) a Windows admin process (e.g., MS Defender) holds the file for a little while after another process has accessed it.

However, your case might be slightly different: since your cronjob is a separate process, your FastAPI process may still be holding the file. Before running shutil.rmtree(presist_path), do you delete the collection whose data you're cleaning up (e.g. client.delete_collection("col_name"))?
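
The two ideas in this comment (delete the collection first, and allow for a lock that is only held briefly) could be combined roughly as follows. This is a minimal sketch, not Chroma's recommended procedure: the helper name `cleanup_persist_dir` and the `attempts` and `delay` parameters are hypothetical, `client` is assumed to be any object exposing `delete_collection(name)`, and nothing in this thread confirms that deleting the collection releases the file handles.

```python
import shutil
import time


def cleanup_persist_dir(client, collection_name, persist_path, attempts=5, delay=0.5):
    """Delete the collection, then remove the directory, retrying on OS errors."""
    # Step 1: drop the collection through the client API.
    client.delete_collection(collection_name)

    # Step 2: remove the directory, backing off if a file is still in use.
    for attempt in range(1, attempts + 1):
        try:
            shutil.rmtree(persist_path)
            return True
        except FileNotFoundError:
            return True  # already gone
        except OSError:
            # Covers PermissionError (WinError 32) and "Directory not empty".
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)
    return False
```

Retrying only helps when the lock is transient, as in the antivirus case. If a long-running process keeps the file open, the last attempt will still raise.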

axiangcoding (Author) commented Jul 4, 2024

Hi @tazarov, I tried client.delete_collection() before removing the directory, but it only cleaned up the data in the SQLite database and didn't delete any files.

By the way, I tried client.reset() too, and it didn't delete any files either.

tazarov (Contributor) commented Jul 6, 2024

@axiangcoding, can you trace which process keeps the lock on the file? As an alternative, have you tried running things in a Docker container with a volume instead of a directory mount?
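
One way to trace the process holding the file is sketched below, assuming the third-party psutil package is installed. The helper names `path_is_under` and `find_holders` are hypothetical. The result may be incomplete: processes owned by other users can raise AccessDenied, and memory-mapped files are not necessarily reported by `open_files()`.

```python
import os


def path_is_under(path: str, root: str) -> bool:
    """Return True if path is root itself or lies somewhere below it."""
    path = os.path.normcase(os.path.abspath(path))
    root = os.path.normcase(os.path.abspath(root))
    return path == root or path.startswith(root.rstrip(os.sep) + os.sep)


def find_holders(root: str):
    """List (pid, name, open_paths) for processes with files open under root."""
    import psutil  # third-party dependency, imported lazily (assumption: installed)

    holders = []
    for proc in psutil.process_iter(["pid", "name"]):
        try:
            files = proc.open_files()
        except (psutil.AccessDenied, psutil.NoSuchProcess):
            continue  # skip processes we cannot inspect or that already exited
        hits = [f.path for f in files if path_is_under(f.path, root)]
        if hits:
            holders.append((proc.info["pid"], proc.info["name"], hits))
    return holders
```

Calling `find_holders(presist_path)` from an elevated shell on the machine running the server should show whether the FastAPI process, or something else, still has files open in the directory.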

axiangcoding (Author) commented

> @axiangcoding, can you trace what process keeps the lock on the file? As an alternative, have you tried running things into a docker container with a volume instead of directory mount?

I just found out that it is the FastAPI process that keeps the file lock. I'll try the other suggestions later.

tazarov (Contributor) commented Jul 7, 2024

@axiangcoding, thanks for confirming. That means that Chroma hasn't released the file yet.

axiangcoding (Author) commented Jul 19, 2024

Hi @tazarov, I've tested this in k8s with an azure-file volume mounted as the persistent directory, so both the FastAPI pod and the cronjob pod can access it. The cronjob pod still cannot delete the directory (I think a Docker volume would behave the same).

The error has changed:

Directory not empty

I also tried the rm -rf command and got the same error. I have to restart the FastAPI pod several times before I can delete the directory.
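
Since the directory only becomes deletable after the FastAPI pod restarts, one possible workaround is to do the work that touches the persist directory in a short-lived child process, so that the operating system closes its file handles when it exits. This is a sketch under that assumption and was not proposed in the thread: `CHILD_SCRIPT` and `run_isolated_then_delete` are hypothetical names, and the child below only opens a file as a stand-in for the real Chroma work.

```python
import os
import shutil
import subprocess
import sys

# Stand-in for the real work: the child opens a file under the directory and
# never closes it explicitly. A real script would create the PersistentClient
# and use it here instead.
CHILD_SCRIPT = """
import os, sys
f = open(os.path.join(sys.argv[1], "data_level0.bin"), "wb")
f.write(b"x")
f.flush()
"""


def run_isolated_then_delete(persist_dir: str) -> bool:
    """Run the child to completion, then delete the directory from the parent."""
    # When the child exits, the OS releases every handle it held.
    subprocess.run([sys.executable, "-c", CHILD_SCRIPT, persist_dir], check=True)
    shutil.rmtree(persist_dir)
    return not os.path.exists(persist_dir)
```

The trade-off is the cost of starting a process per job, so this only fits workloads where the Chroma work is batch-like rather than served from a long-lived request handler.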
