BPE memory leak #39
Comments
Interesting find! Do you have a code snippet we can reproduce this issue with?
Hi @venual, I'm following up to see if this is still an issue.
We also see this in production and are investigating. @zurawiki There should certainly be a way to re-use the BPE in any case.
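The re-use suggestion above can be sketched as follows. This is a minimal std-only illustration of the pattern, not the crate's real API: the `Bpe` type, its constructor, and `count_tokens` are hypothetical stand-ins for an expensive-to-build tokenizer.

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for an expensive-to-construct tokenizer
// (the real crate's types and constructors may differ).
struct Bpe {
    vocab_size: usize,
}

impl Bpe {
    fn new() -> Self {
        // Imagine this allocating tens of MB of vocabulary and regex state.
        Bpe { vocab_size: 100_000 }
    }

    fn count_tokens(&self, text: &str) -> usize {
        // Placeholder tokenization: a real BPE would run its merge algorithm.
        text.split_whitespace().count().min(self.vocab_size)
    }
}

// Build the tokenizer once and share it for the lifetime of the process,
// instead of constructing (and leaking the cost of) a fresh instance per call.
fn shared_bpe() -> &'static Bpe {
    static BPE: OnceLock<Bpe> = OnceLock::new();
    BPE.get_or_init(Bpe::new)
}

fn num_tokens(text: &str) -> usize {
    shared_bpe().count_tokens(text)
}

fn main() {
    println!("{}", num_tokens("hello world"));
}
```

With this shape, every call to `num_tokens` hits the same cached instance, so the large one-time allocation happens once rather than on every request.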
We did some investigation and the lib does a lot of allocations, especially around regexes (`fancy_regex`) in `_encode_native`. Given a 1–2 MB input, it can easily bubble up to 50 MB of RAM usage. Just looking at the code, I see a couple of quick wins (for example, the transformation of messages from OpenAI types to your types uses clones instead of references). It would certainly be better to use a single BPE instance too, but I don't think that is the primary memory problem.
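The clone-vs-reference point can be illustrated with a small sketch. The types below are hypothetical, not the crate's actual message types: the idea is that a borrowing conversion avoids duplicating every string on each call.

```rust
// Hypothetical OpenAI-style message type (illustrative only).
struct OpenAiMessage {
    role: String,
    content: String,
}

// Internal representation that borrows from the source message
// instead of owning (and therefore cloning) the strings.
struct InternalMessage<'a> {
    role: &'a str,
    content: &'a str,
}

// Borrowing conversion: no string allocations at all. A cloning
// conversion would instead copy both strings for every message.
fn to_internal(msg: &OpenAiMessage) -> InternalMessage<'_> {
    InternalMessage {
        role: &msg.role,
        content: &msg.content,
    }
}

fn main() {
    let msg = OpenAiMessage {
        role: "user".to_string(),
        content: "hello".to_string(),
    };
    let internal = to_internal(&msg);
    println!("{}: {}", internal.role, internal.content);
}
```

For large chat payloads, borrowing in the conversion layer keeps per-call allocations proportional to the output tokens rather than to the input text.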
Thanks for the analysis @Sytten. Is there a specific code snippet we can use to build a regression test to make sure memory stays under a reasonable amount? How are you testing memory usage? For follow-ups, it looks like we should:
We mainly ran manual tests with a codepath that had
So from an initial analysis (see the dhat-heap file below), I confirm time is spent in `_encode_native`, specifically around matching the regex. There are performance notes observing that the regex takes a fair bit of CPU. I noticed that the regex used in the original tiktoken library uses the negative lookahead operator.
We could try to switch to the
Note that if you want to profile memory, I pushed a commit with an example that can be run with `cargo run --example num_tokens_memory --features dhat-heap --release`.
I don't know if I'm using it wrong, but creating a new BPE allocates around 20 MB of memory that is never released. On top of that, the `async_openai::get_max_tokens_chat_message` function creates a new BPE internally, so every call adds memory usage that is never released.