Rate limiting options #2

Open
asg017 opened this issue Jun 4, 2024 · 2 comments

Comments

@asg017
Owner

asg017 commented Jun 4, 2024

Different providers enforce different rate limits, so a single coordinated policy might be hard to get right. We will also need settings to disable or change the limits for self-hosted options.
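One way the settings could look, as a minimal sketch only: each provider gets a default limit, a user override replaces it, and `None` turns client-side limiting off (for example for a self-hosted server). The names `RateLimit`, `DEFAULTS`, `resolve`, and the provider keys are all hypothetical, and the numbers are placeholders rather than real provider limits.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RateLimit:
    # None means "no client-side limit", e.g. for a self-hosted server.
    requests_per_minute: Optional[int]

    def __post_init__(self) -> None:
        if self.requests_per_minute is not None and self.requests_per_minute <= 0:
            raise ValueError("requests_per_minute must be positive or None")

    def min_interval(self) -> float:
        """Minimum seconds between requests; 0.0 when limiting is disabled."""
        if self.requests_per_minute is None:
            return 0.0
        return 60.0 / self.requests_per_minute


# Placeholder defaults for illustration, NOT real provider limits.
DEFAULTS: Dict[str, RateLimit] = {
    "hosted-example": RateLimit(600),
    "self-hosted-example": RateLimit(None),
}


def resolve(provider: str, overrides: Dict[str, RateLimit]) -> RateLimit:
    """A user override wins; otherwise use the default; unknown providers are unlimited."""
    if provider in overrides:
        return overrides[provider]
    return DEFAULTS.get(provider, RateLimit(None))
```

With this shape, disabling the limit for a hosted provider is just `resolve("hosted-example", {"hosted-example": RateLimit(None)})`, and callers only ever need `min_interval()`.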

@simonw

simonw commented Jul 25, 2024

Some providers even return HTTP headers that tell the client how much of the current rate limit remains and when it will reset, so it would be possible to throttle requests to those providers automatically (sleep until the next "reset" point).

For example, a call to the OpenAI embeddings API returns these headers:

x-ratelimit-limit-requests: 5000
x-ratelimit-limit-tokens: 5000000
x-ratelimit-remaining-requests: 4999
x-ratelimit-remaining-tokens: 4999990
x-ratelimit-reset-requests: 12ms
x-ratelimit-reset-tokens: 0s
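A rough sketch of how a client might act on headers like these. It assumes the header names are already lowercased, that reset values are sequences of `<number><unit>` parts (such as `12ms` or `6m0s`), and that the client only needs to sleep once a "remaining" counter reaches zero. `parse_reset` and `seconds_to_wait` are hypothetical helper names, not part of any provider SDK.

```python
import re
from typing import Mapping

# Matches one "<number><unit>" component such as "12ms", "0s", or "6m".
_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|us|ns|h|m|s)")
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}


def parse_reset(value: str) -> float:
    """Convert a duration string like "12ms" or "6m0s" into seconds."""
    value = value.strip()
    pos = 0
    total = 0.0
    for match in _DURATION_PART.finditer(value):
        if match.start() != pos:
            break  # unexpected characters between components
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(value):
        raise ValueError(f"unrecognised duration: {value!r}")
    return total


def seconds_to_wait(headers: Mapping[str, str]) -> float:
    """Return how long to sleep before the next request; 0.0 if budget remains."""
    wait = 0.0
    for kind in ("requests", "tokens"):
        remaining = headers.get(f"x-ratelimit-remaining-{kind}")
        reset = headers.get(f"x-ratelimit-reset-{kind}")
        if remaining is None or reset is None:
            continue  # this provider did not send the header pair
        if int(remaining) <= 0:
            # Budget exhausted: wait for whichever counter resets last.
            wait = max(wait, parse_reset(reset))
    return wait
```

A caller could then do `time.sleep(seconds_to_wait(response_headers))` before the next request. Providers that send no such headers simply get a wait of zero, so the same code path works for self-hosted servers.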

@asg017
Owner Author

asg017 commented Jul 25, 2024

Oooh interesting, that would be way better than trying to wrap some Rust rate limiter around every client. Will scour all these clients and see which ones besides OpenAI support that.
