
How to handle RateLimit error? #101

Open
jacktang opened this issue Jul 2, 2024 · 1 comment

Comments


jacktang commented Jul 2, 2024

Hello,

First of all, thanks for the Groq platform! I use it as the LLM backend for an agent, but it often raises a RateLimit error. I handled it like this:

    import time

    def inference(self, model: str, messages: list, max_retries: int = 5, max_tokens: int = 8000) -> str:
        for i in range(max_retries):
            try:
                chat_completion = self.client.chat.completions.create(
                    messages=messages,
                    model=model,
                    # temperature=0,
                    max_tokens=max_tokens,
                )
                return chat_completion.choices[0].message.content
            except Exception as e:
                # Only retry on HTTP 429; re-raise anything else
                # (a bare Exception has no .response attribute).
                response = getattr(e, "response", None)
                if response is None or response.status_code != 429:
                    raise
                retry_after = response.headers.get("retry-after", "0")
                time.sleep(int(retry_after) + 10)
        raise RuntimeError("still rate-limited after all retries")

But it never works; I have to reconnect before it responds again. So how should this error be handled?
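In case it helps others hitting the same thing, here is a minimal retry-with-backoff sketch, decoupled from the Groq SDK. The exception shape (an httpx-style `response` with `status_code` and `headers`) and the `retry-after` header are assumptions carried over from my code above, not guaranteed by any particular client:

    import time

    def with_retries(call, max_retries: int = 5, base_delay: float = 1.0):
        """Invoke call(), retrying only on errors that look like HTTP 429.

        `call` is any zero-argument function. Exceptions are expected to
        carry a `response` with `status_code` and `headers`, as
        httpx-style API errors do.
        """
        for attempt in range(max_retries):
            try:
                return call()
            except Exception as e:
                response = getattr(e, "response", None)
                if response is None or response.status_code != 429:
                    raise  # not a rate-limit error: don't swallow it
                # Honor Retry-After when the server sends it,
                # otherwise fall back to exponential backoff.
                retry_after = response.headers.get("retry-after")
                delay = float(retry_after) if retry_after else base_delay * 2 ** attempt
                time.sleep(delay)
        raise RuntimeError(f"still rate-limited after {max_retries} retries")

You would then wrap the SDK call, e.g. `with_retries(lambda: self.client.chat.completions.create(...))`, so the retry policy stays in one place.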

Author

jacktang commented Jul 2, 2024

Well, I found it has nothing to do with rate limits but with the total size of the response: once the response reaches the max-length threshold, nothing is returned again until I reconnect. Is that a bug?
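For what it's worth, OpenAI-compatible chat APIs (which the Groq client follows) report why generation stopped via `finish_reason` on the returned choice; `"length"` means the `max_tokens` limit was hit. A minimal check, assuming that response shape:

    def hit_token_limit(chat_completion) -> bool:
        """True when the model stopped because max_tokens was reached
        (finish_reason == "length"), rather than a natural stop."""
        return chat_completion.choices[0].finish_reason == "length"

Checking this after each call would at least distinguish the truncation case from a genuine rate-limit error.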
