
How to handle RateLimit error? #101

Open
jacktang opened this issue Jul 2, 2024 · 1 comment

Comments


jacktang commented Jul 2, 2024

Hello,

First of all, thanks for the Groq platform! I use it as the LLM backend for an agent, but it often raises a RateLimit error. I handled it like this:

    import time

    def inference(self, model: str, messages: list, max_retries: int = 5, max_tokens: int = 8000) -> str:
        for i in range(max_retries):
            try:
                chat_completion = self.client.chat.completions.create(
                    messages=messages,
                    model=model,
                    # temperature=0,
                    max_tokens=max_tokens,
                )
                return chat_completion.choices[0].message.content
            except Exception as e:
                # Only retry on HTTP 429; re-raise anything else
                # (a bare Exception has no .response attribute).
                response = getattr(e, "response", None)
                if response is None or response.status_code != 429:
                    raise
                retry_after = response.headers.get("retry-after", "0")
                time.sleep(int(retry_after) + 10)
        raise RuntimeError("still rate-limited after all retries")

But it never works; I have to reconnect before it responds again. So how should this error be handled?
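In case it helps others hitting the same thing, here is a minimal retry-with-backoff sketch, decoupled from the Groq SDK. The exception shape (an httpx-style `response` with `status_code` and `headers`) and the `retry-after` header are assumptions carried over from my code above, not guaranteed by any particular client:

    import time

    def with_retries(call, max_retries: int = 5, base_delay: float = 1.0):
        """Invoke call(), retrying only on errors that look like HTTP 429.

        `call` is any zero-argument function. Exceptions are expected to
        carry a `response` with `status_code` and `headers`, as
        httpx-style API errors do.
        """
        for attempt in range(max_retries):
            try:
                return call()
            except Exception as e:
                response = getattr(e, "response", None)
                if response is None or response.status_code != 429:
                    raise  # not a rate-limit error: don't swallow it
                # Honor Retry-After when the server sends it,
                # otherwise fall back to exponential backoff.
                retry_after = response.headers.get("retry-after")
                delay = float(retry_after) if retry_after else base_delay * 2 ** attempt
                time.sleep(delay)
        raise RuntimeError(f"still rate-limited after {max_retries} retries")

You would then wrap the SDK call, e.g. `with_retries(lambda: self.client.chat.completions.create(...))`, so the retry policy stays in one place.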

Author

jacktang commented Jul 2, 2024

Well, I found it has nothing to do with rate limits but with the total size of the response: once the response reaches the max-length threshold, nothing is returned again until I reconnect. Is that a bug?
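For what it's worth, OpenAI-compatible chat APIs (which the Groq client follows) report why generation stopped via `finish_reason` on the returned choice; `"length"` means the `max_tokens` limit was hit. A minimal check, assuming that response shape:

    def hit_token_limit(chat_completion) -> bool:
        """True when the model stopped because max_tokens was reached
        (finish_reason == "length"), rather than a natural stop."""
        return chat_completion.choices[0].finish_reason == "length"

Checking this after each call would at least distinguish the truncation case from a genuine rate-limit error.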
