Slow initialization and invocation of ChatVertexAI with Gemini 1.5 Flash, while VertexAI performs faster
Environment
version: langchain-google-vertexai==1.0.6
Python version: python 3.11.9
Operating System: Windows 11 64bit
Description
I'm experiencing extremely slow performance when initializing and invoking the ChatVertexAI model, specifically with the Gemini 1.5 Flash model. Initialization takes between 25 and 45 seconds, and each invocation takes 15 to 35 seconds. This seems unusually slow for a model advertised as fast. Interestingly, when using the VertexAI class instead, initialization is still slow, but generation is much faster.
Steps to Reproduce
Install the required package: pip install langchain-google-vertexai==1.0.6
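The comparison above can be sketched as a small timing harness. This is a minimal sketch, not the original script from the report: the exact model ID `gemini-1.5-flash-001` and the `timed`/`reproduce` helpers are illustrative assumptions, and running `reproduce()` requires configured Google Cloud credentials and a project with the Vertex AI API enabled.

```python
import time

def timed(label, fn):
    """Call fn(), print the elapsed wall-clock time, and return its result."""
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.1f}s")
    return result

def reproduce():
    # Assumes Google Cloud credentials are set up, e.g. via
    # `gcloud auth application-default login`, and that the (assumed)
    # model ID "gemini-1.5-flash-001" is available in asia-northeast1.
    from langchain_google_vertexai import ChatVertexAI, VertexAI

    chat = timed("ChatVertexAI init", lambda: ChatVertexAI(
        model_name="gemini-1.5-flash-001", location="asia-northeast1"))
    timed("ChatVertexAI invoke", lambda: chat.invoke("Say hello in one word."))

    llm = timed("VertexAI init", lambda: VertexAI(
        model_name="gemini-1.5-flash-001", location="asia-northeast1"))
    timed("VertexAI invoke", lambda: llm.invoke("Say hello in one word."))
```

Calling `reproduce()` in an authenticated environment prints one timing line per step, which makes the init-vs-invoke latency difference between the two classes easy to compare.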
Expected Behavior
Given that Gemini 1.5 Flash is advertised as a fast model, I would expect both initialization and invocation to be significantly quicker, perhaps in the range of a few seconds at most. The behavior observed with the VertexAI class (fast generation after slow initialization) seems more in line with expectations.
Actual Behavior
For ChatVertexAI:
Initialization takes 25-45 seconds
Each invocation takes 15-35 seconds
For VertexAI:
Initialization takes 25-45 seconds
Each invocation takes 2-5 seconds
Question
Why is there such a significant difference in invocation speed between ChatVertexAI and VertexAI when using the same Gemini 1.5 Flash model?
Is the slow initialization expected for both classes? If not, are there any known issues or optimizations that could improve the initialization time?
Are there any recommended settings or best practices for using ChatVertexAI with Gemini 1.5 Flash to achieve optimal speed, similar to what's seen with VertexAI?
Additional Context
I'm located in Japan, so I'm using the asia-northeast1 location for the model, which should be optimal for my geographical location.
Despite using the optimal location, I'm still experiencing these long delays with ChatVertexAI.
The fact that VertexAI performs faster for generation suggests that the issue might be specific to the ChatVertexAI implementation.
No matter how many times I try, in my environment there's a clear difference in latency between VertexAI and ChatVertexAI. It might just be my environment. Thank you very much!