
Slow initialization and invocation of ChatVertexAI with Gemini 1.5 Flash, while VertexAI performs faster #402

Closed
stackpiles-naka opened this issue Jul 25, 2024 · 2 comments

Comments

@stackpiles-naka

Environment

  • version: langchain-google-vertexai==1.0.6
  • Python version: python 3.11.9
  • Operating System: Windows 11 64bit

Description

I'm experiencing extremely slow performance when initializing and invoking the ChatVertexAI model, specifically with the Gemini 1.5 Flash model. The initialization takes between 25 to 45 seconds, and each invocation takes 15 to 35 seconds. This seems unusually slow for a model advertised as fast. Interestingly, when using the VertexAI class instead, the initialization is still slow, but the generation process is much faster.

Steps to Reproduce

  1. Install the required package: pip install langchain-google-vertexai==1.0.6
  2. Run the following code for ChatVertexAI:
from langchain_google_vertexai import ChatVertexAI
import time

start = time.time()
llm = ChatVertexAI(
    model_name="gemini-1.5-flash-001",
    location="asia-northeast1",
)
print(f'Loaded LLM in {time.time() - start} seconds') # 25 sec ~ 45 sec
start = time.time()
response = llm.invoke("HI!")
print(f'Invoked LLM in {time.time() - start} seconds') # 15 sec ~ 35 sec
print(response)
  3. Compare with the VertexAI class:
from langchain_google_vertexai import VertexAI
import time

start = time.time()
llm = VertexAI(
    model_name="gemini-1.5-flash-001",
    location="asia-northeast1",
)
print(f'Loaded LLM in {time.time() - start} seconds') # 25 sec ~ 45 sec
start = time.time()
response = llm.invoke("HI!")
print(f'Invoked LLM in {time.time() - start} seconds') # 2 sec ~ 5 sec
print(response)

Expected Behavior

Given that Gemini 1.5 Flash is advertised as a fast model, I would expect both initialization and invocation to be significantly quicker, perhaps in the range of a few seconds at most. The behavior observed with the VertexAI class (fast generation after slow initialization) seems more in line with expectations.

Actual Behavior

For ChatVertexAI:

  • Initialization takes 25-45 seconds
  • Each invocation takes 15-35 seconds

For VertexAI:

  • Initialization takes 25-45 seconds
  • Each invocation takes 2-5 seconds

Question

  1. Why is there such a significant difference in invocation speed between ChatVertexAI and VertexAI when using the same Gemini 1.5 Flash model? (See the diagnostic sketch after this list.)
  2. Is the slow initialization expected for both classes? If not, are there any known issues or optimizations that could improve the initialization time?
  3. Are there any recommended settings or best practices for using ChatVertexAI with Gemini 1.5 Flash to achieve optimal speed, similar to what's seen with VertexAI?
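
To help narrow down question 1, here is a minimal diagnostic sketch (not from the original report) that times the raw Vertex AI SDK directly, bypassing LangChain entirely; if the SDK alone is fast, the overhead likely sits in the integration layer rather than in the model or region. It assumes the google-cloud-aiplatform package is installed and Application Default Credentials are configured; "your-project-id" is a placeholder.

import time

import vertexai
from vertexai.generative_models import GenerativeModel

start = time.time()
# SDK init is largely local configuration and should be near-instant.
vertexai.init(project="your-project-id", location="asia-northeast1")
model = GenerativeModel("gemini-1.5-flash-001")
print(f"SDK init in {time.time() - start:.1f} seconds")

start = time.time()
# A single generation call against the same model and region as above.
response = model.generate_content("HI!")
print(f"SDK generation in {time.time() - start:.1f} seconds")
print(response.text)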

Additional Context

  • I'm located in Japan, so I'm using the asia-northeast1 location for the model, which should be optimal for my geographical location.
  • Despite using the optimal location, I'm still experiencing these long delays with ChatVertexAI.
  • The fact that VertexAI performs faster for generation suggests that the issue might be specific to the ChatVertexAI implementation (see the repeated-invocation sketch below).
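
A second hedged sketch (again, not from the original report): invoking the same ChatVertexAI instance several times in a row separates one-time overhead, such as credential fetching and connection setup, from steady-state latency. If only the first call is slow, the delay is per-process setup rather than per-request.

import time

from langchain_google_vertexai import ChatVertexAI

llm = ChatVertexAI(
    model_name="gemini-1.5-flash-001",
    location="asia-northeast1",
)

# Time several invocations of the same instance; the first call often
# pays for token and connection setup, while later calls should not.
for i in range(3):
    start = time.time()
    llm.invoke("HI!")
    print(f"Invocation {i + 1} took {time.time() - start:.1f} seconds")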
@lkuligin (Collaborator) commented Aug 11, 2024

Could you check again, please?
I can't observe the issue anymore (and in any case, I don't think it's on the integration's side).

@stackpiles-naka (Author)

No matter how many times I try, there's a clear difference in latency between VertexAI and ChatVertexAI in my environment. It may just be something specific to my setup. Thank you very much!
