
Slow initialization and invocation of ChatVertexAI with Gemini 1.5 Flash, while VertexAI performs faster #402

Closed
stackpiles-naka opened this issue Jul 25, 2024 · 2 comments

Comments

@stackpiles-naka

Environment

  • version: langchain-google-vertexai==1.0.6
  • Python version: python 3.11.9
  • Operating System: Windows 11 64bit

Description

I'm experiencing extremely slow performance when initializing and invoking the ChatVertexAI model, specifically with the Gemini 1.5 Flash model. The initialization takes between 25 to 45 seconds, and each invocation takes 15 to 35 seconds. This seems unusually slow for a model advertised as fast. Interestingly, when using the VertexAI class instead, the initialization is still slow, but the generation process is much faster.

Steps to Reproduce

  1. Install the required package: pip install langchain-google-vertexai==1.0.6
  2. Run the following code for ChatVertexAI:
from langchain_google_vertexai import ChatVertexAI
import time

start = time.time()
llm = ChatVertexAI(
    model_name="gemini-1.5-flash-001",
    location="asia-northeast1",
)
print(f'Loaded LLM in {time.time() - start} seconds') # 25 sec ~ 45 sec
start = time.time()
response = llm.invoke("HI!")
print(f'Invoked LLM in {time.time() - start} seconds') # 15 sec ~ 35 sec
print(response)
  3. Compare with the VertexAI class:
from langchain_google_vertexai import VertexAI
import time

start = time.time()
llm = VertexAI(
    model_name="gemini-1.5-flash-001",
    location="asia-northeast1",
)
print(f'Loaded LLM in {time.time() - start} seconds') # 25 sec ~ 45 sec
start = time.time()
response = llm.invoke("HI!")
print(f'Invoked LLM in {time.time() - start} seconds') # 2 sec ~ 5 sec
print(response)

Expected Behavior

Given that Gemini 1.5 Flash is advertised as a fast model, I would expect both initialization and invocation to be significantly quicker, perhaps in the range of a few seconds at most. The behavior observed with the VertexAI class (fast generation after slow initialization) seems more in line with expectations.

Actual Behavior

For ChatVertexAI:

  • Initialization takes 25-45 seconds
  • Each invocation takes 15-35 seconds

For VertexAI:

  • Initialization takes 25-45 seconds
  • Each invocation takes 2-5 seconds

Question

  1. Why is there such a significant difference in invocation speed between ChatVertexAI and VertexAI when using the same Gemini 1.5 Flash model? (See the diagnostic sketch after this list.)
  2. Is the slow initialization expected for both classes? If not, are there any known issues or optimizations that could improve the initialization time?
  3. Are there any recommended settings or best practices for using ChatVertexAI with Gemini 1.5 Flash to achieve optimal speed, similar to what's seen with VertexAI?
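
To help narrow down question 1, here is a minimal diagnostic sketch (not from the original report) that times the raw Vertex AI SDK directly, bypassing LangChain entirely; if the SDK alone is fast, the overhead likely sits in the integration layer rather than in the model or region. It assumes the google-cloud-aiplatform package is installed and Application Default Credentials are configured; "your-project-id" is a placeholder.

import time

import vertexai
from vertexai.generative_models import GenerativeModel

start = time.time()
# SDK init is largely local configuration and should be near-instant.
vertexai.init(project="your-project-id", location="asia-northeast1")
model = GenerativeModel("gemini-1.5-flash-001")
print(f"SDK init in {time.time() - start:.1f} seconds")

start = time.time()
# A single generation call against the same model and region as above.
response = model.generate_content("HI!")
print(f"SDK generation in {time.time() - start:.1f} seconds")
print(response.text)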

Additional Context

  • I'm located in Japan, so I'm using the asia-northeast1 location for the model, which should be optimal for my geographical location.
  • Despite using the optimal location, I'm still experiencing these long delays with ChatVertexAI.
  • The fact that VertexAI performs faster for generation suggests that the issue might be specific to the ChatVertexAI implementation (see the repeated-invocation sketch below).
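
A second hedged sketch (again, not from the original report): invoking the same ChatVertexAI instance several times in a row separates one-time overhead, such as credential fetching and connection setup, from steady-state latency. If only the first call is slow, the delay is per-process setup rather than per-request.

import time

from langchain_google_vertexai import ChatVertexAI

llm = ChatVertexAI(
    model_name="gemini-1.5-flash-001",
    location="asia-northeast1",
)

# Time several invocations of the same instance; the first call often
# pays for token and connection setup, while later calls should not.
for i in range(3):
    start = time.time()
    llm.invoke("HI!")
    print(f"Invocation {i + 1} took {time.time() - start:.1f} seconds")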
@lkuligin (Collaborator) commented Aug 11, 2024

Could you check again, please?
I can't observe the issue anymore (and in any case, I don't think it's on the integration's side).

@stackpiles-naka (Author)

No matter how many times I try, there's a clear difference in latency between VertexAI and ChatVertexAI in my environment. It may just be something specific to my setup. Thank you very much!
