BigQueryVectorStore error when retrieving docs with filters (PR proposal) #426

Closed
Freezaa9 opened this issue Aug 7, 2024 · 0 comments · Fixed by #429


Freezaa9 (Contributor) commented Aug 7, 2024

Hi,
When I follow this guide, I can retrieve my documents with filters:
https://python.langchain.com/v0.2/docs/integrations/vectorstores/google_vertex_ai_vector_search/

But if I instantiate a BigQueryVectorStore on an existing table without adding texts in the same session, I cannot retrieve documents with filters.
Example:
Create a BigQueryVectorStore and use add_texts to add documents.
Then, in a separate notebook, run:

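from langchain_google_community import BigQueryVectorStore
from langchain_google_vertexai import VertexAIEmbeddings

# Placeholders: replace with your own project, dataset, table and region.
PROJECT_ID = "your-project-id"
DATASET = "your_dataset"
TABLE = "your_table"
REGION = "US"

embedding = VertexAIEmbeddings(
    model_name="textembedding-gecko@latest", project=PROJECT_ID
)

# Connect to the existing table; no texts are added in this session.
store = BigQueryVectorStore(
    project_id=PROJECT_ID,
    dataset_name=DATASET,
    table_name=TABLE,
    location=REGION,
    embedding=embedding,
)

# Embed a query string to get the vector to search with.
query_vector = embedding.embed_query("your query")
# Fails with TypeError because store.table_schema is None.
docs = store.similarity_search_by_vector(query_vector, filter={"len": 6})
print(docs)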

I get the error:

File c:\Users\geoff\OneDrive\Documents\GitHub\pcd-data-eu-genai-diorastra-orch\.venv\Lib\site-packages\langchain_google_community\bq_storage_vectorstores\_base.py:387, in BaseBigQueryVectorStore.similarity_search_by_vectors(self, embeddings, filter, k, with_scores, with_embeddings, **kwargs)
...
--> 240     if self.table_schema[column] in ["INTEGER", "FLOAT"]:  # type: ignore[index]
    241         filter_expressions.append(f"base.{column} = {value}")
    242     else:

TypeError: 'NoneType' object is not subscriptable

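The failing line (bq_storage_vectorstores/bigquery.py, line 240) looks up each filter column in self.table_schema. It seems that calling add_texts populates table_schema with the BigQuery table schema; if no texts are added in the session, table_schema stays None and the lookup fails.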
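The traceback points at the filter-building step, which looks up each filter column's type in `self.table_schema`. The sketch below re-creates that step as a standalone function to show why a `None` schema fails with `TypeError`, and how an explicit check could surface a clearer error instead. The name `build_filter_expressions` is hypothetical, and only the `INTEGER`/`FLOAT` branch is visible in the traceback; quoting non-numeric values as string literals is an assumption.

```python
from typing import Any, Dict, List, Optional


def build_filter_expressions(
    table_schema: Optional[Dict[str, str]], filter: Dict[str, Any]
) -> List[str]:
    """Hypothetical re-creation of the filter builder seen in the traceback."""
    if table_schema is None:
        # Without this check, table_schema[column] raises
        # TypeError: 'NoneType' object is not subscriptable
        raise ValueError(
            "table_schema is not loaded; fetch the table schema before filtering"
        )
    expressions: List[str] = []
    for column, value in filter.items():
        if table_schema[column] in ["INTEGER", "FLOAT"]:
            # Numeric columns are compared without quotes, as in the traceback.
            expressions.append(f"base.{column} = {value}")
        else:
            # Assumption: other column types are compared as quoted strings.
            expressions.append(f"base.{column} = '{value}'")
    return expressions
```

With a loaded schema, `build_filter_expressions({"len": "INTEGER"}, {"len": 6})` yields `["base.len = 6"]`; with `None`, it raises `ValueError` rather than the opaque `TypeError`.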

Same behavior when using the retriever:

retriever = store.as_retriever()
retriever.search_kwargs = {"k": 1, "filter": {"len": 6}}
relevant_documents = retriever.invoke(query)

Thanks for your help

UPDATE:

When loading an existing BigQuery vector store that already contains embedded documents, the table_schema attribute is None.

But add_texts updates table_schema with the schema of the loaded table.

Shouldn't BigQueryVectorStore read the schema of the existing table when it is instantiated?
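One way a store could pick up the schema of an existing table is to load it lazily, on first access. The sketch below is a hypothetical wrapper, not the library's implementation: `LazySchemaStore` and the `fetch_schema` callable are illustrative names, and the callable stands in for whatever BigQuery call actually reads the table schema.

```python
from typing import Callable, Dict, Optional


class LazySchemaStore:
    """Hypothetical store that reads the table schema on first use."""

    def __init__(self, fetch_schema: Callable[[], Dict[str, str]]) -> None:
        self._fetch_schema = fetch_schema
        self._table_schema: Optional[Dict[str, str]] = None

    @property
    def table_schema(self) -> Dict[str, str]:
        # Fetch once, then reuse the cached mapping on later accesses.
        if self._table_schema is None:
            self._table_schema = self._fetch_schema()
        return self._table_schema

    @table_schema.setter
    def table_schema(self, value: Dict[str, str]) -> None:
        # Keeps a manual override (assigning table_schema directly) possible.
        self._table_schema = value
```

Loading lazily avoids a metadata request for stores that never filter; loading eagerly in `__init__` would instead fail fast if the table does not exist. Either choice would remove the dependency on `add_texts` having run first.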

A workaround is to set the schema manually:
store.table_schema = {'doc_id': 'STRING', 'content': 'STRING', 'embedding': 'FLOAT', 'len': 'INTEGER'}
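Rather than hard-coding the mapping, the same dictionary could be built from the table's metadata. This sketch assumes the `google-cloud-bigquery` library, whose `Client.get_table()` returns a `Table` whose `schema` is a list of `SchemaField` objects with `name` and `field_type` attributes. The helper name `table_schema_from_bigquery` is hypothetical, and the sketch assumes `BigQueryVectorStore` keeps accepting a plain `{column: type}` dict in `table_schema`, as the manual workaround implies.

```python
from typing import Any, Dict


def table_schema_from_bigquery(client: Any, table_id: str) -> Dict[str, str]:
    """Hypothetical helper: map each column name to its BigQuery type name.

    `client` is expected to be a google.cloud.bigquery.Client (or any object
    with a compatible get_table method); `table_id` is "project.dataset.table".
    """
    table = client.get_table(table_id)
    return {field.name: field.field_type for field in table.schema}


# Usage sketch (requires google-cloud-bigquery and credentials):
# from google.cloud import bigquery
# client = bigquery.Client(project=PROJECT_ID)
# store.table_schema = table_schema_from_bigquery(
#     client, f"{PROJECT_ID}.{DATASET}.{TABLE}"
# )
```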

Thanks again

Update:
PR: #429

langcarl bot added the investigate label on Aug 7, 2024
Freezaa9 changed the title from "BigQueryVectorStore error when retrieving docs with filters" to "BigQueryVectorStore error when retrieving docs with filters (PR proposal)" on Aug 7, 2024