BigQueryVectorStore error when retrieving docs with filters (PR proposal) #426

Freezaa9 · 2024-08-07T16:19:47Z

Hi,
When I follow this guide, I can retrieve my documents with filters:
https://python.langchain.com/v0.2/docs/integrations/vectorstores/google_vertex_ai_vector_search/

But if I instantiate my existing BigQueryVectorStore without adding texts in the same environment, I cannot retrieve documents with filters.
Example:
Create a BigQueryVectorStore and use add_texts to add documents.
In a separate notebook, use:

from langchain_google_community import BigQueryVectorStore
from langchain_google_vertexai import VertexAIEmbeddings

# PROJECT_ID, DATASET, TABLE and REGION identify the existing table.
embedding = VertexAIEmbeddings(
    model_name="textembedding-gecko@latest", project=PROJECT_ID
)

# Attach to the existing table; nothing is added to it in this session.
store = BigQueryVectorStore(
    project_id=PROJECT_ID,
    dataset_name=DATASET,
    table_name=TABLE,
    location=REGION,
    embedding=embedding,
)

# query_vector is a precomputed query embedding.
docs = store.similarity_search_by_vector(query_vector, filter={"len": 6})
print(docs)

I get the error:

File c:\Users\geoff\OneDrive\Documents\GitHub\pcd-data-eu-genai-diorastra-orch\.venv\Lib\site-packages\langchain_google_community\bq_storage_vectorstores\_base.py:387, in BaseBigQueryVectorStore.similarity_search_by_vectors(self, embeddings, filter, k, with_scores, with_embeddings, **kwargs)
...
--> [240](file:///C:/Users/XXXX/OneDrive/Documents/GitHub/XXXX/.venv/Lib/site-packages/langchain_google_community/bq_storage_vectorstores/bigquery.py:240)     if self.table_schema[column] in ["INTEGER", "FLOAT"]:  # type: ignore[index]
    [241](file:///C:/Users/XXXX/OneDrive/Documents/GitHub/XXXX/.venv/Lib/site-packages/langchain_google_community/bq_storage_vectorstores/bigquery.py:241)         filter_expressions.append(f"base.{column} = {value}")
    [242](file:///C:/Users/XXXX/OneDrive/Documents/GitHub/XXXX/.venv/Lib/site-packages/langchain_google_community/bq_storage_vectorstores/bigquery.py:242)     else:

TypeError: 'NoneType' object is not subscriptable

It looks as if calling add_texts updates the store's table_schema with the BigQuery schema. If you never add texts in the session, table_schema stays None.
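
The failure can be reproduced without BigQuery. The sketch below is a simplified stand-in for the filter-building step visible in the traceback, not the library's actual code: the function name build_filter_expressions and the quoting in the else branch are assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional


def build_filter_expressions(
    table_schema: Optional[Dict[str, str]], filter: Dict[str, Any]
) -> List[str]:
    """Hypothetical stand-in for the filter logic seen in the traceback."""
    expressions = []
    for column, value in filter.items():
        # Indexing raises TypeError here when table_schema is None.
        if table_schema[column] in ["INTEGER", "FLOAT"]:
            expressions.append(f"base.{column} = {value}")
        else:
            # Assumed handling for non-numeric columns.
            expressions.append(f"base.{column} = '{value}'")
    return expressions


# Schema known, as it is after adding texts in the same session.
print(build_filter_expressions({"len": "INTEGER"}, {"len": 6}))

# Schema never loaded: indexing None fails before any SQL is built.
try:
    build_filter_expressions(None, {"len": 6})
except TypeError as exc:
    print(f"TypeError: {exc}")
```

This suggests the filter value itself is not the problem; the lookup fails as soon as the schema is missing.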

Same behavior when using the retriever:

retriever = store.as_retriever()
retriever.search_kwargs = {"k": 1, "filter": {"len": 6}}
relevant_documents = retriever.invoke(query)

Thanks for your help

UPDATE:

When loading an existing BQ vector store that already contains embedded documents, the table_schema attribute is None.

But add_texts updates table_schema with the schema of the loaded documents.

Shouldn't BigQueryVectorStore fetch the schema of the existing table when it is instantiated?

A workaround is to get the schema and set it manually:
store.table_schema = {'doc_id': 'STRING', 'content': 'STRING', 'embedding': 'FLOAT', 'len': 'INTEGER'}
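
Rather than typing the schema by hand, it could be read from the table itself. This is a minimal sketch, assuming the google-cloud-bigquery client is installed and that table_schema expects a flat mapping of column name to type name like the one above. The helper names schema_to_dict and load_table_schema are hypothetical and not part of langchain_google_community, and nested RECORD fields are not handled.

```python
from typing import Any, Dict, Iterable


def schema_to_dict(fields: Iterable[Any]) -> Dict[str, str]:
    """Map each schema field's name to its BigQuery type name."""
    return {field.name: field.field_type for field in fields}


def load_table_schema(
    project_id: str, dataset_name: str, table_name: str
) -> Dict[str, str]:
    """Fetch the table's schema from BigQuery as a name -> type mapping."""
    # Imported lazily so schema_to_dict works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    table = client.get_table(f"{project_id}.{dataset_name}.{table_name}")
    return schema_to_dict(table.schema)


# Usage with the store from the example above (requires BigQuery access):
# store.table_schema = load_table_schema(PROJECT_ID, DATASET, TABLE)
```

Splitting the conversion from the API call keeps the mapping logic easy to check without credentials.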

Thanks again

Update:
PR: #429

langcarl bot added the investigate label Aug 7, 2024

Freezaa9 mentioned this issue Aug 7, 2024

fix: update table_schema when instantiating already existing BigQueryVectorStore #429

Merged

Freezaa9 changed the title ~~BigQueryVectorStore error when retrieving docs with filters~~ BigQueryVectorStore error when retrieving docs with filters (PR proposal) Aug 7, 2024

lkuligin closed this as completed in #429 Aug 11, 2024
