[Bug]: kernel die when training vanna ai (with chromadb) with more than 90 vectors. #2405

mkhansa · 2024-06-23T08:33:34Z

What happened?

When I add training data (350 vectors) to a Vanna AI model, the process consumes 100%+ of the CPU (12th Gen Intel(R) i7-12650H) and 32 GB of RAM, and the notebook kernel "dies".
The only workaround I have found is to reduce the number of vectors to 70-80.

Python code:

import json

from vanna.chromadb import ChromaDB_VectorStore
from vanna.google import GoogleGeminiChat


class MyVanna(ChromaDB_VectorStore, GoogleGeminiChat):
    def __init__(self, config=None):
        # Local persistent Chroma store
        ChromaDB_VectorStore.__init__(self, config={
            "path": "../path/VannaAI_path"
        })
        GoogleGeminiChat.__init__(self, config={'api_key': 'XXXXXX', "temperature": 0, 'model': "gemini-1.5-pro"})

vn = MyVanna()
# .....

with open('../training_data/doc_training_data.json', 'r') as f:
    documentation_list = json.load(f)["documentation"]

for rule in documentation_list:
    print(rule)
    vn.train(documentation=rule)  # here, the notebook crashed

Versions

chromadb==0.5.3, Python 3.12.2, Windows

Relevant log output

No response

tazarov · 2024-06-24T07:45:03Z

@mkhansa, I'm not familiar with Vanna AI and the problem they are solving. At a glance, it seems it is a RAG application aimed at answering SQL-related questions. Their train workflow seems to be using an LLM to create embeddings from docs, schemas, DDLs etc. Their use of Chroma is also quite straightforward. Without a deeper understanding of what their training workflow does beyond adding embeddings for the documentation in Chroma, I cannot say what could be causing this issue.

To test further, can I ask you to run Chroma as a separate instance (e.g. Docker or the CLI), then create an HttpClient and pass it as configuration to the Vanna vector store:
https://github.com/vanna-ai/vanna/blob/8cc20fbd22d73dd0321cc7464860c0f15080f3ad/src/vanna/chromadb/chromadb_vector.py#L23

Then run your workbook as above and check your processes to see which one consumes the 100% CPU.
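A minimal sketch of that setup, in case it helps. It assumes a Chroma server is already running on localhost:8000 (for example via `chroma run --path <dir>` or the `chromadb/chroma` Docker image), and that the `ChromaDB_VectorStore` constructor at the linked line accepts a ready-made client under a `"client"` config key. The helper names `make_chroma_client`, `vector_store_config` and `build_vanna` are made up for illustration and are not part of either library.

```python
def make_chroma_client(host="localhost", port=8000):
    """Connect to a Chroma server that runs in its own process."""
    # Imported lazily so the other helpers can be loaded without chromadb.
    import chromadb

    return chromadb.HttpClient(host=host, port=port)


def vector_store_config(client):
    """Config dict for the Vanna vector store using an external client.

    Assumption: ChromaDB_VectorStore reads a "client" key and uses the
    object as-is instead of creating its own PersistentClient.
    """
    return {"client": client}


def build_vanna(client, api_key):
    """Build the same MyVanna as in the report, backed by the remote client."""
    from vanna.chromadb import ChromaDB_VectorStore
    from vanna.google import GoogleGeminiChat

    class MyVanna(ChromaDB_VectorStore, GoogleGeminiChat):
        def __init__(self):
            ChromaDB_VectorStore.__init__(self, config=vector_store_config(client))
            GoogleGeminiChat.__init__(
                self,
                config={"api_key": api_key, "temperature": 0, "model": "gemini-1.5-pro"},
            )

    return MyVanna()


# Usage (requires a running Chroma server and a valid API key):
# vn = build_vanna(make_chroma_client(), api_key="XXXXXX")
# for rule in documentation_list:
#     vn.train(documentation=rule)
```

With Chroma in its own process, the task manager should show whether the notebook kernel or the Chroma server is the one pinned at 100% CPU while the training loop runs.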

mkhansa added the bug label on Jun 23, 2024
