[Bug]: Chroma.add terminates flask without any error #2438

Open
petergaoshan opened this issue Jul 2, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@petergaoshan

What happened?

I'm building an API with Flask. A client sends me a file, and I save its contents to a Chroma database on my side. The call to Chroma.add terminates my program without raising any exception. Smaller files are saved without problems, but a larger file crashes the process. At first I suspected a memory problem, so I ran the same code in a Jupyter notebook outside Flask, and there it runs properly.

import os
import uuid
from typing import List

def save_w_chunking(self, docs: List[Document]) -> None:
    # Split the documents into semantically coherent chunks.
    text_splitter = SemanticChunker(
        self._embeddings,
        breakpoint_threshold_type="percentile",
        breakpoint_threshold_amount=80,
        sentence_split_regex=r'(?<=[。?!])|(?<=\n)',
    )
    docs = text_splitter.split_documents(docs)

    # Drop empty and duplicate chunks, then prefix each kept chunk with its filename.
    seen_docs = set()
    temp_docs = []
    for d in docs:
        is_unique = d.page_content not in seen_docs
        has_content = bool(d.page_content.strip())
        if is_unique and has_content:
            seen_docs.add(d.page_content)
            d.page_content = d.metadata["filename"] + ":\n" + d.page_content
            temp_docs.append(d)

    docs = filter_complex_metadata(temp_docs)
    if not docs:
        return

    try:
        t = [d.page_content for d in docs]
        m = [d.metadata for d in docs]
        ids = [str(uuid.uuid4()) for _ in range(len(t))]
        # The program terminates during this call when the file is large.
        self._ChromaDB.add(ids=ids, documents=t, metadatas=m)
    except Exception as e:
        print("caught exception: ", e)

@app.route('/ChromaEditor', methods=['POST'])
def upload_file():
    result = {"msg": "success"}

    # Save the uploaded file to disk before parsing it.
    file = request.files["file"]
    file_path = os.path.join("blink", file.filename)
    file.save(file_path)

    text_doc, table_doc = unstrucutured_to_Doc([file_path])
    print("successfully parsed " + file.filename)

    CE.save_w_chunking(text_doc)
    CE.save_wo_chunking(table_doc)
    print("successfully saved " + file.filename)

    return result

Versions

python 3.12.3
chromadb 0.5.0
langchain-chroma 0.1.1

Relevant log output

* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
Press CTRL+C to quit
successfully parsed hello.docx

(llm) C:\Users\Desktop>
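
The log above ends after "successfully parsed" and drops back to the prompt with no traceback. One way to learn how the process actually died is to run the failing step in a child process and inspect its exit code. The following is only a sketch: `run_and_report` is a hypothetical helper, not part of Flask or Chroma, and the interpretation of the codes assumes the usual conventions (a negative return code on POSIX means the child was killed by a signal, and 0xC0000005 on Windows is the access-violation status).

```python
import subprocess
import sys
from typing import List, Tuple


def run_and_report(argv: List[str]) -> Tuple[int, str]:
    """Run a command, wait for it, and describe how it ended.

    Hypothetical diagnostic helper: it only classifies the exit code,
    it does not prevent or fix a crash.
    """
    completed = subprocess.run(argv)
    code = completed.returncode

    if code == 0:
        verdict = "exited normally"
    elif code < 0:
        # POSIX: subprocess reports death by signal N as -N.
        verdict = f"killed by signal {-code}"
    elif code == 0xC0000005:
        # Windows: NTSTATUS access violation, typical for a native crash.
        verdict = "Windows access violation (native crash)"
    else:
        verdict = f"exited with status {code}"
    return code, verdict
```

For example, `run_and_report([sys.executable, "repro_add.py"])` could be used with a standalone script that performs only the add step; `repro_add.py` is an assumed file name for illustration.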
petergaoshan added the bug label on Jul 2, 2024
@tazarov
Contributor

tazarov commented Jul 2, 2024

@petergaoshan, do you run your Flask app in a container? The process might get terminated if the container runs out of memory.
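
If memory is the suspect, one way to test the idea is to add the chunks in smaller batches and print progress after each one, so the last printed line shows roughly how far the process got. This is a sketch under assumptions, not a confirmed fix: `add_in_batches` and the default `batch_size` of 100 are hypothetical, and `collection` stands for any object with the same `add(ids=..., documents=..., metadatas=...)` call used in the issue.

```python
from typing import Any, Dict, List


def add_in_batches(
    collection: Any,
    ids: List[str],
    documents: List[str],
    metadatas: List[Dict[str, Any]],
    batch_size: int = 100,  # hypothetical value, tune for your data
) -> int:
    """Call collection.add once per slice of at most batch_size items.

    Returns the number of add calls made.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if not (len(ids) == len(documents) == len(metadatas)):
        raise ValueError("ids, documents and metadatas must have equal length")

    calls = 0
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        collection.add(
            ids=ids[start:end],
            documents=documents[start:end],
            metadatas=metadatas[start:end],
        )
        calls += 1
        # Flush so the line is visible even if the process dies right after.
        print(f"added items {start}..{min(end, len(ids)) - 1}", flush=True)
    return calls
```

In the code from the issue this would replace the single `self._ChromaDB.add(...)` call with `add_in_batches(self._ChromaDB, ids, t, m)`. If the crash still happens on the first small batch, memory pressure from one large add becomes a less likely explanation.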

@tazarov
Contributor

tazarov commented Aug 6, 2024

@petergaoshan, a possible cause for this is chroma-hnswlib raising a segmentation fault, which is sometimes masked. See #2513.
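
A segmentation fault in native code cannot be caught by `except Exception`, which would be consistent with the silent exit reported above. Python's standard-library `faulthandler` module can dump the Python traceback when such a fatal error occurs, which may help confirm or rule out this cause. A minimal sketch, assuming a hypothetical helper `enable_crash_traceback` and a log path of your choosing:

```python
import faulthandler
from typing import TextIO


def enable_crash_traceback(log_path: str) -> TextIO:
    """Make fatal errors (e.g. a segfault) dump Python tracebacks to a file.

    The returned file object must stay open for as long as the fault
    handler is enabled, so the caller should keep a reference to it.
    """
    log_file = open(log_path, "a", encoding="utf-8")
    # all_threads=True dumps the traceback of every thread, not just the
    # one that crashed, which helps with a threaded server.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling this once at startup, before the Flask app begins serving requests, should leave a traceback in the log file if the process dies inside the add call. It only reports the crash; it does not prevent it.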
