HnswDensevector SafeTensor Generator #2515
Setup for NFCorpus Indexing with Safetensors
To efficiently perform NFCorpus indexing using Safetensors, follow this setup workflow:
Indexing Procedure
To build HNSW Safetensors indexes, use the following sample command:
Ensure all paths and parameters are adjusted according to your setup and directory structure.
Can you make the safetensors collection go into
We also shouldn't need a new indexer. The indexing command should be similar to https://github.com/castorini/anserini/blob/master/docs/regressions/regressions-beir-v1.0.0-nfcorpus-bge-base-en-v1.5-hnsw.md e.g.,
With the only exception being a different
Updated Workflow for Safetensors Conversion and Indexing Process
...in/java/io/anserini/index/generator/HnswJsonWithSafeTensorsDenseVectorDocumentGenerator.java
Updates
Updated commands
Python
python src/main/python/safetensors/json_to_bin.py --input collections/beir-v1.0.0/bge-base-en-v1.5/nfcorpus/vectors.part00.jsonl --output collections/beir-v1.0.0/bge-base-en-v1.5.safetensors/nfcorpus
Java
bin/run.sh io.anserini.index.IndexHnswDenseVectors -collection JsonDenseVectorCollection -input collections/beir-v1.0.0/bge-base-en-v1.5/nfcorpus -generator HnswJsonWithSafeTensorsDenseVectorDocumentGenerator -index indexes/beir-v1.0.0/bge-base-en-v1.5/nfcorpus/ -threads 16 -M 16 -efC 100 -memoryBuffer 65536 -noMerge >& logs/log.beir-v1.0.0-nq.bge-base-en-v1.112 &
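For readers unfamiliar with the conversion step, here is a minimal stdlib-only sketch of what turning JSONL vectors into a safetensors payload can look like. It is not the PR's json_to_bin.py. The record fields "docid" and "vector", the tensor name "vectors", and storing docids as a JSON string under `__metadata__` are all assumptions made for illustration. Only the container layout (8-byte little-endian header length, JSON header, raw little-endian data) follows the published safetensors format.

```python
import json
import struct


def jsonl_to_safetensors_bytes(lines):
    """Pack JSONL records into a single safetensors-style byte string.

    Assumes each record looks like {"docid": ..., "vector": [...]}.
    """
    docids, rows = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        docids.append(str(rec["docid"]))
        rows.append([float(x) for x in rec["vector"]])

    dim = len(rows[0]) if rows else 0
    if any(len(r) != dim for r in rows):
        raise ValueError("all vectors must have the same dimension")

    # Row-major float32 payload, little-endian.
    data = b"".join(struct.pack(f"<{dim}f", *r) for r in rows)
    header = {
        # Assumption: docids kept as a JSON string in the metadata map.
        "__metadata__": {"docids": json.dumps(docids)},
        "vectors": {
            "dtype": "F32",
            "shape": [len(rows), dim],
            "data_offsets": [0, len(data)],
        },
    }
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + data


def read_safetensors_bytes(blob):
    """Read back the docids and vectors written by the function above."""
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    info = header["vectors"]
    n_rows, dim = info["shape"]
    begin, end = info["data_offsets"]
    payload = blob[8 + header_len + begin:8 + header_len + end]
    flat = struct.unpack(f"<{n_rows * dim}f", payload)
    vectors = [list(flat[i * dim:(i + 1) * dim]) for i in range(n_rows)]
    docids = json.loads(header["__metadata__"]["docids"])
    return docids, vectors
```

Writing the returned bytes to a file with `open(path, "wb")` gives a file that a safetensors reader should be able to open. A real converter would more likely use the safetensors library together with NumPy rather than hand-packing bytes.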
Looking at this command:
What are these three options doing?
And why are these the same?
I would expect
I think you are looking at the older command; this is the updated one:
Ah, please update to keep up to date?
My apologies, it got lost within all the commits :)
Python
Java
src/main/java/io/anserini/index/generator/DenseVectorDocumentGenerator.java
Sorry, I'm confused again:
Why would
So we'd have something like SafeTensorsDenseVectorCollection that reads from SafeTensors?
I'm not getting your logic, but I think you need to implement two classes:
And your command would be something like
And you'd "wire everything together".
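One possible reading of the split suggested above, sketched in Python for brevity rather than in Anserini's Java. It assumes the two classes are a collection that knows how to read the source and a generator that turns each record into an indexable document. All class, field, and function names here are hypothetical and do not come from the PR or from Anserini.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class VectorRecord:
    docid: str
    vector: List[float]


class ToyVectorCollection:
    """Hypothetical collection: the only place that knows the source layout."""

    def __init__(self, docids: List[str], vectors: List[List[float]]):
        if len(docids) != len(vectors):
            raise ValueError("docids and vectors must have the same length")
        self._docids = docids
        self._vectors = vectors

    def __iter__(self) -> Iterator[VectorRecord]:
        for docid, vector in zip(self._docids, self._vectors):
            yield VectorRecord(docid, list(vector))


class ToyVectorDocumentGenerator:
    """Hypothetical generator: maps one record to the fields to be indexed."""

    def create_document(self, record: VectorRecord) -> Dict[str, object]:
        return {"id": record.docid, "vector": record.vector}


def build_index(collection, generator) -> List[Dict[str, object]]:
    """The 'wiring': the indexer only depends on the two small interfaces."""
    return [generator.create_document(record) for record in collection]
```

With this shape the indexer itself stays unchanged, and swapping the input format only means passing a different collection and generator pair.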
Add to onboarding reproduction logs (castorini#2546)
Updated command:
Linked issue: castorini/ura-projects#31 (comment)
@17Melissa will provide the flow command below :)