
Inferencing #18

Open
simrankaurjolly16 opened this issue Aug 25, 2021 · 1 comment
@simrankaurjolly16
How do you run inference on unseen data after training?

@ajaybabu20

I believe, depending on your task, you first need to pass the query table and the target table to the blocking function. Doing so reduces the number of candidates for each query and target partition. Then, within each partition, you need to arrange the data so that each query is paired with every target.
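The pairing step within a partition can be sketched as a simple cross product; the record lists here are toy stand-ins, and `make_candidate_pairs` is an illustrative helper rather than a function from the repository:

```python
from itertools import product

def make_candidate_pairs(queries, targets):
    """Pair every query in a partition with every target in the same partition."""
    return list(product(queries, targets))

# Toy partition: 3 queries x 2 targets -> 6 candidate pairs
# (10 queries x 10 targets would give 100, as described below).
queries = ["q1", "q2", "q3"]
targets = ["t1", "t2"]
pairs = make_candidate_pairs(queries, targets)
print(len(pairs))  # 6
print(pairs[0])    # ('q1', 't1')
```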

So if you have 10 queries and 10 targets in a single partition, you will create a dataset of 100 pairs. After that you apply pre-processing (i.e., adding the special tokens, etc.), run inference, get a score for each pair, and select the pair with the highest score.
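A minimal sketch of the serialize-score-select flow: `serialize` mimics a Ditto-style "COL ... VAL ..." serialization, and `score_pair` is a placeholder for the fine-tuned model's match probability. Both are illustrative stand-ins, not the repository's actual functions:

```python
def serialize(record):
    """Flatten a record dict into a Ditto-style token sequence.

    Ditto serializes each entry as 'COL <attr> VAL <value>'; in practice
    the two serialized entries are joined with a [SEP] token before being
    fed to the model.
    """
    return " ".join(f"COL {k} VAL {v}" for k, v in record.items())

def score_pair(query, target):
    """Placeholder for model inference; returns a match score in [0, 1].

    In practice you would tokenize serialize(query) + ' [SEP] ' +
    serialize(target), run the fine-tuned model, and take the
    positive-class probability. Here a crude value-overlap ratio
    stands in for that score.
    """
    shared = set(query.values()) & set(target.values())
    return len(shared) / max(len(query), 1)

query = {"name": "iPhone 12", "brand": "Apple"}
targets = [
    {"name": "iPhone 12", "brand": "Apple"},
    {"name": "Galaxy S21", "brand": "Samsung"},
]

# Score every candidate pair and keep the target with the highest score.
best = max(targets, key=lambda t: score_pair(query, t))
print(best["name"])  # iPhone 12
```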

Ideally, we would like to extract the final layer from the Ditto model and index the target embeddings using an approximate-nearest-neighbor (ANN) library like FAISS. Then, for each query, you just have to compute its embedding and do a fast lookup. This is much more scalable.
The original author of BERT commented that BERT is not pretrained for semantic similarity (google-research/bert#164 (comment)). You might get poor results, even worse than simple GloVe embeddings.
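The embedding-index idea can be sketched as follows. To keep the example dependency-free, an exact inner-product search over L2-normalized vectors stands in for a FAISS index (with FAISS installed, `faiss.IndexFlatIP` plus `index.add`/`index.search` would replace the NumPy lookup), and random vectors stand in for embeddings extracted from Ditto's final layer:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64

# Stand-ins for final-layer embeddings of 1000 target records,
# L2-normalized so that inner product equals cosine similarity.
target_emb = rng.standard_normal((1000, dim)).astype("float32")
target_emb /= np.linalg.norm(target_emb, axis=1, keepdims=True)

# With FAISS this would be roughly:
#   index = faiss.IndexFlatIP(dim)
#   index.add(target_emb)
#   scores, ids = index.search(query_emb, k)

def search(query_emb, k=5):
    """Exact inner-product lookup with the same contract as index.search."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    scores = q @ target_emb.T
    ids = np.argsort(-scores, axis=1)[:, :k]
    return np.take_along_axis(scores, ids, axis=1), ids

# A query identical to target 42 should rank target 42 first.
scores, ids = search(target_emb[42:43])
print(ids[0][0])  # 42
```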

I am doing a small experiment with the Ditto model on both approaches and will update this space when I get some results.
