Poor results on financial dataset #245
Comments
Hi @adilmukhtar82, judging by your results and outputs, it looks to me like your model has not been trained at all. Could you please answer the following questions:
I also looked at another issue, #241, and used BertQA (as in the snippet you posted) on the same dataset (18 QA pairs). It works fine, so I think I don't need to fine-tune again on SQuAD. I think the Retriever isn't returning the most relevant documents: I have played with its parameters but didn't get relevant documents at all.
18 QA pairs is an extremely small dataset for fine-tuning the model from scratch. In my snippet, I load a model that was already fine-tuned on SQuAD, so yes, you won't need to fine-tune it again. As I cannot see how your dataset is structured or what type of content it holds (language, terms, etc.), I am not able to spot the problem with the Retriever directly. But I really think it can be improved... Also, this kind of Retriever is a pretty simple one: it ranks the documents based on tf-idf vectorization and the cosine similarity between the question and the documents. It is fast, but it might not be the most accurate form of Retriever. It may also need more documents to vectorize them properly... I am thinking about implementing and trying other forms of Retriever and to include them in
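To make the ranking idea above concrete, here is a minimal pure-Python sketch of tf-idf vectorization followed by cosine similarity between a question and a set of documents. It is not cdQA's actual Retriever code: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank`), the whitespace tokenizer, the smoothed idf formula, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict) per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed idf (an assumption; other variants exist).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(question, docs):
    """Return document indices sorted by similarity to the question, and the scores."""
    vectors, idf = tfidf_vectors(docs)
    # Question terms that never occur in the corpus carry no weight.
    q_vec = {
        t: tf * idf[t] for t, tf in Counter(tokenize(question)).items() if t in idf
    }
    scores = [cosine(q_vec, v) for v in vectors]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order, scores


if __name__ == "__main__":
    # Toy documents, invented for this example.
    docs = [
        "call our contact number for support",
        "visit your nearest bank branch",
        "interest rates on savings accounts",
    ]
    order, scores = rank("what is your contact number", docs)
    for i in order:
        print(round(scores[i], 3), docs[i])
```

With only a handful of short documents, most question words either match nothing or match a single document, which is one way a very small corpus can give a tf-idf ranker little to work with.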
Ah, alright. @andrelmfarias, just one last question: can you please point me to an example of fine-tuning on SQuAD using cdQA?
If you load the model as I did in my snippet, you don't need to do it, as the model is already trained on SQuAD. But if you are still interested in learning how to do it, you can take a look at our official tutorial: https://github.com/cdqa-suite/cdQA/blob/master/examples/tutorial-train-reader-squad.ipynb
Thanks @andrelmfarias. Really appreciated. I am closing the issue.
Hey @adilmukhtar82, can you please share the link to that YouTube presentation? :)
@BojanKovachki this is the presentation in which @andrelmfarias has explained it.
@GoSaasML, thank you!
I watched your YouTube presentation, and you have done a really neat job.
But I have trained the model on my financial dataset and the results aren't good. Here is one example:
query: What is your contact number?
answer: Visit your nearest Bank branch
title: Bank FAQ
The same goes for the rest of the questions. Most of the answers are "Visit your nearest branch".
I have annotated the dataset in SQuAD format and trained the model. But when training is about to complete, this message comes up: "Training beyond specified 't_total'. Learning rate multiplier set to 0.0. Please set 't_total' of WarmupLinearSchedule correctly."
I don't know whether training completed with a proper learning rate or whether the rate was reset to zero and that is causing the poor results. The model's parameters after training are the same as the ones I provided when creating the object.
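For context on that message, here is a small sketch of how a warmup-then-linear-decay schedule produces a learning rate multiplier, and why steps past `t_total` get a multiplier of 0.0. It follows my understanding of the warmup-linear schedule in pytorch-pretrained-bert, but the function name `warmup_linear_multiplier`, the default `warmup=0.1`, and the step counts in the demo are assumptions for illustration, not the library's code.

```python
def warmup_linear_multiplier(step, t_total, warmup=0.1):
    """Learning rate multiplier for a warmup + linear decay schedule (sketch).

    step    : number of optimizer steps taken so far
    t_total : total number of optimizer steps the schedule was configured for
    warmup  : fraction of t_total spent ramping up from 0 to 1 (assumed default)
    """
    if t_total <= 0:
        raise ValueError("t_total must be positive")
    progress = step / t_total
    if progress < warmup:
        # Linear ramp from 0 up to 1 during the warmup phase.
        return progress / warmup
    # Linear decay from 1 (at progress == warmup) down to 0 (at progress == 1).
    # Past t_total the expression goes negative, so it is clamped to 0.
    return max((progress - 1.0) / (warmup - 1.0), 0.0)


if __name__ == "__main__":
    # Hypothetical numbers: the schedule expects 100 steps, training runs 120.
    t_total = 100
    for step in (0, 5, 10, 55, 100, 110, 120):
        print(step, warmup_linear_multiplier(step, t_total))
```

Under these assumptions, only the steps taken after `t_total` run with a zero multiplier; the steps before it still use a non-zero learning rate. So the message alone does not show that the whole run was trained with a learning rate of zero.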
I can share my Colab notebook if you want.
Please share your thoughts.