Poor results on financial dataset #245

Closed
adilmukhtar82 opened this issue Sep 3, 2019 · 9 comments

Comments

adilmukhtar82 commented Sep 3, 2019

I watched your YouTube presentation, and you guys have done a really neat job.

However, when I trained the model on my financial dataset, the results aren't good. Here is one example:

query: What is your contact number?
answer: Visit your nearest Bank branch
title: Bank FAQ

The same goes for the rest of the questions; most of the answers are "Visit your nearest branch".

I annotated the dataset in SQuAD format and trained the model, but just as training is about to complete, this message comes up: "Training beyond specified 't_total'. Learning rate multiplier set to 0.0. Please set 't_total' of WarmupLinearSchedule correctly."

I don't know whether training completed with a proper learning rate or whether it was reset to zero, causing the poor results. However, the model's parameters after training are the same as the ones I provided when creating the object.

I can share my colab notebook if you want.

Please share your thoughts.

@andrelmfarias
Collaborator

Hi @adilmukhtar82, based on your results and outputs, it definitely looks to me like your model is not trained at all. Can you please answer the following questions:

  1. What's the size of your dataset, i.e., how many question-answer pairs do you have?

  2. Did you fine-tune it on SQuAD 1.1 before fine-tuning it on your custom dataset? If your dataset is small, you might need to do so.

  3. Could you please share your Colab notebook? I can't promise I'll have enough time to audit it in depth, but it could help me spot the problem.


adilmukhtar82 commented Sep 4, 2019

  1. The size of the dataset is relatively small, around 18 QA pairs.
  2. I didn't fine-tune it on SQuAD; I just used cdQA.
  3. Sure, no problem. This is the notebook.

Also, I looked at issue #241 and used BertQA (as in the snippet you posted) on the same dataset (18 QA pairs), and it works fine. So I think I don't need to fine-tune on SQuAD again. I suspect the Retriever isn't returning the most relevant documents; I have played with its parameters but didn't get relevant documents at all.


andrelmfarias commented Sep 4, 2019

18 QA pairs is an extremely small dataset to fine-tune the model from scratch.

In my snippet, I load a model that was already fine-tuned on SQuAD, so, yes, you won't need to fine-tune it again.
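
For reference, the idea is roughly the following (a minimal sketch; the exact import paths, model file name and keyword names can differ between cdQA versions, so double-check against the README and the snippet in #241):

```python
import pandas as pd

from cdqa.utils.download import download_model
from cdqa.pipeline import QAPipeline  # older versions: cdqa.pipeline.cdqa_sklearn

# Download a BERT reader already fine-tuned on SQuAD 1.1
# (model identifier and resulting file name are assumptions; see the cdQA releases).
download_model(model='bert-squad_1.1', dir='./models')

# The retriever expects a DataFrame with a 'title' column and a 'paragraphs'
# column containing a list of paragraph strings per document.
df = pd.DataFrame({
    'title': ['Bank FAQ'],
    'paragraphs': [[
        'You can reach us at 1-800-000-0000, or visit your nearest Bank branch.',
        'Our branches are open Monday to Friday, 9am to 5pm.',
    ]],
})

cdqa_pipeline = QAPipeline(reader='./models/bert_qa.joblib')
cdqa_pipeline.fit_retriever(df=df)  # only the retriever is fitted on your data
                                    # (some versions use X=df instead of df=df)

prediction = cdqa_pipeline.predict(query='What is your contact number?')
print(prediction)
```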

Since I can't see how your dataset is structured or what type of content it contains (language, terms, etc.), I'm not able to directly spot the problem with the Retriever. But I really think it can be improved... Also, this kind of Retriever is a pretty simple one: it ranks the documents based on tf-idf vectorization and cosine similarity between the question and the documents. It's good for speed, but it might not be the most accurate form of Retriever. Maybe it also needs more documents to vectorize them properly...
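
To make that concrete, the ranking step is essentially equivalent to this standalone scikit-learn sketch (illustrative only, not the actual cdQA Retriever class):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Visit your nearest Bank branch to open an account.",
    "You can contact us at 1-800-000-0000 during business hours.",
    "Interest rates on savings accounts are reviewed quarterly.",
]
question = "What is your contact number?"

# Fit tf-idf on the documents, then project the question into the same space.
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)
question_vector = vectorizer.transform([question])

# Rank documents by cosine similarity to the question, highest first.
scores = cosine_similarity(question_vector, doc_vectors).ravel()
for idx in scores.argsort()[::-1]:
    print(round(float(scores[idx]), 3), documents[idx])
```

With a very small set of short documents, the tf-idf vocabulary is tiny and the question shares very few terms with the right document, which is one reason the ranking can look almost arbitrary at this scale.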

I am thinking about implementing and trying other forms of Retriever and including them in cdQA if they perform better than our current one, but that will definitely take a bit of time...

@adilmukhtar82
Author

Ah. Alright.

@andrelmfarias just one last question: could you please point me towards an example of fine-tuning on SQuAD using cdQA?

@andrelmfarias
Collaborator

@andrelmfarias just one last question: could you please point me towards an example of fine-tuning on SQuAD using cdQA?

If you load the model as I did in my snippet, you don't need to do it, as that model is already fine-tuned on SQuAD. But if you are still interested in learning how to do it, you can take a look at our official example for fine-tuning:

https://github.com/cdqa-suite/cdQA/blob/master/examples/tutorial-train-reader-squad.ipynb
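
That notebook boils down to roughly these steps (sketched from memory; argument names and import paths may differ between cdQA versions, so follow the notebook itself for the exact code):

```python
from cdqa.reader.bertqa_sklearn import BertProcessor, BertQA

# Assumes SQuAD 1.1 train-v1.1.json has already been downloaded to ./data.
train_processor = BertProcessor(do_lower_case=True, is_training=True)
train_examples, train_features = train_processor.fit_transform(
    X='./data/SQuAD_1.1/train-v1.1.json'
)

reader = BertQA(train_batch_size=12,
                learning_rate=3e-5,
                num_train_epochs=2,
                do_lower_case=True,
                output_dir='./models')
reader.fit(X=(train_examples, train_features))

# The same BertProcessor / fit steps apply if you later fine-tune on your own
# SQuAD-formatted annotations instead of the original SQuAD file.
```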

@adilmukhtar82
Author

Thanks @andrelmfarias, really appreciated. I am closing the issue.

@BojanKovachki

Hey @adilmukhtar82, can you please share the link to that YouTube presentation? :)

@GoSaasML

@BojanKovachki this is the presentation in which @andrelmfarias explained it.

@BojanKovachki

@GoSaasML, thank you!
