Poor results on financial dataset #245

Closed
adilmukhtar82 opened this issue Sep 3, 2019 · 9 comments

Comments

adilmukhtar82 commented Sep 3, 2019

I watched your YouTube presentation, and you guys have done a really neat job.

However, when I trained the model on my financial dataset, the results aren't good. Here is one example:

query: What is your contact number?
answer: Visit your nearest Bank branch
title: Bank FAQ

The same goes for the rest of the questions; most of the answers are "Visit your nearest branch".

I annotated the dataset in SQuAD format and trained the model, but just as training is about to complete, this message comes up: "Training beyond specified 't_total'. Learning rate multiplier set to 0.0. Please set 't_total' of WarmupLinearSchedule correctly."

I don't know whether training completed with a proper learning rate or whether it was reset to zero, causing the poor results. However, the model's parameters after training are the same as the ones I provided when creating the object.

I can share my colab notebook if you want.

Please share your thoughts.

@andrelmfarias
Collaborator

Hi @adilmukhtar82, based on your results and outputs, it definitely looks to me like your model is not trained at all. Can you please answer the following questions:

  1. What's the size of your dataset, i.e., how many question-answer pairs do you have?

  2. Did you fine-tune it on SQuAD 1.1 before fine-tuning it on your custom dataset? If your dataset is small, you might need to do so.

  3. Could you please share your Colab notebook? I can't promise I'll have enough time to audit it in depth, but it could help me spot the problem.


adilmukhtar82 commented Sep 4, 2019

  1. The size of the dataset is relatively small, around 18 QA pairs.
  2. I didn't fine-tune it on SQuAD; I just used cdQA.
  3. Sure, no problem. This is the notebook.

Also, I looked at issue #241 and used BertQA (as in the snippet you posted) on the same dataset (18 QA pairs), and it works fine. So I think I don't need to fine-tune on SQuAD again. I suspect the Retriever isn't returning the most relevant documents; I have played with its parameters but didn't get relevant documents at all.


andrelmfarias commented Sep 4, 2019

18 QA pairs is an extremely small dataset to fine-tune the model from scratch.

In my snippet, I load a model that was already fine-tuned on SQuAD, so, yes, you won't need to fine-tune it again.
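
For reference, the idea is roughly the following (a minimal sketch; the exact import paths, model file name and keyword names can differ between cdQA versions, so double-check against the README and the snippet in #241):

```python
import pandas as pd

from cdqa.utils.download import download_model
from cdqa.pipeline import QAPipeline  # older versions: cdqa.pipeline.cdqa_sklearn

# Download a BERT reader already fine-tuned on SQuAD 1.1
# (model identifier and resulting file name are assumptions; see the cdQA releases).
download_model(model='bert-squad_1.1', dir='./models')

# The retriever expects a DataFrame with a 'title' column and a 'paragraphs'
# column containing a list of paragraph strings per document.
df = pd.DataFrame({
    'title': ['Bank FAQ'],
    'paragraphs': [[
        'You can reach us at 1-800-000-0000, or visit your nearest Bank branch.',
        'Our branches are open Monday to Friday, 9am to 5pm.',
    ]],
})

cdqa_pipeline = QAPipeline(reader='./models/bert_qa.joblib')
cdqa_pipeline.fit_retriever(df=df)  # only the retriever is fitted on your data
                                    # (some versions use X=df instead of df=df)

prediction = cdqa_pipeline.predict(query='What is your contact number?')
print(prediction)
```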

Since I can't see how your dataset is structured or what type of content it contains (language, terms, etc.), I'm not able to directly spot the problem with the Retriever. But I really think it can be improved... Also, this kind of Retriever is a pretty simple one: it ranks the documents based on tf-idf vectorization and cosine similarity between the question and the documents. It's good for speed, but it might not be the most accurate form of Retriever. Maybe it also needs more documents to vectorize them properly...
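
To make that concrete, the ranking step is essentially equivalent to this standalone scikit-learn sketch (illustrative only, not the actual cdQA Retriever class):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Visit your nearest Bank branch to open an account.",
    "You can contact us at 1-800-000-0000 during business hours.",
    "Interest rates on savings accounts are reviewed quarterly.",
]
question = "What is your contact number?"

# Fit tf-idf on the documents, then project the question into the same space.
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)
question_vector = vectorizer.transform([question])

# Rank documents by cosine similarity to the question, highest first.
scores = cosine_similarity(question_vector, doc_vectors).ravel()
for idx in scores.argsort()[::-1]:
    print(round(float(scores[idx]), 3), documents[idx])
```

With a very small set of short documents, the tf-idf vocabulary is tiny and the question shares very few terms with the right document, which is one reason the ranking can look almost arbitrary at this scale.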

I am thinking about implementing and trying other forms of Retriever and including them in cdQA if they perform better than our current one, but that will definitely take a bit of time...

@adilmukhtar82
Author

Ah. Alright.

@andrelmfarias just one last question: could you please point me towards an example of fine-tuning on SQuAD using cdQA?

@andrelmfarias
Collaborator

@andrelmfarias just one last question: could you please point me towards an example of fine-tuning on SQuAD using cdQA?

If you load the model as I did in my snippet, you don't need to do it, as that model is already fine-tuned on SQuAD. But if you are still interested in learning how to do it, you can take a look at our official example for fine-tuning:

https://github.com/cdqa-suite/cdQA/blob/master/examples/tutorial-train-reader-squad.ipynb
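
That notebook boils down to roughly these steps (sketched from memory; argument names and import paths may differ between cdQA versions, so follow the notebook itself for the exact code):

```python
from cdqa.reader.bertqa_sklearn import BertProcessor, BertQA

# Assumes SQuAD 1.1 train-v1.1.json has already been downloaded to ./data.
train_processor = BertProcessor(do_lower_case=True, is_training=True)
train_examples, train_features = train_processor.fit_transform(
    X='./data/SQuAD_1.1/train-v1.1.json'
)

reader = BertQA(train_batch_size=12,
                learning_rate=3e-5,
                num_train_epochs=2,
                do_lower_case=True,
                output_dir='./models')
reader.fit(X=(train_examples, train_features))

# The same BertProcessor / fit steps apply if you later fine-tune on your own
# SQuAD-formatted annotations instead of the original SQuAD file.
```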

@adilmukhtar82
Author

Thanks @andrelmfarias, really appreciated. I am closing the issue.

@BojanKovachki

Hey @adilmukhtar82, can you please share the link to that YouTube presentation? :)

@GoSaasML

@BojanKovachki this is the presentation in which @andrelmfarias explained it.

@BojanKovachki

@GoSaasML, thank you!
