UQuAD---Urdu-Question-Answer-Dataset

We present a new dataset for question answering models. The dataset contains 27 Urdu paragraphs taken from various available sources, including Urdu Wikipedia, YouTube, and news articles. Each selected paragraph has, on average, 3 to 7 questions, and each question has 1 to 3 possible answers. The text is mostly Urdu, with some English words.

| Type | Count |
| --- | --- |
| Total # of paragraphs | 27 |
| Total # of questions | 499 |
| Total # of words in all paragraphs | 5553 |
| Total # of unique words in all paragraphs | 1631 |
| Total # of words in all questions | 1237 |
| Total # of unique words in all questions | 395 |

Structure of the dataset: assuming the variable data holds the dataset, loaded as a dictionary:

To access the first question-context set: data['1']
To access all questions of the first datapoint: data['1']['question']
To access the context of the first datapoint: data['1']['context']
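
The access pattern described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the README does not name the dataset file or its format, so the sketch assumes JSON and uses a small hypothetical in-memory sample with placeholder text in place of the real file. It also assumes that 'question' holds a list; only the keys '1', 'question', and 'context' come from the description above.

```python
import json

# Hypothetical stand-in for the dataset file. The JSON format, the placeholder
# text, and the list shape of "question" are assumptions for illustration only.
sample_json = """
{
  "1": {
    "context": "placeholder context paragraph",
    "question": ["placeholder question 1", "placeholder question 2"]
  }
}
"""

# With a real file this would be json.load(f) on a file opened with encoding="utf-8".
data = json.loads(sample_json)

first = data["1"]                   # first question-context set
questions = data["1"]["question"]   # all questions of the first datapoint
context = data["1"]["context"]      # context of the first datapoint

print("context:", context)
for q in questions:
    print("question:", q)
```

Opening the real file with encoding="utf-8" matters here because the paragraphs are in Urdu script.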

Do you have questions or comments, or would you like to contribute?

Let us know at ahsan.farooqui@ieee.org