MergedQUAD Dataset

MergedQUAD consists of splits for SQuAD-style question answering in Hindi. It combines examples taken from existing multilingual SQuAD-style question-answering datasets such as XQuAD and TyDiQA. The dataset was introduced in our paper "Indic-Transformers: An Analysis of Transformer Language Models for Indian Languages", which has been accepted as a workshop paper at ML-RSA (NeurIPS 2020). The paper presents an exhaustive study of transformer-based architectures on Indian languages such as Hindi, Bengali and Telugu. Our models are available on the HuggingFace model hub.
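As an illustration of what "SQuAD-style" means in practice, here is a minimal Python sketch, assuming each split is stored as a SQuAD v1.1-style JSON file (`data` → `paragraphs` → `context`/`qas` → `answers` with `text` and `answer_start`). The helpers `load_squad`, `iter_squad_examples` and `merge_squad` are hypothetical and are not the scripts used to build MergedQUAD; the toy Hindi records are made up and are not taken from XQuAD or TyDiQA.

```python
import json
from typing import Iterator


def load_squad(path: str) -> dict:
    """Read a SQuAD v1.1-style JSON file (hypothetical file layout)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_squad_examples(squad: dict) -> Iterator[dict]:
    """Flatten a SQuAD-style dict into one record per question."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                # Each answer is a text span plus its character offset in the context;
                # check that the offset really points at the answer text.
                for ans in qa["answers"]:
                    start = ans["answer_start"]
                    if context[start:start + len(ans["text"])] != ans["text"]:
                        raise ValueError(f"misaligned answer in question {qa['id']}")
                yield {
                    "id": qa["id"],
                    "question": qa["question"],
                    "context": context,
                    "answers": qa["answers"],
                }


def merge_squad(*datasets: dict, version: str = "merged") -> dict:
    """Concatenate the article lists of several SQuAD-style dicts (hypothetical helper)."""
    return {"version": version, "data": [a for d in datasets for a in d["data"]]}


# Toy stand-ins for an XQuAD-like and a TyDiQA-like split (not real data).
xquad_like = {"data": [{"title": "t1", "paragraphs": [{
    "context": "दिल्ली भारत की राजधानी है।",
    "qas": [{"id": "x1", "question": "भारत की राजधानी क्या है?",
             "answers": [{"text": "दिल्ली", "answer_start": 0}]}]}]}]}
tydi_like = {"data": [{"title": "t2", "paragraphs": [{
    "context": "गंगा भारत की सबसे लंबी नदी है।",
    "qas": [{"id": "t1", "question": "भारत की सबसे लंबी नदी कौन सी है?",
             "answers": [{"text": "गंगा", "answer_start": 0}]}]}]}]}

merged = merge_squad(xquad_like, tydi_like)
examples = list(iter_squad_examples(merged))
```

With real files you would pass the result of `load_squad(...)` for each split instead of the toy dicts; the path is up to you, since this README does not name the split files. Validating `answer_start` while iterating is a cheap way to catch offset errors that can creep in when examples from different sources are combined.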

Citation

If you use this work, please cite:

@misc{jain2020indictransformers,
      title={Indic-Transformers: An Analysis of Transformer Language Models for Indian Languages}, 
      author={Kushal Jain and Adwait Deshpande and Kumar Shridhar and Felix Laumann and Ayushman Dash},
      year={2020},
      eprint={2011.02323},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}