yualeks63/NNS-500

NNS-500

An acceptability judgment task dataset based on sentences written by non-native English speakers.

Acceptability judgment task (AJT)

AJT is a common method in empirical linguistics for gathering information about the internal grammar of speakers of a language. It is also considered a promising way to evaluate the linguistic knowledge of neural language models. The creators of the Corpus of Linguistic Acceptability (CoLA) consider Boolean judgments sufficient; similarly, some non-English resources cast acceptability as a binary classification task.
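When acceptability is cast as binary classification, a model's judgments can be scored against the human labels with standard binary metrics. The sketch below is illustrative only: it computes accuracy and the Matthews correlation coefficient (MCC), a metric commonly reported for CoLA. The function name `binary_scores` and the 1 = acceptable, 0 = unacceptable encoding are assumptions made for this example, not something NNS-500 prescribes.

```python
import math


def binary_scores(gold, predicted):
    """Return (accuracy, MCC) for two equal-length sequences of 0/1 labels.

    Assumed encoding for this sketch: 1 = acceptable, 0 = unacceptable.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one label is required")

    # Confusion-matrix counts.
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    tn = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 0)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)

    accuracy = (tp + tn) / len(gold)

    # MCC is undefined when any marginal count is zero; report 0.0 by convention.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return accuracy, mcc
```

MCC is useful alongside accuracy because a model that labels every sentence acceptable can still reach a high accuracy on an unbalanced set, while its MCC stays at 0.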

Dataset

The NNS-500 dataset is built from sentences written by non-native speakers, which matters because it determines where the unacceptable sentences come from. The sentences were labelled by a university English teacher, and the dataset is intended for testing pre-trained neural networks. It contains 350 acceptable and 150 unacceptable sentences, so 70% of the sentences are acceptable (compared with 69.2% in the CoLA out-of-domain set).
Dataset: https://github.com/yualeks63/NNS-500/blob/main/NNS-500_dataset.csv
More information: https://github.com/yualeks63/NNS-500/blob/main/NNS-500_dataset_description.pdf
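The class balance stated above can be checked once the CSV has been downloaded. The following is a minimal sketch under stated assumptions: the column name `label` and the value `"1"` for acceptable sentences are hypothetical and may not match the real file, so check the dataset description and adjust the parameters. The two-row sample is a made-up toy input, not data from NNS-500.

```python
import csv
import io


def acceptable_share(csv_file, label_column="label", acceptable_value="1"):
    """Return the fraction of rows whose label equals acceptable_value.

    csv_file is any open text file (or file-like object) with a header row.
    label_column and acceptable_value are assumptions; adapt them to the file.
    """
    total = 0
    acceptable = 0
    for row in csv.DictReader(csv_file):
        total += 1
        if row[label_column].strip() == acceptable_value:
            acceptable += 1
    if total == 0:
        raise ValueError("no data rows found")
    return acceptable / total


# Toy input for illustration only (not taken from NNS-500).
sample = io.StringIO(
    "sentence,label\n"
    "She go to school.,0\n"
    "She goes to school.,1\n"
)
toy_share = acceptable_share(sample)

# Share implied by the counts reported for NNS-500.
reported_share = 350 / (350 + 150)
```

With the reported counts, a trivial baseline that marks every sentence as acceptable would be right on 350 of 500 sentences, which is a useful floor when comparing models.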