empty lines #1

Open
adirc opened this issue Apr 17, 2016 · 2 comments

Comments

@adirc

adirc commented Apr 17, 2016

There are a lot of empty lines in the gs files, for example /STS2015-gold/STS.gs.headlines.txt.
Do they mean something, or is the label just missing?

@alvations
Owner

I had the same question when I first played around with the dataset. The empty lines just mean that those sentence pairs were not annotated by the STS organizers. Not all sentence pairs are used in the evaluation.

Daniel Cer explains this at https://groups.google.com/d/msg/sts-semeval/js-Y0e92YuM/jJUi5beJBwAJ
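If you read the gold file directly, one way to handle this is to walk it in parallel with the matching input file and skip every pair whose gold line is blank. This is only a rough sketch: it assumes the input file has one tab-separated sentence pair per line, aligned line by line with the gs file, and the function name `read_annotated_pairs` and the sample sentences are made up for illustration.

```python
import io

def read_annotated_pairs(input_lines, gold_lines):
    """Yield (sentence1, sentence2, score) for annotated pairs only.

    Assumes the two iterables are aligned line by line and that a blank
    gold line marks a pair that was not annotated.
    """
    for pair_line, gold_line in zip(input_lines, gold_lines):
        gold = gold_line.strip()
        if not gold:
            # Blank gold line: this pair has no score, so skip it.
            continue
        sent1, sent2 = pair_line.rstrip("\n").split("\t")[:2]
        yield sent1, sent2, float(gold)

# In-memory stand-ins for an input file and its gs file (made-up data).
sample_input = io.StringIO(
    "A man is eating.\tA man eats.\n"
    "A dog runs.\tThe stock fell.\n"
    "A cat sleeps.\tA cat is asleep.\n"
)
sample_gold = io.StringIO("4.5\n\n5.0\n")

pairs = list(read_annotated_pairs(sample_input, sample_gold))
print(pairs)
```

With real data you would pass two open file handles instead of the `io.StringIO` objects.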

@alvations
Owner

alvations commented Apr 17, 2016

To slurp the STS data into an SFrame, I usually do this:

import sframe
# Read the STS 2012-2015 dataset.
sts_train = sframe.SFrame.read_csv(
    'sts.csv',
    delimiter='\t',
    column_type_hints=[str, str, float, str, str],
    quote_char='\0',  # NUL as quote char, so quotes inside sentences stay plain text
)
# Drop the sentence pairs with empty annotations.
sts_train = sts_train.dropna(columns=['Score'])

Take a look at https://github.com/alvations/stasis/blob/master/notebooks/SWORD.ipynb and https://github.com/alvations/stasis/blob/master/notebooks/SHIELD.ipynb for more details =)
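If sframe is not available, roughly the same filtering can be sketched with only the standard library's `csv` module. This is not the code from the notebooks, just an illustration: it assumes a tab-separated file with a header row containing a `Score` column, and the other column names (`Dataset`, `Domain`, `Sent1`, `Sent2`), the helper `load_scored_rows` and the sample rows are hypothetical.

```python
import csv
import io

def load_scored_rows(handle, score_column="Score"):
    """Read a tab-separated file and keep only rows that have a score."""
    # QUOTE_NONE keeps quotation marks inside sentences as plain text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        raw = (row.get(score_column) or "").strip()
        if not raw:
            # Empty score: the pair was not annotated, so drop it.
            continue
        row[score_column] = float(raw)
        rows.append(row)
    return rows

# In-memory stand-in for the data file (made-up rows, hypothetical column names).
sample = io.StringIO(
    "Dataset\tDomain\tScore\tSent1\tSent2\n"
    "STS2015\theadlines\t4.5\tA man is eating.\tA man eats.\n"
    "STS2015\theadlines\t\tA dog runs.\tThe stock fell.\n"
    'STS2015\theadlines\t3.0\t"Quoted" headline\tAnother headline\n'
)

rows = load_scored_rows(sample)
print(len(rows), "annotated pairs")
```

For a real file, open it with `open(path, newline='')` and pass the handle in place of `sample`.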
