conllu-prolog atoms for words aren't unique by sentence #27

Open
GPPassos opened this issue May 13, 2017 · 3 comments

Comments

@GPPassos
Contributor

As it stands, sentence atoms look like ctestset_scf790_2 (context testset, sentence cf790_2), while word atoms look like ctestset_i25 (context testset, word 25). For instance:

?- nlp_sentence(S), nlp_dependency(S,Y,Z,W).
S = ctestset_scf790_2,
Y = ctestset_i25,
Z = ctestset_i9,
W = punct 

However, this is a problem when analysing multiple sentences: the first word of every sentence in the same context gets the same atom, cCONTEXT_i1.

Perhaps we should add a sentence identifier on each word atom as well.
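To make the collision concrete, here is a minimal sketch. The predicate names nlp_sentence/1 and nlp_dependency/4 come from the query above; the sentence ids s1 and s2, the token numbers, the relation labels, and the helpers sentences_of_token/2 and qualified_token/4 are all hypothetical and only illustrate one possible naming scheme, not what the exporter does.

```prolog
% Hypothetical facts: two sentences in the same context "testset".
% Both sentences have a first word, so both use the atom ctestset_i1.
nlp_sentence(ctestset_s1).
nlp_sentence(ctestset_s2).

nlp_dependency(ctestset_s1, ctestset_i1, ctestset_i2, nsubj).
nlp_dependency(ctestset_s2, ctestset_i1, ctestset_i3, det).

% sentences_of_token(+Token, -Sentences)
% Hypothetical helper: lists every sentence in which Token appears as
% either token argument of a dependency fact. More than one sentence
% in the result means the token atom is ambiguous on its own.
sentences_of_token(Token, Sentences) :-
    findall(S,
            ( nlp_dependency(S, A, B, _),
              ( A == Token ; B == Token ) ),
            Found),
    sort(Found, Sentences).

% qualified_token(+Context, +SentenceId, +Index, -Atom)
% One possible (assumed) scheme: embed the sentence id in the word atom.
qualified_token(Context, SentenceId, Index, Atom) :-
    format(atom(Atom), 'c~w_s~w_i~w', [Context, SentenceId, Index]).
```

With these facts, `sentences_of_token(ctestset_i1, Ss)` gives `Ss = [ctestset_s1, ctestset_s2]`, so the word atom alone does not identify a word. Under the sketched scheme, `qualified_token(testset, cf790_2, 25, A)` gives `A = ctestset_scf790_2_i25`, which extends the existing sentence atom.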

@arademaker
Contributor

@GPPassos are you talking about the export to Prolog code?

@fcbr
Contributor

fcbr commented May 14, 2017

Can you give a concrete example where this fails? We never use a token ID outside the context of a single sentence, so I'm not sure why this would be a problem.

@fcbr
Contributor

fcbr commented May 14, 2017

Continuing: the reason it works this way is that rules are much easier to debug with simple ids than with long self-contained ones. In fact, I never use the context at all, since we can make all sentence ids unique across multiple files. This way we have simple ids for both sentences and tokens. I'm not opposed to changing this, but I need a valid scenario where the current scheme would not work.
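As an illustration of the point about sentence scope, the sketch below always passes the sentence atom alongside the token atom, so short token ids stay unambiguous. The facts and the helper token_relations/3 are hypothetical; only the shape of nlp_dependency/4 is taken from the query earlier in the thread.

```prolog
% Hypothetical facts: the same short token atom in two sentences.
nlp_dependency(ctestset_s1, ctestset_i1, ctestset_i2, nsubj).
nlp_dependency(ctestset_s2, ctestset_i1, ctestset_i3, det).

% token_relations(+Sentence, +Token, -Labels)
% Hypothetical helper: collects the relation labels of the dependency
% facts in Sentence whose second argument is Token. Because Sentence is
% always bound, the shared atom ctestset_i1 never mixes sentences.
token_relations(Sentence, Token, Labels) :-
    findall(Label,
            nlp_dependency(Sentence, Token, _, Label),
            Labels).
```

Here `token_relations(ctestset_s1, ctestset_i1, L)` gives `L = [nsubj]` and `token_relations(ctestset_s2, ctestset_i1, L)` gives `L = [det]`. The ambiguity would only show up in a rule that handles a token atom without its sentence.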
