_ea_ and _eo_ CONJ+[DET,ADP] wrong contractions #5
Comments
Complement: 887 cases in the whole corpus.
This seems to be related to, or a duplicate of, #1 and UniversalDependencies/docs#294 by @pedrobalage. (But that pull request has not been merged, so yes, we still need a fix.)
Sorry guys, I didn't have time to follow up on this issue. Maybe @ceramisch can help.
I think that the fix cannot be done fully automatically (but I can try to take care of it after release 2.2):
Then comes my question: can anyone recommend an annotation tool that handles UD 2.0 directly, to make this task easier?
For manual annotation, see http://universaldependencies.org/tools.html#third-party-tools (but not all the tools listed there support full UD 2.0, including enhanced deps).
For automatic edits using simple declarative rules (no coding needed), I can also recommend DepEdit, which we use to convert UD_English-GUM:
@ceramisch we are using our library cl-conllu and the Emacs mode we developed, both listed in the UD tools. I just checked, and this corpus has many validation issues. Are you planning to solve these issues for release 2.2? I can’t promise to help in the next 2 days.
I think it will be better to leave this corpus out of the shared task. The participants will have enough on their plate even without it, and its inconsistency with Bosque worries me. However, it can still be in the full release 2.2 after the shared task. Then the deadline is June 15.
No (as you probably realized). I would be happy to help solve this annoying "eo" issue, but I can only work on it after the rush of the PARSEME shared task is over (i.e. mid-May).
The work on this issue was done by @ceramisch; I only fixed the commits to avoid unnecessary copies of files in the repository. The Python code from @ceramisch marks all tokens that need manual revision with
The numbers probably differ because I had manually removed the XXXXX from the cases I had already checked. I will try to finish the manual verification for the next release so that this issue can be definitely closed. Thanks for sorting things out with the duplications, @arademaker, and sorry for the mess: I think I now understand better the idea of branches for UD treebank development :-)
I will create a |
OK, no worries.
@ceramisch branch workbench created and folder |
see also #9
this issue was solved, I didn't find cases of |
Strangely, the conjunction e (and) appears contracted with the following token, o or a.
This would be correct in Arabic, but it is never done in Portuguese: e does not contract with any other word.
Correcting the problem for o can be automatic, but a requires manual intervention because it is ambiguous between a determiner (a, the feminine definite article "the", which covers most cases) and the preposition a ("to", which is rarer).
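The split between automatic and manual cases described above can be sketched roughly as follows. This is only an illustration, not the script actually used for this treebank: it assumes the wrong contractions are encoded as standard CoNLL-U multiword-token range lines (ID like `3-4`, FORM `eo` or `ea`) followed by the separate word lines, and the function name `split_conj_contractions` is hypothetical. It does not touch `# text` comments or `SpaceAfter` values, which may also need adjusting.

```python
def split_conj_contractions(conllu_text):
    """Drop 'eo' range lines automatically; report 'ea' range lines for review.

    Assumption: contractions appear as CoNLL-U multiword-token lines whose
    ID column is a range such as '3-4'. The word lines that follow are kept.
    """
    out_lines = []
    needs_review = []  # (input line number, surface form) pairs
    for lineno, line in enumerate(conllu_text.split("\n"), start=1):
        cols = line.split("\t")
        # A range line has 10 columns, is not a comment, and has '-' in its ID.
        is_range = (
            len(cols) == 10
            and not line.startswith("#")
            and "-" in cols[0]
        )
        if is_range:
            form = cols[1].lower()
            if form == "eo":
                # Unambiguous case: simply remove the bogus contraction line.
                continue
            if form == "ea":
                # Ambiguous case (article vs. preposition): leave the line
                # in place and record it for a human annotator.
                needs_review.append((lineno, cols[1]))
        out_lines.append(line)
    return "\n".join(out_lines), needs_review
```

Calling it on the contents of a `.conllu` file returns the edited text plus a list of input line numbers where an `ea` range line still needs a manual decision.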