-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update MultiCoNer #3006
Update MultiCoNer #3006
Conversation
tokens.append(token) | ||
|
||
sentence: Sentence = Sentence(text=tokens) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks much cleaner than creating an empty sentence and adding tokens. I wonder if this means we can retire the add_token
method entirely in a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the usage after this PR, we still have the following usages:
- Tests, which can be rewritten
UniversalDependenciesDataset
which works the same way as the ColumnDataset worked before- Sentence - constructor, which I would prefer to keep.
We can surely make the add_token
method private
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a proposal to this PR
for comment_row in comment.split("\t"): | ||
if "=" in comment_row: | ||
key, value = comment_row.split("=", 2) | ||
sentence.add_metadata(key, value) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this, we really needed this feature in ColumnDataset :)
@helpmefindaname thanks for adding this and for the add_token modifications! |
This PR adds the following features:
# id 2c8f5b49-4df5-44b5-8c2a-bdc340010ea3 domain=de-lowner
will add a metadatadomain
with the valuede-lowner
. In general, this is about tab separated<key>=<value>
patterns.