Quote attribution annotation #190

Open
andreasvc opened this issue Mar 29, 2019 · 4 comments
Labels
idea Ideas for future features

Comments

@andreasvc
Contributor

What do you think about adding quote attribution annotation as a feature?
This would involve creating a special/separate type of "mention" for quotes, which has a slot for its speaker mention (and perhaps addressee as well).
It might make sense to do this annotation together with coreference annotation: the mentions and entities of possible speakers are then already available, and each task can benefit from the other's training data.

For reference, see http://aclweb.org/anthology/E17-1044
which also comes with an annotation tool.

@nilsreiter
Owner

Not entirely sure. I see why (and that) it makes sense, but I will not -- in the long term -- be able to maintain an annotation tool that is too generic. Quote + speaker + addressee is already very close to general slot filling. Need to think about it.

I plan to use CorefAnnotator for a new project soonish, which will also be a different task from coreference and require some adaptations. Maybe there is a way to make this possible, but it will require substantial development effort.

@nilsreiter nilsreiter added the idea Ideas for future features label Mar 30, 2019
@andreasvc
Contributor Author

Fair enough. It's indeed a nontrivial feature, e.g. in terms of UI and file format.
As plan B I can probably find a way to pre- and postprocess files annotated with coreference and use them in the annotation tool of the paper I mentioned.

@nilsreiter
Owner

If you need a new file format for this tool combination, I'd be happy to implement an exporter.

One more thought: I plan to integrate an editor for entity relations and properties (e.g., X is female [property] and mother of Y [relation]). Technically, this could be extended to also allow relations between mentions. Once that is in place, an entity mention could be related to a speech mention. But we would need a new UI for this ...

@andreasvc
Contributor Author

The preprocessing I have in mind involves:

  • taking a list of potential speakers from manually corrected coreference annotations (perhaps filtering out non-human entities which cannot be speakers)
  • combining that with automatically detected direct speech spans
  • finally, producing a file in the XML format used by the quote attribution annotation tool of Muzny et al. (2017): https://github.com/muzny/quoteannotator
  • In addition I should probably pre-populate the file with heuristically-detected quote attributions to save work on the easy cases.
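
To make the steps above concrete, here is a rough sketch of the first, second, and fourth step. Everything in it is an assumption for illustration: the `Mention` and `Quote` classes, the `human` flag, the quote regex, and the nearest-mention heuristic with a 50-character window are hypothetical and not part of CorefAnnotator or quoteannotator. Serializing to the quoteannotator XML format is left out because its schema is not described here.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    start: int       # character offset, inclusive
    end: int         # character offset, exclusive
    entity_id: int
    human: bool      # hypothetical flag taken from the manual annotation


@dataclass
class Quote:
    start: int
    end: int
    speaker: Optional[int] = None  # entity id; None means "annotate manually"


# Straight or curly double quotes only; other quotation styles are ignored.
QUOTE_RE = re.compile(r'"[^"]+"|“[^”]+”')


def detect_quotes(text: str) -> List[Quote]:
    """Find direct speech spans with a simple regex."""
    return [Quote(m.start(), m.end()) for m in QUOTE_RE.finditer(text)]


def candidate_speakers(mentions: List[Mention]) -> List[Mention]:
    """Drop mentions of non-human entities, which cannot be speakers."""
    return [m for m in mentions if m.human]


def attribute(quotes: List[Quote], mentions: List[Mention],
              window: int = 50) -> List[Quote]:
    """Pre-populate speakers with the closest candidate outside each quote."""
    candidates = candidate_speakers(mentions)
    for q in quotes:
        best, best_dist = None, window + 1
        for m in candidates:
            if m.end <= q.start:
                dist = q.start - m.end
            elif m.start >= q.end:
                dist = m.start - q.end
            else:
                continue  # mention lies inside the quote itself
            if dist < best_dist:
                best, best_dist = m, dist
        if best is not None:
            q.speaker = best.entity_id
    return quotes
```

Quotes without a candidate inside the window keep `speaker=None`, so only the hard cases would remain for manual annotation.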

Adding this XML format as a new export format is possible, but neither the information to identify possible speakers nor the quotation spans are available within CorefAnnotator, so I guess it makes more sense to do the conversion with an external tool.

The idea to do quote attribution annotation using mention relations could work, but I think it would be messy to treat quotations as regular mentions/entities. It would be better to have quotations as a separate kind of span with its own tag in the XMI file, e.g. <quotation start="1" end="10" speaker="12" />. But this would of course require more UI changes.
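
As a minimal sketch of what such a separate span type could look like in code, assuming the simplified `<quotation>` element above: the `Quotation` class and both helper functions are hypothetical, `speaker` is assumed to hold the id of the speaker mention, and a real XMI file would additionally involve namespaces and feature-structure ids that are ignored here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quotation:
    start: int
    end: int
    speaker: Optional[int] = None  # id of the speaker mention, if attributed


def to_element(q: Quotation) -> ET.Element:
    """Serialize a quotation span as a <quotation> element."""
    attrs = {"start": str(q.start), "end": str(q.end)}
    if q.speaker is not None:
        attrs["speaker"] = str(q.speaker)
    return ET.Element("quotation", attrs)


def from_element(el: ET.Element) -> Quotation:
    """Read a <quotation> element back; a missing speaker stays None."""
    speaker = el.get("speaker")
    return Quotation(
        start=int(el.get("start")),
        end=int(el.get("end")),
        speaker=int(speaker) if speaker is not None else None,
    )
```

Keeping the speaker as an optional attribute means unattributed quotations can be stored with the same tag.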
