Question about experiment #5

xuchaoUCAS · 2020-03-05T03:18:50Z

I don't understand the purpose of this experiment.
There are too many errors in the gold set.
For example, in europarl-v7.de-en.en.sentences.test.gold:
line 73: I am happy to try and answer, Mr Wijsenbeek. As you will certainly know,……. Here "I am happy to try and answer, Mr Wijsenbeek." is obviously a single sentence, but the gold file doesn't mark it as one.
Similar cases:
lines 130, 175, ... there are too many to list.
So I don't understand what "sentence boundary detection" means for this dataset.
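
A rough way to list gold lines like 73, 130 and 175 is sketched below. It assumes the gold file holds one sentence per line, and it uses a simple heuristic (sentence-final punctuation, whitespace, then an uppercase ASCII letter) that is not the rule used to build the dataset. The names `suspicious_lines` and `scan_file` are hypothetical helpers, and the heuristic will also fire on abbreviations written with a period.

```python
import re

# Heuristic only: ".", "!" or "?" followed by whitespace and an uppercase
# ASCII letter. This is an assumption for illustration, not the rule used
# to create the gold file; abbreviations like "Mr. Smith" are false positives.
INNER_BOUNDARY = re.compile(r"[.!?]\s+(?=[A-Z])")


def suspicious_lines(lines):
    """Yield (line_number, text) for lines that seem to hold several sentences."""
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if INNER_BOUNDARY.search(text):
            yield number, text


def scan_file(path):
    """Run the heuristic over a file with one (supposed) sentence per line."""
    with open(path, encoding="utf-8") as handle:
        return list(suspicious_lines(handle))
```

Calling `scan_file("europarl-v7.de-en.en.sentences.test.gold")` would return the flagged line numbers with their text, which gives a rough estimate of how many lines are under-segmented.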

stefan-it · 2020-03-11T12:11:02Z

Hi @xuchaoUCAS,

we use this kind of dataset because no 100% gold-labeled datasets are available for this task. That's why we refer to them as "quasi-segmented" datasets.

However, in preliminary experiments we used Universal Dependencies (normally used for e.g. PoS tagging). These datasets are more reliably sentence-segmented. But: they contain fewer sentences than e.g. the Europarl corpora!

morpheus-87 assigned stefan-it May 3, 2023
