Modify Sentence.to_original_text() to take into account Sentence.start_position for whitespace calculation #3150

mauryaland · 2023-03-15T22:07:41Z

Related to #3142. This change computes the leading whitespace as the difference between the first token's start index and the sentence's start index. That keeps the initial whitespace in the special cases where it matters, while no longer printing a long run of useless whitespace when the token indices come from a bigger document.

For example, the following text is split into two Sentence objects in my application:

text = "Amaury et Valentin mangent au 77 boulevard  Perreire.\n\n\n\n Paul  a modifié le tokenizer."

Before the fix, printing the two sentences gives:

[Sentence[9]: "Amaury et Valentin mangent au 77 boulevard  Perreire.",
 Sentence[6]: "                                                          Paul  a modifié le tokenizer."]

And after the fix:

[Sentence[9]: "Amaury et Valentin mangent au 77 boulevard  Perreire.",
 Sentence[6]: "Paul  a modifié le tokenizer."]

A sentence that itself begins with whitespace

sentence = Sentence(" ... and then?")

print(sentence)

still keeps that whitespace in the printout:

Sentence[4]: " ... and then?"
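The calculation described above can be sketched without flair installed. This is only an illustration under stated assumptions: `Tok` and the standalone `to_original_text` function are hypothetical stand-ins, not the library's actual classes or the exact code merged in this PR. The token offsets are derived from the example text above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tok:
    """Hypothetical stand-in for a token: text, document offset, trailing spaces."""
    text: str
    start_position: int
    whitespace_after: int = 1


def to_original_text(tokens: List[Tok], sentence_start: int = 0) -> str:
    """Rebuild the sentence text, padding only by the offset inside the sentence."""
    if not tokens:
        return ""
    # Leading whitespace is measured from the sentence start, not the document start.
    leading = " " * (tokens[0].start_position - sentence_start)
    body = "".join(t.text + " " * t.whitespace_after for t in tokens).strip()
    return leading + body


document = "Amaury et Valentin mangent au 77 boulevard  Perreire.\n\n\n\n Paul  a modifié le tokenizer."
paul = document.index("Paul")  # document offset of the second sentence's first token

second = [
    Tok("Paul", paul, 2),
    Tok("a", paul + 6),
    Tok("modifié", paul + 8),
    Tok("le", paul + 16),
    Tok("tokenizer", paul + 19, 0),
    Tok(".", paul + 28, 0),
]

# Ignoring the sentence start (old behaviour): padded by the full document offset.
before = to_original_text(second, sentence_start=0)
# Subtracting the sentence start (this PR's idea): no spurious padding.
after = to_original_text(second, sentence_start=paul)

# A sentence that really starts with a space keeps it: first token at offset 1.
spaced = to_original_text(
    [Tok("...", 1), Tok("and", 5), Tok("then", 9, 0), Tok("?", 13, 0)],
    sentence_start=0,
)

print(repr(after))
print(repr(spaced))
```

Here `before` reproduces the long run of leading spaces shown in the "before the fix" output, while `after` and `spaced` match the two "after" printouts.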

alanakbik · 2023-03-22T14:11:46Z

@mauryaland thanks for fixing this!

take into account Sentence.start_position to calculate whitespace

7fe1666

mauryaland changed the title ~~take into account Sentence.start_position to calculate whitespace~~ Modify Sentence.to_original_text() to take into account Sentence.start_position for whitespace calculation Mar 15, 2023

alanakbik merged commit 1807b5d into flairNLP:master Mar 22, 2023

mauryaland deleted the patch-2 branch March 22, 2023 14:15
