Modify Sentence.to_original_text() to take into account Sentence.start_position for whitespace calculation #3150

Merged 1 commit on Mar 22, 2023


mauryaland
Contributor

Related to #3142. This solution pads the output with the difference between the first token's start index and the sentence's start index. That keeps the initial whitespace in the special cases where a sentence really begins with whitespace, without printing a long run of useless whitespace when the token indices are stored relative to a larger document.

For example, take the following text, which my application splits into two Sentence objects:

text = "Amaury et Valentin mangent au 77 boulevard  Perreire.\n\n\n\n Paul  a modifié le tokenizer."

Before the fix, it gives:

[Sentence[9]: "Amaury et Valentin mangent au 77 boulevard  Perreire.",
 Sentence[6]: "                                                          Paul  a modifié le tokenizer."]

And after the fix:

[Sentence[9]: "Amaury et Valentin mangent au 77 boulevard  Perreire.",
 Sentence[6]: "Paul  a modifié le tokenizer."]

A sentence that begins with whitespace still keeps that whitespace in the printout:

from flair.data import Sentence

sentence = Sentence(" ... and then?")

print(sentence)

Output:

Sentence[4]: " ... and then?"
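
The whitespace calculation described above can be sketched outside of flair. This is a minimal, self-contained illustration rather than the library's implementation: `Tok`, `locate`, and this standalone `to_original_text` are hypothetical names, and the only assumption is that each token stores its start offset relative to the enclosing document.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tok:
    """Hypothetical token: its text and its start offset in the full document."""
    text: str
    start: int


def locate(text: str, words: List[str], offset: int = 0) -> List[Tok]:
    """Find each word in order, recording document-level start offsets."""
    toks = []
    pos = offset
    for word in words:
        pos = text.index(word, pos)
        toks.append(Tok(word, pos))
        pos += len(word)
    return toks


def to_original_text(tokens: List[Tok], sentence_start: int) -> str:
    """Rebuild the sentence text from token offsets.

    Leading whitespace is the gap between the sentence start and the first
    token, not the first token's absolute offset in the document.
    """
    if not tokens:
        return ""
    parts = [" " * (tokens[0].start - sentence_start)]
    pos = tokens[0].start
    for tok in tokens:
        parts.append(" " * (tok.start - pos))  # gap since the previous token
        parts.append(tok.text)
        pos = tok.start + len(tok.text)
    return "".join(parts)


text = "Amaury et Valentin mangent au 77 boulevard  Perreire.\n\n\n\n Paul  a modifié le tokenizer."
second_start = text.index("Paul")
second = locate(text, ["Paul", "a", "modifié", "le", "tokenizer", "."], second_start)

# Measuring from the sentence start drops the document-level padding.
fixed = to_original_text(second, second_start)

# Measuring from offset 0 reproduces the padded output from before the fix.
padded = to_original_text(second, 0)

# A sentence that really begins with whitespace keeps it.
leading_text = " ... and then?"
leading = to_original_text(locate(leading_text, ["...", "and", "then", "?"]), 0)
```

Passing `0` as `sentence_start` for a sentence cut from the middle of a document mimics the old behaviour, which is why the second sentence was printed with one space per preceding document character.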

@mauryaland mauryaland changed the title from "take into account Sentence.start_position to calculate whitespace" to "Modify Sentence.to_original_text() to take into account Sentence.start_position for whitespace calculation" on Mar 15, 2023
@alanakbik
Collaborator

@mauryaland thanks for fixing this!

@alanakbik alanakbik merged commit 1807b5d into flairNLP:master Mar 22, 2023
@mauryaland mauryaland deleted the patch-2 branch March 22, 2023 14:15