Edits missed for a substitute -> Delete -> Substitute sequence. #28

Closed
codedecde opened this issue Jul 2, 2021 · 3 comments

@codedecde

Hi,
ERRANT misses an edit for the following source/target pair:

source: In the article mrom The the New York Times.
target: In the article from The New York Times.

The edit mrom -> from is missed by ERRANT. The output from ERRANT was:

["Orig: [4, 6, 'The the'], Cor: [4, 5, 'The'], Type: 'U:DET'"]

After digging a little, this seems to affect all alignments of the following form:

Input: w1 w2 w3
Output: w4 w5
such that w3.lower() == w5.lower()

Alignment Sequence: S w1 -> w4, D w2 -> "", S w3 -> w5

In that case the edit "w1" -> "w4" is missed, and errant.en.merger.process_seq generates only "w2 w3" -> "w5".
Example:

source: "In thir the"
target: "On The"
Errant Output: ["Orig: [1, 3, 'Thir the'], Cor: [1, 2, 'The'], Type: 'U:NOUN'"]
# Missing In -> On
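
To make the pattern above concrete, here is a minimal sketch that checks whether an alignment sequence has the S, D, S shape with a case-insensitive match on the last pair. It does not call ERRANT; the `(op, source_token, target_token)` tuple layout and the function name `matches_reported_pattern` are assumptions made up for illustration, not ERRANT's internal representation.

```python
from typing import List, Tuple

# Hypothetical representation (not ERRANT's own): one tuple per alignment step,
# (operation, source token, target token). "" marks the empty side of a deletion.
AlignOp = Tuple[str, str, str]


def matches_reported_pattern(ops: List[AlignOp]) -> bool:
    """Return True for a Substitute, Delete, Substitute sequence whose last
    substitution pairs two tokens that are equal when lowercased."""
    if [op for op, _, _ in ops] != ["S", "D", "S"]:
        return False
    _, last_src, last_tgt = ops[2]
    return last_src.lower() == last_tgt.lower()


# The "In thir the" -> "On The" example from this report.
example = [("S", "In", "On"), ("D", "thir", ""), ("S", "the", "The")]
print(matches_reported_pattern(example))
```

Any sequence for which this returns True should be a candidate for the missed first substitution, assuming the description above is complete.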
@chrisjbryant
Owner

chrisjbryant commented Jul 2, 2021

Hey, good catch!

<Previous solution>

Edit: The solution I posted previously works, but might have some unintended side effects, so I just fixed the brackets as in #26, which also works.

@codedecde
Author

Awesome! Thank you :)

@chrisjbryant
Owner

Turns out this was actually the same issue as in #26 and I just needed to fix some brackets. I've edited my answer above accordingly!
