Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add: remove_trailing_repeat_consonants() #862

Merged
merged 39 commits into from
Nov 13, 2023
Merged

Conversation

konbraphat51
Copy link
Contributor

@konbraphat51 konbraphat51 commented Nov 9, 2023

What does this changes

Add removement method of repeating consonants by dictionary-based method.

How this fixes it

Add pythainlp.utils.remove_repeat_consonant(text, dictionary)

Find words in the dictionary that has repeating consonant at the last, and find the match with the end of the sentence. If there is none, make the repetition to one; if there is, make the repetition to that one.

Fixes #860

Your checklist for this pull request

🚨Please review the guidelines for contributing to this repository.

  • Passed code styles and structures
  • Passed code linting checks and unit test

konbraphat51 and others added 5 commits November 9, 2023 23:55
function to remove consonants
เริ่ดดดดดดดด -> เริ่ด

implementation + test code written.
Test passed
@pep8speaks
Copy link

pep8speaks commented Nov 9, 2023

Hello @konbraphat51! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 51:69: W291 trailing whitespace
Line 126:9: W503 line break before binary operator
Line 128:9: W503 line break before binary operator
Line 249:13: W503 line break before binary operator
Line 249:50: E203 whitespace before ':'
Line 251:13: W503 line break before binary operator

Comment last updated at 2023-11-13 07:50:03 UTC

@konbraphat51
Copy link
Contributor Author

Hmm... I cannot find more PEP8 error,,,
I will try more next morning. Good night!

@coveralls
Copy link

coveralls commented Nov 9, 2023

Coverage Status

coverage: 86.937% (+86.9%) from 0.0%
when pulling 3315cb0 on konbraphat51:dev
into edb52b3 on PyThaiNLP:dev.

@konbraphat51
Copy link
Contributor Author

Lint issue of normalize.py solved

pointed out by codeclimate
Cognitive complexity pointed out by CodeClimate

Black used
pointed out by Lint
black used
vscode autopep8 and black has been conflicting. So autopep8 cutted
cognitive complexity pointed out by CodeClimate.
Black used.
TODO resolved, black used, test passed
Code complexity pointed out by CodeClimate, black used
@konbraphat51
Copy link
Contributor Author

konbraphat51 commented Nov 10, 2023

Okey, Lint problem and CodeClimate problem are basically solved, and the test for this passed.
Review please!

It seems that the code length of a single file is set to 250 lines, but should I separate this method from normalize.py?

@bact bact added the enhancement enhance functionalities label Nov 10, 2023
@konbraphat51
Copy link
Contributor Author

konbraphat51 commented Nov 11, 2023

  • Files seperated
  • remove_trailing_repeat_consonants() sounds good to me, so I changed the name to that.
  • I think I clarified names in the script, but if there are still name inconsistencies, please specify them.
  • I revert normalize.py to what it was, so the Lint is pointing it out now
  • Not sure why codeacy saying "pythainlp.util.remove_trailing_repeat_consonants.remove_trailing_repeat_consonants' imported but unused." It is used in test code. Any idea?

pythainlp/util/normalize.py Outdated Show resolved Hide resolved
@bact
Copy link
Member

bact commented Nov 12, 2023

I think I clarified names in the script, but if there are still name inconsistencies, please specify them.

Sorry. I was meant to say that the existing remove_repeat_vowels() and remove_dup_spaces() have naming inconsistency issue. Not from the new function you're now contributing.

@bact
Copy link
Member

bact commented Nov 12, 2023

Neat contribution, thank you. I think that's all. I can approve after few suggested fixes above.

@konbraphat51
Copy link
Contributor Author

Ok, all done.
Review please :)

Copy link

sonarcloud bot commented Nov 13, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

@bact
Copy link
Member

bact commented Nov 13, 2023

I have a last minute thought about which name we should go with,
between:

  • remove_trailing_repeat_xxx vs
  • remove_repeat_trailing_xxx

(try to think what it will like in the context of code completion, when ppl type remove_ .. which char they)

But I will merge this first.
We can later thinking more about consistent names (together with that remove_dup_xxx) and change this again when appropriate.

Thank you for your contribution.

(the test on Windows is now running for 58 minutes and counting, this shouldn't be the norm :( )

@bact bact merged commit c981042 into PyThaiNLP:dev Nov 13, 2023
11 of 14 checks passed
PyThaiNLP automation moved this from In progress to Done Nov 13, 2023
@bact bact changed the title Add: remove_repeat_consonants() Add: remove_trailing_repeat_consonants() Nov 13, 2023
@bact bact mentioned this pull request Nov 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement enhance functionalities
Projects
PyThaiNLP
  
Done
Development

Successfully merging this pull request may close these issues.

[Suggestion] Add consonant-remover method
4 participants