
count_phonemized in words_mismatch should use given separator, not spaces to split words #169

Open
iamanigeeit opened this issue Jun 26, 2024 · 2 comments


iamanigeeit commented Jun 26, 2024

It's quite common to use spaces to separate the phonemes for speech synthesis.

But this leads to spurious word-count mismatch warnings, because count_phonemized splits on whitespace and so counts each phoneme as a word.

>>> from phonemizer.backend import BACKENDS
>>> from phonemizer.separator import Separator
>>> G2P = BACKENDS['espeak'](language='en-us', words_mismatch='warn')
>>> SEP = Separator(word='|', phone=' ')
>>> G2P.phonemize(['try'], separator=SEP)[0]
WARNING:phonemizer:words count mismatch on line 1 (expected 1 words but get 4)
WARNING:phonemizer:words count mismatch on 100.0% of the lines (1/1)
't ɹ aɪ |'
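
The mismatch can be reproduced without espeak. A minimal sketch, assuming only the output string from the session above; the variable names are illustrative and not part of the phonemizer API:

```python
# Output taken from the session above: phones separated by ' ', words by '|'.
phonemized = 't ɹ aɪ |'

# Splitting on whitespace counts every phone (and the trailing '|') as a word.
count_by_whitespace = len(phonemized.strip().split())

# Splitting on the word separator and dropping empty pieces counts actual words.
count_by_separator = len([w for w in phonemized.strip().split('|') if w.strip()])

print(count_by_whitespace, count_by_separator)  # 4 1
```

The first count matches the "get 4" in the warning, while the second matches the single input word.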

It seems to be a common issue, e.g. #154 and lifeiteng/vall-e#50

I have fixed this (per below) but let me know if you need a PR for it.

Fix in words_mismatch.py:

    @classmethod
    def _count_words(cls, text, wordsep=None):
        """Return the number of words contained in each line of `text`"""
        return [
            len([w for w in line.strip().split(wordsep) if w])
            for line in text]

    def count_phonemized(self, text, wordsep=None):
        """Stores the number of words in each output line"""
        self._count_phn = self._count_words(text, wordsep)
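
To check the counting logic in isolation, here is a standalone sketch of the same idea as a plain function. `count_words` is a hypothetical name, not part of phonemizer, and the phone strings are hand-written examples rather than actual espeak output. With `wordsep=None` it falls back to splitting on whitespace, so behavior should be unchanged when no separator is passed:

```python
def count_words(lines, wordsep=None):
    """Return the number of non-empty words in each line of `lines`."""
    return [
        len([w for w in line.strip().split(wordsep) if w])
        for line in lines]


# Default: split on whitespace, as before the fix.
print(count_words(['hello world', 'try']))  # [2, 1]

# Phones separated by ' ', words by '|': split on the word separator instead.
print(count_words(['h ə l oʊ |w ɜː l d |', 't ɹ aɪ |'], wordsep='|'))  # [2, 1]
```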

Fix in espeak.py:

    def _phonemize_postprocess(self, phonemized, punctuation_marks, separator):
        text = phonemized[0]
        switches = phonemized[1]

        self._words_mismatch.count_phonemized(text, separator.word)
        self._lang_switch.warning(switches)

        phonemized = super()._phonemize_postprocess(text, punctuation_marks, separator)
        return self._words_mismatch.process(phonemized)

Fix in base.py:

    def phonemize(self, text, separator=default_separator,
                  strip=False, njobs=1):
        ...
        return self._phonemize_postprocess(phonemized, punctuation_marks, separator)

    def _phonemize_postprocess(self, phonemized, punctuation_marks, separator):
        ...

Note: this still raises warnings when unexpected splits occur, such as capitals in the middle of a word ("GameStop") or non-word characters before punctuation ("he said--, no"). It should suffice for most cases, though, and the input text should be properly normalized beforehand anyway.

mmmaat pushed a commit that referenced this issue Jun 28, 2024
@mmmaat
Collaborator

mmmaat commented Jun 28, 2024

Thanks for pointing out that bug! Does the fix in the issue_169 branch solve your problem?

@iamanigeeit
Author

Thanks for adding test cases! I haven't tested it, as I simply changed my own code.

mmmaat added a commit that referenced this issue Jul 2, 2024