This repository has been archived by the owner on Aug 25, 2023. It is now read-only.

Build TTS-Thai to Flite voices. #8

Open
wannaphong opened this issue Nov 1, 2019 · 10 comments
Labels
help wanted Extra attention is needed

Comments

@wannaphong
Member

I can't build flite voices. :( google/language-resources#31

@wannaphong added the help wanted label Nov 1, 2019
@Shallowmallow

Hi!
I actually maintain an Android version of Flite on Google Play.
I try to find all the Festival languages (and build some myself) and convert them so they are accessible to everyone.
If you want, I can adapt your voice to Flite and tell you how I did it once it's done.
But I couldn't find the WAVs on GitHub.

@wannaphong
Member Author

@Shallowmallow

Great, thanks. I was just able to download it.

@Shallowmallow

Oh, I looked at your files and saw you were only using a lexicon, with no rules for unknown words. Are there easy rules to convert an unknown word to its phonemes? :)

@wannaphong
Member Author

wannaphong commented Feb 10, 2020

@Shallowmallow You may have to convert unknown words to IPA first. I also have the problem of not being able to convert such unknown words. For Thai, we don't have enough public datasets to do G2P with deep learning.

@wannaphong
Member Author

wannaphong commented Feb 15, 2020

@Shallowmallow

Hi!
@wannaphong
So the words only come from a dictionary? Is Thai easy to read? Do you think the rules would be simple to write, without using any deep learning?

I'm trying two things:
- Building the Thai voice with the Festival grapheme scripts. This would resolve the rules problem.
For now, it works well when I use 200 or 500 sentences, but when I use all 2000 sentences, there are horrible parasitic noises. So I'm testing with 1000, then more.
- Converting the actual voice to Flite. I can easily convert the voice to Flite, but it talks gibberish because I have some problems converting the dictionary (it's the first time I'm doing this with pure Unicode :).
But I actually think there is another problem that I haven't checked thoroughly yet. There may be a problem with the phoneme names (specifically those ending with numbers). Flite usually converts stress to numbers, so for example ((b e t) 1) becomes b e1 t.

PS: I remember you saying you had a voice that wasn't using Unicode. Do you still have the txt.done.data for this voice?

I wanted to test the Festival grapheme conversion.
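
To make the stress-digit concern above concrete, here is a minimal Python sketch. It only models the flattening described in the example ((b e t) 1) becoming b e1 t and is not Flite's actual code. The vowel set and both function names are hypothetical, chosen for illustration. It also flags phone names that already end in a digit, which are the ones that might be confused with stress marks.

```python
# Hypothetical vowel inventory, for illustration only (not the Thai phone set).
VOWELS = {"a", "e", "i", "o", "u"}

def flatten_syllable(phones, stress):
    """Attach the stress digit to the first vowel of a syllable.

    Models the example from the thread: ((b e t) 1) -> b e1 t.
    """
    out = []
    marked = False
    for p in phones:
        if not marked and p in VOWELS:
            out.append(f"{p}{stress}")
            marked = True
        else:
            out.append(p)
    return out

def digit_suffixed(phones):
    """Return phone names that already end in a digit (possible clashes)."""
    return [p for p in phones if p and p[-1].isdigit()]

print(flatten_syllable(["b", "e", "t"], 1))  # ['b', 'e1', 't']
print(digit_suffixed(["k", "a4", "n"]))      # ['a4']
```

If the lexicon's phone set contains names like a4, a check of this kind would at least show which entries to inspect before conversion.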

@wannaphong
Member Author


The txt.done.data file is in thaitts2.zip: https://drive.google.com/file/d/1i5z8yjXKBDzfc3zbaJJhg9L2hE5f4byC/view?usp=sharing
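
For the experiments with 200, 500, or 1000 sentences mentioned earlier in the thread, a small script can cut the prompt file down to the first N entries. This is a minimal sketch, assuming the usual Festvox layout of one `( utt_id "text" )` entry per line. The utterance IDs and sample text below are made up, and the function names are hypothetical.

```python
import re

# One prompt per line: ( utt_id "text" )
PROMPT_RE = re.compile(r'^\(\s*(\S+)\s+"(.*)"\s*\)\s*$')

def parse_prompts(text):
    """Parse txt.done.data content into a list of (utt_id, text) pairs."""
    prompts = []
    for line in text.splitlines():
        m = PROMPT_RE.match(line.strip())
        if m:
            prompts.append((m.group(1), m.group(2)))
    return prompts

def format_prompts(prompts):
    """Serialize (utt_id, text) pairs back into txt.done.data form."""
    return "".join(f'( {uid} "{txt}" )\n' for uid, txt in prompts)

# Made-up sample data; real IDs and sentences come from the voice's own file.
sample = '( utt_0001 "สวัสดี" )\n( utt_0002 "ขอบคุณ" )\n'
subset = parse_prompts(sample)[:1]  # keep only the first N prompts
print(format_prompts(subset), end="")
```

In practice the input would be read with `open(path, encoding="utf-8")`, since the Thai prompts are Unicode text.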

@Shallowmallow

I tried making a Festival grapheme voice (http://www.festvox.org/festvox/c3485.html), because that way there would be some rules.
It uses this converter: https://github.com/nltk/nltk_contrib/blob/master/nltk_contrib/scripttranscriber/Unitran/Tables/Thai_unicode.txt
For some reason, clustergen didn't work well (there were noises from time to time when I used more than 500 sentences), so I tried building an HTS voice instead.
But I don't think the sentences come out quite right.
The Festival grapheme approach is also used here: http://festvox.org/cmu_wilderness/
But the Thai there doesn't seem correct either. It has a distortion measure (MCD) of 6.98. Can you check whether you can understand the sentences on the website?

I think the problem with the grapheme voice and with your dictionary is that you treat a tone as a phoneme.

It sounds okay with your voice, but only because the words appear exactly as they were learned.

But logically, it should be a problem, because the sound will be cut into phonemes.

So, for example, the sound à, which is written a 4, will be cut like this:
|a|__4|

instead of
|a4_______|

So basically, if you pronounce ì, it will do
|i|__4|
except that 4 sounds like the end of an a, so it will come out as something like ià.

I see two possibilities to correct it:

  • don't treat a tone as a phoneme
  • treat it as a phoneme but force its duration to 0. Hmm, I wonder if you can do that easily with Festival.
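
The first option could be applied to the lexicon as a preprocessing step. The following is a minimal sketch, assuming tones appear as standalone digit tokens placed right after the phone they belong to (as in the a 4 example above). The tone set, the `merge_tones` name, and the optional `sep` argument are all hypothetical.

```python
# Assumption: tones are written as standalone digit phones in the lexicon.
TONES = {"0", "1", "2", "3", "4"}

def merge_tones(phones, sep=""):
    """Fold each standalone tone token into the phone that precedes it.

    ['k', 'a', '4', 'n'] -> ['k', 'a4', 'n'], so the tone is no longer a
    separate segment with its own duration.
    """
    merged = []
    for p in phones:
        if p in TONES and merged:
            merged[-1] = f"{merged[-1]}{sep}{p}"
        else:
            merged.append(p)
    return merged

print(merge_tones(["k", "a", "4", "n"]))  # ['k', 'a4', 'n']
print(merge_tones(["i", "4"], sep="_t"))  # ['i_t4']
```

Since merged names such as a4 end in a digit, they may run into the stress-number issue raised earlier in the thread. A non-digit separator (the `sep="_t"` variant) is one possible way to sidestep that, though it would still leave a trailing digit.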

Do you plan to use the voice even if it doesn't have rules to read unknown words?

@wannaphong
Member Author

@Shallowmallow I was thinking of using G2P and then SayPhones in Festival.
