Upstream data, jyutping should not be an array. #4
Comments
𠯢 U+E064 saa1 aa6
〺 U+5345 saa1 aa6
卌 U+534C sei3 aa6

For the above three words, quoting JPTableFull (U+E064 is missing there, but I believe that is a mistake), in 4.9.2:

Thus I believe that they are actually multi-syllable.

籿 U+7C7F fan1 mai5
粀 U+7C80 sap6 mai5
粁 U+7C81 cin1 mai5
粌 U+7C8C baak3 mai5
粍 U+7C8D hou4 mai5
粨 U+7CA8 baak3 mai5
糎 U+7CCE lei4 mai5

For the words above, I am not sure what the right interpretation is. If I had to guess, I think mai5 refers to the radical (米).

㖊 U+358A jing1 cam4
吋 U+540B jing1 cyun3
呎 U+544E jing1 cek3
哩 U+54E9 jing1 lei5
唡 U+5521 jing1 loeng2
啢 U+5562 jing1 loeng2
噚 U+565A jing1 cam4
𠺖 U+F45A jing1 mau5
𠰴 U+F4C0 jing1 sek6

The cases above are interesting because I believe the first syllable, jing1, refers to the word 英.
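The shared-syllable pattern in the lists above can be checked mechanically. The following is a minimal sketch, not part of the upstream project: the helper names `split_reading` and `group_by_syllable` are hypothetical, and the sample readings are a handful copied from the lists above. It assumes each syllable is lowercase letters followed by a tone digit from 1 to 6.

```python
import re
from collections import defaultdict

# Assumption: a Jyutping syllable is lowercase letters plus a tone digit 1-6.
SYLLABLE = re.compile(r"^([a-z]+)([1-6])$")

def split_reading(reading):
    """Split a space-separated reading into (letters, tone) pairs."""
    parts = []
    for token in reading.split():
        match = SYLLABLE.match(token)
        if match is None:
            raise ValueError(f"not a Jyutping syllable: {token!r}")
        parts.append((match.group(1), int(match.group(2))))
    return parts

# A few readings copied from the lists above (hypothetical sample, not the full data).
READINGS = {
    "吋": "jing1 cyun3",
    "呎": "jing1 cek3",
    "哩": "jing1 lei5",
    "粁": "cin1 mai5",
    "粍": "hou4 mai5",
}

def group_by_syllable(readings, position):
    """Group characters by the syllable at `position` (0 = first, -1 = last)."""
    groups = defaultdict(list)
    for char, reading in readings.items():
        tokens = reading.split()
        groups[tokens[position]].append(char)
    return dict(groups)
```

Grouping on position 0 collects the jing1 characters, and grouping on position -1 collects the mai5 characters, which makes the two families easy to review side by side.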
Here is what I’ve got from the linguist: “The sap6 sing1 and baak3 mai5 ones are units of measurements. 竓 = 毫升 hou sing = millilitre.” Looking again, lshk-org/jyutping-table#3 (comment)
I’m willing to bet that Further, there could be alternate characters included for when people would choose a different character to represent the contraction, which might explain the thing I was treating as a data issue. For example:
These are also weird ones:

浬 U+6D6C hoi2 lei5
粴 U+7CB4 gung1 lei5
嗧 U+55E7 gaa1 leon4

So, we could be looking at archaic contraction pronunciations that didn’t survive to modern Cantonese.
浬 U+6D6C hoi2 lei5
粴 U+7CB4 gung1 lei5
嗧 U+55E7 gaa1 leon4

These three words are all units of measurement too:

浬 U+6D6C 海里
粴 U+7CB4 公里
嗧 U+55E7 加侖
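One quick way to sanity-check the contraction reading is to confirm that each character's reading has exactly one syllable per character of its expanded word. This is only a sketch using the three pairs given above; the names `EXPANSIONS`, `READINGS`, and `syllables_match_expansion` are made up for illustration.

```python
# Expansions and readings exactly as given in the comments above.
EXPANSIONS = {
    "浬": "海里",
    "粴": "公里",
    "嗧": "加侖",
}
READINGS = {
    "浬": "hoi2 lei5",
    "粴": "gung1 lei5",
    "嗧": "gaa1 leon4",
}

def syllables_match_expansion(char):
    """True when the reading has one syllable per character of the expansion."""
    return len(READINGS[char].split()) == len(EXPANSIONS[char])

# Characters whose syllable count disagrees with their expansion (empty here).
mismatches = [c for c in EXPANSIONS if not syllables_match_expansion(c)]
```

A non-empty `mismatches` list would point at entries worth a second look, such as a possible transcription error.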
If I had to guess now, the "米-X" words should have the meaning of "X-meters".
Okay, so I think this is "mystery solved." They're archaic contractions that didn't make it to modern Cantonese. Some of them may be bad handwriting or bad transcriptions (e.g.

Given that, I think we're looking at needing to treat all of them as archaic multi-syllable characters.
I've spent today reviewing https://github.com/lshk-org/jyutping-table and I believe that there are issues in the upstream source data.
First, I believe that jyutping should be a single field, not an array. All of the items which appear as an array are listed here: lshk-org/jyutping-table#3 (comment)
With the exception of these, which I believe are actually multi-syllable:
I believe that the remainder of the list should be interpreted as this (does not include the entire list):
Of the full list, only 6 are marked for having different phonetics, all of which match the second Jyutping component.
Given that the characters' construction matches the description field, a conversation with a native speaker, and a conversation with a non-native Cantonese linguist (who also consulted a native speaker), I believe:
Further, it's also possible that the descriptor field pronunciation is wrong in a couple of cases, as I mention here: lshk-org/jyutping-table#4.
I propose that the object shape for this library be modified to account for this finding and post-processing added to adjust the data for correctness.
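As a rough illustration of what that post-processing could look like, here is a minimal sketch. It is not the library's actual API: the `Entry` shape, its field names, and the `normalize` helper are all hypothetical, and it assumes the upstream value arrives either as a single string or as an array of syllable strings.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass(frozen=True)
class Entry:
    """Hypothetical output shape: one jyutping field plus a syllable count."""
    character: str
    jyutping: str   # always a single space-separated string
    syllables: int  # greater than 1 marks a multi-syllable character

def normalize(character: str, jyutping: Union[str, List[str]]) -> Entry:
    """Collapse an upstream string-or-array reading into a single field."""
    if isinstance(jyutping, str):
        tokens = jyutping.split()
    else:
        # Assumption: array items are syllables of ONE reading, not alternatives.
        tokens = [token for item in jyutping for token in item.split()]
    if not tokens:
        raise ValueError(f"no reading for {character!r}")
    return Entry(character, " ".join(tokens), len(tokens))
```

For example, `normalize("吋", ["jing1", "cyun3"])` yields a single `jyutping` value of `"jing1 cyun3"` with `syllables` equal to 2, so consumers never need to handle an array.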