Data gathering #19

hugolpz · 2017-03-16T11:03:22Z

We are currently looking for a database of entries like { "glyph": "西", "phonetic": "xī" } (or xi1, or other alternatives).

Possible sources (information to be completed):

Moedict

link (to complete)
json format
range: most common characters, traditional only?

Unicode:

link (to complete)
xml
range: traditional/modern; more complete for a font
phonetic format: which one is provided? ("glyph": "西", "phonetic": "xī" or xi1?)

CJKlib

link

edouard-lopez · 2017-03-16T16:47:19Z

What about Unihan?

With the hexadecimal codepoint we can get the glyph like this in Python:

>>> print(chr(int('0x897F', 16)))
西

A JS solution would be better, but since this is outside the scope of the project, we can do it any way we see fit.
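To make the Unihan suggestion concrete, here is a minimal sketch of turning Unihan-style readings into the { "glyph", "phonetic" } records this issue asks for. It assumes tab-separated lines of the form `U+XXXX<TAB>field<TAB>value`, as in the Unihan readings file; the function name `parse_unihan_readings` and the inlined sample lines are hypothetical and only for illustration.

```python
def parse_unihan_readings(lines, field="kMandarin"):
    """Yield {"glyph", "phonetic"} records from Unihan-style tab-separated lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        codepoint, key, value = line.split("\t", 2)
        if key != field:
            continue  # keep only the requested reading field
        glyph = chr(int(codepoint[2:], 16))  # "U+897F" -> "西"
        # A field may list several readings separated by spaces;
        # this sketch keeps only the first one.
        yield {"glyph": glyph, "phonetic": value.split()[0]}


# Illustrative sample lines (assumption: not copied from a specific Unihan release).
SAMPLE = [
    "# sample data for illustration",
    "U+897F\tkMandarin\txī",
    "U+897F\tkCantonese\tsai1",
]

records = list(parse_unihan_readings(SAMPLE))
print(records)
```

Keeping only the first reading is a simplification; characters with several pronunciations would need a list under "phonetic" instead.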

hugolpz · 2017-03-16T16:55:34Z

Please check out:

unihan results
-- npm unihan
-- npm convertPinyin
-- npm unihan-cjk

PinNum2PinTones
-- npm pinyinize
-- pinyin-string (best one!)
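For reference, the numbered-to-tone-mark conversion these packages perform (xi1 to xī) can be sketched as follows. This is not the code of any package listed above; it is a minimal illustration using the usual placement rule (mark "a" or "e" if present, else the "o" of "ou", else the last vowel), and the function name `numbered_to_marked` is hypothetical.

```python
# Tone-marked forms of each vowel, indexed by tone number 1 to 4.
TONE_MARKS = {
    "a": "āáǎà",
    "e": "ēéěè",
    "i": "īíǐì",
    "o": "ōóǒò",
    "u": "ūúǔù",
    "ü": "ǖǘǚǜ",
}


def numbered_to_marked(syllable):
    """Convert one numbered-pinyin syllable (e.g. 'xi1') to tone marks ('xī')."""
    s = syllable.lower().replace("u:", "ü").replace("v", "ü")
    if not s or not s[-1].isdigit():
        return s  # no tone number: nothing to convert
    base, tone = s[:-1], int(s[-1])
    if tone not in (1, 2, 3, 4):
        return base  # neutral tone (written 0 or 5) carries no mark
    if "a" in base:
        target = base.index("a")
    elif "e" in base:
        target = base.index("e")
    elif "ou" in base:
        target = base.index("o")
    else:
        vowels = [i for i, ch in enumerate(base) if ch in TONE_MARKS]
        if not vowels:
            return base  # no vowel to carry the mark
        target = vowels[-1]
    marked = TONE_MARKS[base[target]][tone - 1]
    return base[:target] + marked + base[target + 1:]


print(numbered_to_marked("xi1"))
```

The sketch handles one syllable at a time; splitting a longer string into syllables is left to whichever library ends up being used.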

edouard-lopez · 2017-03-16T17:18:32Z

Thanks for the links. cjk-unihan might be useful for other projects.

I think it's better to limit the project to generating the font and outsource the data gathering/validation to another project. This way we stay focused and efficient.

I'm closing this, as different users might have different needs and would therefore handcraft their own dictionaries.

edouard-lopez · 2017-03-16T17:20:27Z

I reckon the JS solution is in the tobei/unihan code:

const character = String.fromCodePoint(parseInt(code.substring(2), 16));

hugolpz · 2017-03-16T17:53:30Z

Did you gather the data?

edouard-lopez · 2017-03-17T08:59:44Z

Not yet, could you work on a project to do so?

hugolpz · 2017-03-17T10:16:56Z

Yup. See also peterolson/hanzi-tools#1 (comment)

edouard-lopez · 2017-03-17T13:08:00Z

@hugolpz I think you have a typo in your comment, there is a ratio of 1:10 between node-pinyin and unihan characters/phonetic pairs. Can you confirm/correct this number?

hugolpz · 2018-02-11T21:24:55Z

https://github.com/superbiger/pinyin4js/blob/master/src/dict/pinyin.dict.js

edouard-lopez · 2018-02-12T13:52:29Z

We can get the codepoint using punycode
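As a counterpart to the Python snippet earlier in the thread, the reverse lookup (glyph to codepoint) can be sketched with the built-in `ord`. This is not the punycode API, just an illustration of the same idea; the helper name `to_codepoint` is hypothetical.

```python
def to_codepoint(glyph):
    """Return the Unihan-style codepoint label (e.g. 'U+897F') for one character."""
    return f"U+{ord(glyph):04X}"


print(to_codepoint("西"))  # U+897F
```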

edouard-lopez closed this as completed Mar 16, 2017
