WS coerce_dictionary parameter is not shared with NER #21

leungsolomon · 2020-02-06T04:16:50Z

NER cannot use the WS coerce_dictionary for word segmentation. Is there a fix or workaround?

Nb | proper noun
Nc | place noun

input (from README example)

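# ws, pos, and ner are the WS, POS, and NER models loaded as in the README
from ckiptagger import construct_dictionary

sentence_list = ['瑞士 LAURASTAR S4a 熨燙護理系統', ....]

word_to_weight = {
    "瑞士 LAURASTAR": 1,
}
# Assumed step: dictionary1 was not defined in the original snippet;
# the README builds it from the word-to-weight mapping like this
dictionary1 = construct_dictionary(word_to_weight)

# ws
word_sentence_list = ws(
    sentence_list,
    coerce_dictionary=dictionary1,
)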

# pos
pos_sentence_list = pos(word_sentence_list)

# ner
entity_sentence_list = ner(word_sentence_list, pos_sentence_list)

# Print result
print(word_sentence_list[1], pos_sentence_list[1])
for i, sentence in enumerate(sentence_list):
    print()
    print(f"'{sentence}'")
    print_word_pos_sentence(word_sentence_list[i],  pos_sentence_list[i])
    for entity in sorted(entity_sentence_list[i]):
        print(entity)

output

# without coerce_dictionary parameter
'瑞士 LAURASTAR S4a 熨燙護理系統'
['瑞士(Nc)', ' LAURASTAR S4(FW)', 'a (FW)', '熨燙(VC)', '護理(Na)', '系統(Na)']
(0, 2, 'PERSON', '瑞士')

# with coerce_dictionary parameter
'瑞士 LAURASTAR S4a 熨燙護理系統'
瑞士 LAURASTAR(Nb)　 S4(FW)　a (FW)　熨燙(VC)　護理(Na)　系統(Na)　
(0, 2, 'PERSON', '瑞士')


jacobvsdanniel · 2020-03-31T09:43:59Z

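Adding a dictionary lets you customize the word segmentation results. However, under the current architecture, NER only uses the WS output as a reference, so the entity boundaries that NER identifies are not necessarily the word boundaries that WS identifies.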
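
Since NER boundaries are not tied to WS boundaries, one possible workaround is to post-process the NER output yourself. The sketch below is not part of ckiptagger; `word_boundaries`, `is_aligned`, and `snap_to_words` are hypothetical helpers. It assumes the entity offsets are character offsets into the sentence (consistent with `(0, 2, 'PERSON', '瑞士')` above) and that the WS words concatenate back to the original sentence. It checks whether an entity span falls on WS word boundaries and, if not, widens it to cover whole words. The entity type is carried over unchanged and may no longer be appropriate for the widened span.

```python
from itertools import accumulate
from typing import List, Tuple

# (start, end, type, text), in the shape printed in the issue output
Entity = Tuple[int, int, str, str]


def word_boundaries(words: List[str]) -> List[int]:
    """Character offsets where WS words start or end, beginning with 0."""
    return [0] + list(accumulate(len(w) for w in words))


def is_aligned(words: List[str], entity: Entity) -> bool:
    """True if both ends of the entity span fall on WS word boundaries."""
    bounds = set(word_boundaries(words))
    start, end = entity[0], entity[1]
    return start in bounds and end in bounds


def snap_to_words(words: List[str], entity: Entity) -> Entity:
    """Widen an entity span so that it covers whole WS words only."""
    bounds = word_boundaries(words)
    start, end, label, _ = entity
    new_start = max(b for b in bounds if b <= start)  # nearest boundary at or before start
    new_end = min(b for b in bounds if b >= end)      # nearest boundary at or after end
    text = "".join(words)[new_start:new_end]
    return (new_start, new_end, label, text)


# Words and entity taken from the "with coerce_dictionary" output in the issue
words = ["瑞士 LAURASTAR", " S4", "a ", "熨燙", "護理", "系統"]
entity = (0, 2, "PERSON", "瑞士")

print(is_aligned(words, entity))
print(snap_to_words(words, entity))
```

For the example in this issue, the entity `(0, 2)` ends inside the coerced word `瑞士 LAURASTAR`, so it is reported as misaligned and widened to `(0, 12)`.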

jacobvsdanniel added the "good first issue" label Mar 31, 2020

jacobvsdanniel mentioned this issue Apr 10, 2022

Why doesn't using a custom dictionary to influence the word segmentation results also affect the entity recognition results? #43

Closed
