WS coerce_dictionary parameter do not shared with NER #21

Open
leungsolomon opened this issue Feb 6, 2020 · 1 comment
Labels
good first issue Good for newcomers

Comments

@leungsolomon

NER does not use the WS coerce_dictionary when segmenting words. Is there a fix or workaround?

Nb | proper noun
Nc | place noun

input (from README example)

sentence_list = ['瑞士 LAURASTAR S4a 熨燙護理系統', ....]

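word_to_weight = {
    "瑞士 LAURASTAR": 1,
}
# Build the dictionary object passed to ws below
# (construct_dictionary is imported from ckiptagger, as in the README example)
dictionary1 = construct_dictionary(word_to_weight)

# ws
word_sentence_list = ws(
    sentence_list,
    coerce_dictionary=dictionary1,
)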

# pos
pos_sentence_list = pos(word_sentence_list)

# ner
entity_sentence_list = ner(word_sentence_list, pos_sentence_list)

# Print result
print(word_sentence_list[1], pos_sentence_list[1])
for i, sentence in enumerate(sentence_list):
    print()
    print(f"'{sentence}'")
    print_word_pos_sentence(word_sentence_list[i],  pos_sentence_list[i])
    for entity in sorted(entity_sentence_list[i]):
        print(entity)

output

# without coerce_dictionary parameter
'瑞士 LAURASTAR S4a 熨燙護理系統'
['瑞士(Nc)', ' LAURASTAR S4(FW)', 'a (FW)', '熨燙(VC)', '護理(Na)', '系統(Na)']
(0, 2, 'PERSON', '瑞士')

# with coerce_dictionary parameter
'瑞士 LAURASTAR S4a 熨燙護理系統'
瑞士 LAURASTAR(Nb)  S4(FW) a (FW) 熨燙(VC) 護理(Na) 系統(Na) 
(0, 2, 'PERSON', '瑞士')
@jacobvsdanniel
Collaborator

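Adding a dictionary lets you customize the word segmentation results. Under the current architecture, however, NER only uses the WS results as a reference, so the entity boundaries that NER identifies are not necessarily the word boundaries that WS identifies.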
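
Since NER boundaries are not tied to WS boundaries, one possible workaround is to post-process the NER output and widen each entity span to the nearest WS token boundaries. The sketch below is not part of ckiptagger; `snap_entities_to_words` is a hypothetical helper. It assumes that each entity is a `(start, end, type, text)` tuple with character offsets into the sentence, and that the WS tokens concatenate back to the original sentence, as the outputs in this issue suggest.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def snap_entities_to_words(words, entities):
    """Widen each NER span so that it starts and ends on a WS token boundary.

    Hypothetical helper, not a ckiptagger API.
    words:    WS tokens for one sentence (assumed to concatenate to the sentence)
    entities: iterable of (start, end, type, text) tuples with character offsets
    """
    # boundaries[i] is the offset where token i starts; the last entry is the sentence length
    boundaries = [0] + list(accumulate(len(word) for word in words))
    sentence = "".join(words)

    snapped = set()
    for start, end, label, _text in entities:
        # Largest token boundary that is <= start
        new_start = boundaries[bisect_right(boundaries, start) - 1]
        # Smallest token boundary that is >= end
        new_end = boundaries[bisect_left(boundaries, end)]
        snapped.add((new_start, new_end, label, sentence[new_start:new_end]))
    return snapped


# Tokens and entity taken from the "with coerce_dictionary" output in this issue
words = ["瑞士 LAURASTAR", " S4", "a ", "熨燙", "護理", "系統"]
entities = {(0, 2, "PERSON", "瑞士")}
snapped = snap_entities_to_words(words, entities)
print(snapped)
```

Widening keeps the entity type that NER assigned, so a span that grows to cover a coerced dictionary word may end up with a label that no longer fits. Filtering out entities whose offsets do not fall on token boundaries is a stricter alternative.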
