Question about an error that occurs when using KiwiTokenizer! #1

Closed
y2pe opened this issue Dec 17, 2023 · 1 comment

y2pe commented Dec 17, 2023

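I'm a beginner trying out 'KiwiTokenizer'. I followed the provided example, but I get the error below. The example code published by kiwi-farm that uses KiwiTokenizer, as well as other code I tried, all fails with an error because of 'KiwiTokenizer'. (Please correct me if I have misunderstood something.)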

import kiwipiepy
from kiwipiepy import KiwiTokenizer

ImportError Traceback (most recent call last)
Cell In[3], line 2
1 import kiwipiepy
----> 2 from kiwipiepy import KiwiTokenizer

ImportError: cannot import name 'KiwiTokenizer' from 'kiwipiepy' (D:\projects\kiwi\kiwipiepy\__init__.py)

from datasets import load_dataset
import os
from kiwipiepy import Kiwi
from kiwipiepy.utils import Stopwords

data_dir = './data'
dataset = load_dataset('nsmc')
os.makedirs(data_dir, exist_ok=True)
for split_key in dataset.keys():
    doc_path = f"{data_dir}/{split_key}.txt"
    with open(doc_path, 'w', encoding='utf-8') as f:
        for doc in dataset[split_key]['document']:
            f.write(doc + '\n')

kiwi = Kiwi()
stopwords = Stopwords()

ImportError Traceback (most recent call last)
Cell In[3], line 3
1 from datasets import load_dataset
2 import os
----> 3 from kiwipiepy import Kiwi
4 from kiwipiepy.utils import Stopwords
6 data_dir = './data'

File D:\projects\kiwi\kiwipiepy\__init__.py:7
5 from kiwipiepy._version import version
6 from kiwipiepy._wrap import Kiwi, Sentence, TypoTransformer, TypoDefinition, HSDataset, MorphemeSet, PretokenizedToken
----> 7 import kiwipiepy.sw_tokenizer as sw_tokenizer
8 import kiwipiepy.utils as utils
9 from kiwipiepy.const import Match

File D:\projects\kiwi\kiwipiepy\sw_tokenizer.py:15
11 import warnings
13 import tqdm
---> 15 from _kiwipiepy import Sw_Tokenizer
17 from kiwipiepy import Kiwi, Token
19 @dataclass
20 class SwTokenizerConfig:

ImportError: cannot import name 'Sw_Tokenizer' from '_kiwipiepy' (D:\anaconda3\Lib\site-packages\_kiwipiepy.cp311-win_amd64.pyd)

  1. <= Is code written this way not supposed to work in the first place?

from transformers import AutoModelForMaskedLM
from kiwipiepy.sw_tokenizer import SwTokenizer
import kiwipiepy.transformers_addon
from kiwipiepy.sw_tokenizer import KiwiTokenizer

tokenizer = KiwiTokenizer.from_pretrained('kiwi-farm/roberta-base-32k')
#tokenizer = SwTokenizer.KiwiTokenizer("kiwi-farm/roberta-base-32k")
model = AutoModelForMaskedLM.from_pretrained('kiwi-farm/roberta-base-32k')
model.config.is_decoder = True # Set the model as a decoder

prompt = "Gim"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Generate text using top-k sampling
gen_ids = model.generate(
    input_ids,
    max_length=50,
    do_sample=True,  # Enable sampling
    top_k=50  # Top-k sampling
)

generated_text = tokenizer.decode(gen_ids[0], skip_special_tokens=True)
print(generated_text)

ImportError Traceback (most recent call last)
Cell In[1], line 2
1 from transformers import AutoModelForMaskedLM
----> 2 from kiwipiepy.sw_tokenizer import SwTokenizer
3 import kiwipiepy.transformers_addon
4 from kiwipiepy.sw_tokenizer import KiwiTokenizer

File D:\projects\kiwi\kiwipiepy\__init__.py:7
5 from kiwipiepy._version import version
6 from kiwipiepy._wrap import Kiwi, Sentence, TypoTransformer, TypoDefinition, HSDataset, MorphemeSet, PretokenizedToken
----> 7 import kiwipiepy.sw_tokenizer as sw_tokenizer
8 import kiwipiepy.utils as utils
9 from kiwipiepy.const import Match

File D:\projects\kiwi\kiwipiepy\sw_tokenizer.py:15
11 import warnings
13 import tqdm
---> 15 from _kiwipiepy import Sw_Tokenizer
17 from kiwipiepy import Kiwi, Token
19 @dataclass
20 class SwTokenizerConfig:

ImportError: cannot import name 'Sw_Tokenizer' from '_kiwipiepy' (D:\anaconda3\Lib\site-packages\_kiwipiepy.cp311-win_amd64.pyd)

bab2min (Owner) commented Dec 17, 2023

Hello @y2pe,
First, the KiwiTokenizer class is not part of kiwipiepy itself; it lives in the kiwipiepy.transformers_addon package, so to import it you need to write the following:

from kiwipiepy.transformers_addon import KiwiTokenizer

Also, which versions of kiwipiepy and transformers are you using?

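According to the error message you shared, line 15 of D:\projects\kiwi\kiwipiepy\sw_tokenizer.py reads

from _kiwipiepy import Sw_Tokenizer

whereas that line in the official kiwipiepy code looks like this:
https://github.com/bab2min/kiwipiepy/blob/1b0b9c6f025feefcc4b87df19904cfec464c736a/kiwipiepy/sw_tokenizer.py#L15
I suspect the installed library versions have gotten mixed up, or the package's internal code has been corrupted. I would appreciate it if you could check this.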
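One way to check for the kind of mix-up described above: the tracebacks in the question show the kiwipiepy Python package being loaded from D:\projects\kiwi, while the compiled _kiwipiepy extension is loaded from D:\anaconda3\Lib\site-packages, so a local source checkout may be shadowing the installed package. The following is a minimal sketch using only the Python standard library; the helper names `locate` and `report` are hypothetical and are not part of kiwipiepy.

```python
import importlib.util


def locate(module_name):
    """Return the file a top-level module would be loaded from, or None."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # find_spec can raise for broken or partially initialised modules.
        return None
    return spec.origin if spec is not None else None


def report(module_names=("kiwipiepy", "_kiwipiepy", "transformers")):
    """Build one line per module showing where Python would import it from."""
    return [f"{name}: {locate(name)}" for name in module_names]


if __name__ == "__main__":
    # If kiwipiepy and _kiwipiepy resolve to different installations,
    # the Python code and the compiled extension may be out of sync.
    print("\n".join(report()))
```

If the two modules resolve to different locations, running the script from a directory outside the source checkout, or reinstalling the package into a clean environment, may be worth trying.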

Alternatively, I recommend installing the packages at the following versions and then running the test code below.

kiwipiepy==0.16.2
transformers==4.28.0

from transformers import (
    AutoTokenizer, 
    AutoModelForMaskedLM, 
)
import kiwipiepy.transformers_addon

tokenizer = AutoTokenizer.from_pretrained('kiwi-farm/roberta-base-32k')
model = AutoModelForMaskedLM.from_pretrained('kiwi-farm/roberta-base-32k')
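Before running the test code, it may help to confirm that the environment actually matches the recommended pins. The sketch below compares the installed distribution versions against the two versions quoted in this comment, using `importlib.metadata` from the standard library; `check_pins` and `RECOMMENDED` are hypothetical names introduced only for illustration.

```python
from importlib import metadata

# Version pins quoted from the comment above.
RECOMMENDED = {"kiwipiepy": "0.16.2", "transformers": "4.28.0"}


def check_pins(pins, get_version=metadata.version):
    """Map each package name to (installed_version_or_None, matches_pin)."""
    result = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # The distribution is not installed at all.
        result[name] = (installed, installed == wanted)
    return result


if __name__ == "__main__":
    for name, (installed, ok) in check_pins(RECOMMENDED).items():
        status = "OK" if ok else f"expected {RECOMMENDED[name]}"
        print(f"{name}: installed={installed} ({status})")
```

The `get_version` parameter exists only so the check can be exercised without the packages installed; in normal use the default is enough.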

bab2min closed this as completed Mar 12, 2024