fix: SwTokenizer getstate #136
Hello @Bing-su,
https://docs.python.org/ko/3.11/library/pickle.html?highlight=pickle#object.__setstate__
(kiwi)
kiwipiepy on pickle via △ v3.27.0 via 🐍 v3.10.12 via 🅒 kiwi took 2s
❯ python .\test\test_transformers_addon.py
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
(kiwi)
kiwipiepy on pickle via △ v3.27.0 via 🐍 v3.10.12 via 🅒 kiwi took 3s
❯
I also ran a test that pickles the tokenizer with several pickle libraries and then compares the results.
import pickle
from itertools import permutations

import cloudpickle
import dill
import kiwipiepy.transformers_addon
from transformers import AutoTokenizer

repo = "kiwi-farm/roberta-base-32k"
orig = AutoTokenizer.from_pretrained(repo)

# Serialize the same tokenizer with each library.
with open("pk1.pkl", "wb") as f:
    pickle.dump(orig, f)
with open("pk2.pkl", "wb") as f:
    dill.dump(orig, f)
with open("pk3.pkl", "wb") as f:
    cloudpickle.dump(orig, f)

# Load each file back with the library that wrote it.
with open("pk1.pkl", "rb") as f:
    upk1 = pickle.load(f)
with open("pk2.pkl", "rb") as f:
    upk2 = dill.load(f)
with open("pk3.pkl", "rb") as f:
    upk3 = cloudpickle.load(f)

# Compare the Python-level attributes of every ordered pair of tokenizers.
for (tk1, tk2) in permutations([orig, upk1, upk2, upk3], 2):
    for (k, v1), (_, v2) in zip(tk1.__dict__.items(), tk2.__dict__.items()):
        if k != "_tokenizer":
            assert getattr(tk1, k) == getattr(tk2, k)
        else:
            # The inner tokenizer is compared through its instance attributes.
            assert vars(getattr(tk1, k)) == vars(getattr(tk2, k))
print("ok!")
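@Bing-su If you only inspect the properties, it can look as though everything works, but I expect errors once the code that calls into the internal object implemented in C++ is involved. As expected, in the test a segmentation fault occurs at the point where kiwi is used after unpickling. On the C++ side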
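The concern above, that Python-level attributes can survive a pickle round trip while the native object behind them does not, is commonly handled by dropping the native handle in `__getstate__` and rebuilding it in `__setstate__`. The following is only a minimal sketch of that pattern: `FakeNativeHandle` and `TokenizerWrapper` are hypothetical stand-ins invented for illustration, not the actual `SwTokenizer` or kiwi classes, and the real fix in this PR may differ.

```python
import pickle


class FakeNativeHandle:
    """Hypothetical stand-in for a C++-backed object that must not be pickled."""

    def __init__(self, vocab_path):
        self.vocab_path = vocab_path

    def __reduce__(self):
        # Refuse pickling, as a native handle without pickle support would.
        raise TypeError("native handle cannot be pickled")

    def tokenize(self, text):
        return text.split()


class TokenizerWrapper:
    """Hypothetical wrapper; not the actual SwTokenizer implementation."""

    def __init__(self, vocab_path):
        self.vocab_path = vocab_path
        self._native = FakeNativeHandle(vocab_path)

    def __getstate__(self):
        # Keep only the Python-level state and drop the native handle.
        state = self.__dict__.copy()
        del state["_native"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Rebuild the native handle from the information that was pickled.
        self._native = FakeNativeHandle(self.vocab_path)

    def tokenize(self, text):
        return self._native.tokenize(text)


original = TokenizerWrapper("vocab.json")
restored = pickle.loads(pickle.dumps(original))
```

A round-trip test for this pattern should call a method that reaches the native object, such as `restored.tokenize("a b c")`, rather than only comparing attributes.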
You are right. I will test further and come back. Thank you.
fixes: #135
https://docs.python.org/ko/3.11/library/pickle.html?highlight=pickle#pickling-class-instances
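Starting with Python 3.11, this problem appears to have been resolved by defining a default behavior for the case where __getstate__ is not defined.
The same error also occurs on Python 3.11.
It is still needed on Python 3.10 and below.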
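The version difference mentioned above can be observed directly. This is a small sketch, assuming CPython, in which `Plain` and `default_state` are hypothetical names used only for illustration. It shows that `object` gains a default `__getstate__` in Python 3.11, and that for a plain instance this default covers only the Python-level `__dict__`, so it says nothing about state held in a native object.

```python
import sys


class Plain:
    """Hypothetical class with no __getstate__ of its own."""

    def __init__(self):
        self.x = 1


def default_state(obj):
    """Return the state that pickling would use by default for a simple instance."""
    getstate = getattr(obj, "__getstate__", None)
    if getstate is not None:
        # Python 3.11+: object provides a default __getstate__.
        return getstate()
    # Python 3.10 and below: pickle falls back to the instance __dict__.
    return obj.__dict__


has_default = hasattr(object, "__getstate__")
state = default_state(Plain())
print(sys.version_info[:2], has_default, state)
```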