Description
I installed DaAnonymization with pip a week ago and tried to run your example from the README, but it fails because of a mismatch between DaCy Large and the current spaCy version.
What I Did
The script anon_test.py:
from textprivacy import TextAnonymizer
# list of texts (example with cross-lingual transfer to english)
corpus = [
"Hej, jeg hedder Martin Jespersen og er fra Danmark og arbejder i "
"Deloitte, mit cpr er 010203-2010, telefon: +4545454545 "
"og email: martin.martin@gmail.com",
"Hi, my name is Martin Jespersen and work in Deloitte. "
"I used to be a PhD. at DTU in Machine Learning and B-cell immunoinformatics "
"at Anker Engelunds Vej 1 Bygning 101A, 2800 Kgs. Lyngby.",
]
Anonymizer = TextAnonymizer(corpus)
# Anonymize person, location, organization, emails, CPR and telephone numbers
anonymized_corpus = Anonymizer.mask_corpus()
for text in anonymized_corpus:
    print(text)
(anon): ~$ /home/akirkedal/software/anaconda/envs/anon/bin/python /home/akirkedal/software/anon/anon_test.py
/home/akirkedal/software/anaconda/envs/anon/lib/python3.8/site-packages/spacy/util.py:762: UserWarning: [W095] Model 'da_dacy_large_tft' (0.0.0) was trained with spaCy v3.0 and may not be 100% compatible with the current version (3.1.4). If you see errors or degraded performance, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
warnings.warn(warn_msg)
Traceback (most recent call last):
File "/home/akirkedal/software/anon/anon_test.py", line 1, in <module>
from textprivacy import TextAnonymizer
File "/home/akirkedal/software/anaconda/envs/anon/lib/python3.8/site-packages/textprivacy/__init__.py", line 7, in <module>
from textprivacy.textanonymization import TextAnonymizer
File "/home/akirkedal/software/anaconda/envs/anon/lib/python3.8/site-packages/textprivacy/textanonymization.py", line 34, in <module>
ner_model = dacy.load("da_dacy_large_tft-0.0.0")
File "/home/akirkedal/software/anaconda/envs/anon/lib/python3.8/site-packages/dacy/load.py", line 39, in load
return spacy.load(path)
File "/home/akirkedal/software/anaconda/envs/anon/lib/python3.8/site-packages/spacy/__init__.py", line 51, in load
return util.load_model(
File "/home/akirkedal/software/anaconda/envs/anon/lib/python3.8/site-packages/spacy/util.py", line 351, in load_model
return load_model_from_path(Path(name), **kwargs) # type: ignore[arg-type]
File "/home/akirkedal/software/anaconda/envs/anon/lib/python3.8/site-packages/spacy/util.py", line 418, in load_model_from_path
return nlp.from_disk(model_path, exclude=exclude, overrides=overrides)
File "/home/akirkedal/software/anaconda/envs/anon/lib/python3.8/site-packages/spacy/language.py", line 2021, in from_disk
util.from_disk(path, deserializers, exclude) # type: ignore[arg-type]
File "/home/akirkedal/software/anaconda/envs/anon/lib/python3.8/site-packages/spacy/util.py", line 1229, in from_disk
reader(path / key)
File "/home/akirkedal/software/anaconda/envs/anon/lib/python3.8/site-packages/spacy/language.py", line 2015, in <lambda>
deserializers[name] = lambda p, proc=proc: proc.from_disk( # type: ignore[misc]
File "/home/akirkedal/software/anaconda/envs/anon/lib/python3.8/site-packages/spacy_transformers/pipeline_component.py", line 402, in from_disk
util.from_disk(path, deserialize, exclude)
File "/home/akirkedal/software/anaconda/envs/anon/lib/python3.8/site-packages/spacy/util.py", line 1229, in from_disk
reader(path / key)
File "/home/akirkedal/software/anaconda/envs/anon/lib/python3.8/site-packages/spacy_transformers/pipeline_component.py", line 391, in load_model
tokenizer, transformer = huggingface_from_pretrained(
File "/home/akirkedal/software/anaconda/envs/anon/lib/python3.8/site-packages/spacy_transformers/util.py", line 31, in huggingface_from_pretrained
tokenizer = AutoTokenizer.from_pretrained(str_path, **config)
File "/home/akirkedal/software/anaconda/envs/anon/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 568, in from_pretrained
return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/home/akirkedal/software/anaconda/envs/anon/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1732, in from_pretrained
return cls._from_pretrained(
File "/home/akirkedal/software/anaconda/envs/anon/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1850, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/home/akirkedal/software/anaconda/envs/anon/lib/python3.8/site-packages/transformers/models/xlm_roberta/tokenization_xlm_roberta_fast.py", line 134, in __init__
super().__init__(
File "/home/akirkedal/software/anaconda/envs/anon/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 110, in __init__
fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
File "/home/akirkedal/software/anaconda/envs/anon/lib/python3.8/site-packages/transformers/convert_slow_tokenizer.py", line 829, in convert_slow_tokenizer
return converter_class(transformer_tokenizer).converted()
File "/home/akirkedal/software/anaconda/envs/anon/lib/python3.8/site-packages/transformers/convert_slow_tokenizer.py", line 375, in __init__
from .utils import sentencepiece_model_pb2 as model_pb2
File "/home/akirkedal/software/anaconda/envs/anon/lib/python3.8/site-packages/transformers/utils/sentencepiece_model_pb2.py", line 52, in <module>
_descriptor.EnumValueDescriptor(name="UNIGRAM", index=0, number=1, options=None, type=None),
File "/home/akirkedal/software/anaconda/envs/anon/lib/python3.8/site-packages/google/protobuf/descriptor.py", line 755, in __new__
_message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
1. Downgrade the protobuf package to 3.20.x or lower.
2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
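For reference, the two workarounds suggested by the error message can be applied as follows. This is a sketch, not a verified fix for this package: the exact protobuf pin that your transformers/spacy versions tolerate may differ, and the env-var route is noticeably slower.

```shell
# Workaround 1: downgrade protobuf so the pre-generated _pb2.py files still load
pip install "protobuf==3.20.*"

# Workaround 2: force the pure-Python protobuf implementation (much slower).
# Must be set before Python imports protobuf for the first time.
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
python anon_test.py
```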
I believe it was simply DaCy that changed a bit in its installation. I tried it on my Mac and published a 0.1.1 version. @dresen, please let me know if it solved it for you!
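If the fix shipped in 0.1.1, upgrading should pull it in (assuming the PyPI distribution name matches the repo name, as the original `pip install` suggests):

```shell
# Pull the patched release and re-run the reproduction script
pip install --upgrade DaAnonymization
python anon_test.py
```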