Following the code from https://trankit.readthedocs.io/en/latest/training.html#training-a-lemmatizer, I get a KeyError: 'lemma':
Setting up training config...
Initialized lemmatizer trainer
Training dictionary-based lemmatizer
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-9-a90867cc5ef3> in <module>()
     11
     12 # start training
---> 13 trainer.train()

/content/trankit/trankit/tpipeline.py in train(self)
    680             self._train_posdep()
    681         elif self._task == 'lemmatize':
--> 682             self._train_lemma()
    683         elif self._task == 'ner':
    684             self._train_ner()

/content/trankit/trankit/tpipeline.py in _train_lemma(self)
    581
    582     def _train_lemma(self):
--> 583         self._lemma_model.train()
    584
    585     def _train_ner(self):

/content/trankit/trankit/models/lemma_model.py in train(self)
    379         self.config.logger.info("Training dictionary-based lemmatizer")
    380         self.trainer.train_dict(
--> 381             [[token[TEXT], token[UPOS], token[LEMMA]] for sentence in self.train_batch.doc for token in sentence if
    382              not (
    383                      type(token[ID]) == tuple and len(token[ID]) == 2)])

/content/trankit/trankit/models/lemma_model.py in <listcomp>(.0)
    381             [[token[TEXT], token[UPOS], token[LEMMA]] for sentence in self.train_batch.doc for token in sentence if
    382              not (
--> 383                      type(token[ID]) == tuple and len(token[ID]) == 2)])
    384         dev_preds = self.trainer.predict_dict(
    385             [[token[TEXT], token[UPOS]] for sentence in self.dev_batch.doc for token in sentence if

KeyError: 'lemma'
The most recent version of https://github.com/UniversalDependencies/UD_Thai-PUD is used as training and development data.
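Since the traceback fails on `token[LEMMA]`, one quick check is whether the LEMMA column of the treebank contains anything other than the `_` placeholder. The following is only a minimal sketch, assuming the standard 10-column, tab-separated CoNLL-U layout; the helper name `lemma_coverage` and the sample rows are made up for illustration and are not part of trankit.

```python
from collections import Counter


def lemma_coverage(conllu_lines):
    """Count word lines with and without a lemma in CoNLL-U input.

    Hypothetical helper: assumes 10 tab-separated columns per token line,
    with ID in column 1 and LEMMA in column 3.
    """
    counts = Counter()
    for line in conllu_lines:
        line = line.rstrip("\n")
        # Skip sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        token_id, lemma = cols[0], cols[2]
        # Skip multiword token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in token_id or "." in token_id:
            continue
        # "_" marks an unspecified lemma (a literal "_" token would also land here).
        counts["with_lemma" if lemma != "_" else "without_lemma"] += 1
    return counts


# Made-up sample rows, used only to show the call.
sample = [
    "# text = example",
    "1\tfoo\t_\tNOUN\t_\t_\t0\troot\t_\t_",
    "2\tbar\tbar\tNOUN\t_\t_\t1\tdep\t_\t_",
]
print(lemma_coverage(sample))
```

To run it on a real treebank, pass an open file handle instead of `sample`. If `with_lemma` comes out as zero, the file carries no lemma annotation at all.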
There are no lemmas in the training data. So a lemmatizer can't be trained?! Can't I use the other parts of the pipeline? When I run
from trankit import Pipeline
p = Pipeline(lang='customized', cache_dir='./save_dir')
the following error occurs:
BadZipFile: File is not a zip file
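`BadZipFile` is raised by Python's standard `zipfile` module when a file does not have a valid zip structure. As a rough diagnostic, the sketch below lists `.zip` files under a directory that fail `zipfile.is_zipfile`. It assumes, without confirmation from the trankit source, that the pipeline loads `.zip` archives from the cache directory; the helper name `find_invalid_zips` is hypothetical.

```python
import os
import zipfile


def find_invalid_zips(root):
    """Return sorted paths of *.zip files under root that are not valid zips.

    Hypothetical helper: it only checks the zip signature via
    zipfile.is_zipfile and knows nothing about what trankit expects.
    """
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".zip"):
                path = os.path.join(dirpath, name)
                if not zipfile.is_zipfile(path):
                    bad.append(path)
    return sorted(bad)
```

For example, `find_invalid_zips('./save_dir')` returns a list of suspect files. An empty list means every `.zip` file found there at least looks like a zip archive, which would point to a missing file or a different cause.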