KeyError: 'lemma' #48

Open
Bachstelze opened this issue May 26, 2022 · 1 comment

Comments

@Bachstelze

Following the code from https://trankit.readthedocs.io/en/latest/training.html#training-a-lemmatizer, I get a KeyError: 'lemma':

Setting up training config...
Initialized lemmatizer trainer
Training dictionary-based lemmatizer

---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)

[<ipython-input-9-a90867cc5ef3>](https://localhost:8080/#) in <module>()
     11 
     12 # start training
---> 13 trainer.train()

3 frames

[/content/trankit/trankit/tpipeline.py](https://localhost:8080/#) in train(self)
    680             self._train_posdep()
    681         elif self._task == 'lemmatize':
--> 682             self._train_lemma()
    683         elif self._task == 'ner':
    684             self._train_ner()

[/content/trankit/trankit/tpipeline.py](https://localhost:8080/#) in _train_lemma(self)
    581 
    582     def _train_lemma(self):
--> 583         self._lemma_model.train()
    584 
    585     def _train_ner(self):

[/content/trankit/trankit/models/lemma_model.py](https://localhost:8080/#) in train(self)
    379             self.config.logger.info("Training dictionary-based lemmatizer")
    380             self.trainer.train_dict(
--> 381                 [[token[TEXT], token[UPOS], token[LEMMA]] for sentence in self.train_batch.doc for token in sentence if
    382                  not (
    383                          type(token[ID]) == tuple and len(token[ID]) == 2)])

[/content/trankit/trankit/models/lemma_model.py](https://localhost:8080/#) in <listcomp>(.0)
    381                 [[token[TEXT], token[UPOS], token[LEMMA]] for sentence in self.train_batch.doc for token in sentence if
    382                  not (
--> 383                          type(token[ID]) == tuple and len(token[ID]) == 2)])
    384             dev_preds = self.trainer.predict_dict(
    385                 [[token[TEXT], token[UPOS]] for sentence in self.dev_batch.doc for token in sentence if

KeyError: 'lemma'

The most recent version of https://github.com/UniversalDependencies/UD_Thai-PUD is used as training and development data.
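A quick way to check whether a treebank carries lemma annotations at all is to count the tokens whose LEMMA column (the third of the ten CoNLL-U columns) is the placeholder `_`. The sketch below is only an illustration: `count_missing_lemmas` is a hypothetical helper, not part of trankit, and the sample sentence is made up rather than taken from UD_Thai-PUD. If every token comes back as missing, the tokens probably have no 'lemma' entry after loading, which would fit the KeyError above.

```python
def count_missing_lemmas(conllu_text):
    """Return (total_tokens, tokens_without_lemma) for CoNLL-U formatted text.

    Hypothetical helper for illustration; not part of trankit.
    """
    total = 0
    missing = 0
    for line in conllu_text.splitlines():
        # Skip blank lines (sentence separators) and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        token_id, form, lemma = cols[0], cols[1], cols[2]
        # Skip multiword token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in token_id or "." in token_id:
            continue
        total += 1
        # "_" means "unspecified", unless the word form itself is an underscore.
        if lemma == "_" and form != "_":
            missing += 1
    return total, missing


# Made-up two-token sentence with no lemma annotation.
sample = (
    "# text = cats sleep\n"
    "1\tcats\t_\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tsleep\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    print(count_missing_lemmas(sample))
```

To run it on a real file, read the file with `encoding="utf-8"` and pass its contents to the function.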

@Bachstelze
Author

There are no lemmas in the training data, so there can't be a lemmatizer?! Can't I use the other parts of the pipeline?
When I run

from trankit import Pipeline
p = Pipeline(lang='customized', cache_dir='./save_dir')

the following error occurs:

BadZipFile: File is not a zip file
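To narrow down a BadZipFile error, it may help to see which archives under the cache directory are not valid zip files. The sketch below uses only the standard library (`os.walk` and `zipfile.is_zipfile`). `find_bad_zips` is a hypothetical helper, and it is an assumption that the file trankit fails on is a `.zip` archive somewhere under `./save_dir`.

```python
import os
import zipfile


def find_bad_zips(root_dir):
    """Return sorted paths of *.zip files under root_dir that are not valid zips.

    Hypothetical helper for illustration; not part of trankit.
    """
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if name.endswith(".zip"):
                path = os.path.join(dirpath, name)
                # is_zipfile checks for the zip end-of-central-directory record.
                if not zipfile.is_zipfile(path):
                    bad.append(path)
    return sorted(bad)


if __name__ == "__main__":
    # Assumption: './save_dir' is the cache_dir passed to Pipeline above.
    for path in find_bad_zips("./save_dir"):
        print("not a valid zip:", path)
```

An empty result would suggest the failing file lives elsewhere or does not end in `.zip`, for example a download that was interrupted or never produced.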
