Initialization of Type Words #30
Thanks for your attention. The choice of initialization has some influence on the final model performance, but the impact is not significant. To establish a stronger correspondence between the type words and the task at hand, the initialization should be adapted to the characteristics of the dataset.

Got it. So, did you use a different set of initialization words while training on the SemEval dataset, or the same ones?

The same ones.

Hi buddy, do you have any further questions?
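The advice above to adapt the initialization to the dataset could be made concrete by deriving candidate type words from the dataset's own relation labels. The sketch below is purely illustrative (the helper `type_words_from_labels` is not part of this repository); it splits SemEval-2010 Task 8 style labels such as `Cause-Effect(e1,e2)` into their component words:

```python
import re

def type_words_from_labels(labels):
    """Derive candidate type words from relation label names.

    Strips argument markers like "(e1,e2)" and splits the remaining
    label on hyphens/underscores/colons. Hypothetical helper, shown
    only to illustrate one way of adapting the word list per dataset.
    """
    words = []
    for label in labels:
        base = re.sub(r"\(.*\)", "", label)       # drop "(e1,e2)" suffix
        for part in re.split(r"[-_:]", base):
            part = part.strip().lower()
            if part and part != "other" and part not in words:
                words.append(part)
    return words

# SemEval-2010 Task 8 style labels
semeval_labels = ["Cause-Effect(e1,e2)", "Component-Whole(e2,e1)",
                  "Entity-Origin(e1,e2)", "Other"]
print(type_words_from_labels(semeval_labels))
# ['cause', 'effect', 'component', 'whole', 'entity', 'origin']
```

The resulting word list could then replace the hard-coded `["person", "organization", ...]` list when running on SemEval, so the type-word embeddings start from vocabulary that actually matches the dataset's relation inventory.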
I had a question regarding the initialization of type words. According to the code:
```python
if self.args.init_type_words:
    so_word = [a[0] for a in self.tokenizer(["[obj]", "[sub]"], add_special_tokens=False)['input_ids']]
    meaning_word = [a[0] for a in self.tokenizer(["person", "organization", "location", "date", "country"], add_special_tokens=False)['input_ids']]
```
The meaning words are initialized with certain entity types. While these are the probable entity types for the TACRED dataset, the same is not true for the SemEval dataset.
I wanted to know how this initialization affects the working of the algorithm on other datasets like SemEval. Should we change this initialization based on the dataset?
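One way the quoted initialization could be made dataset-dependent is to select the word list per dataset before running the same first-subword extraction. The sketch below is hypothetical (the `MEANING_WORDS` table, the SemEval word choices, and the stub tokenizer are illustrative, not from the repository; the stub only stands in for a real HuggingFace tokenizer so the pattern can run without downloading a model):

```python
# Hypothetical per-dataset type-word tables. Only the TACRED list
# matches the quoted snippet; the SemEval entries are illustrative.
MEANING_WORDS = {
    "tacred":  ["person", "organization", "location", "date", "country"],
    "semeval": ["cause", "effect", "component", "whole", "entity"],
}

class StubTokenizer:
    """Minimal stand-in for a HuggingFace tokenizer: maps each word to
    a single made-up input id so the `[a[0] for a in ...]` pattern
    from the snippet can be demonstrated without a real vocabulary."""
    vocab = {"person": 101, "organization": 102, "location": 103,
             "date": 104, "country": 105, "cause": 201, "effect": 202,
             "component": 203, "whole": 204, "entity": 205}

    def __call__(self, words, add_special_tokens=False):
        return {"input_ids": [[self.vocab[w]] for w in words]}

def meaning_word_ids(dataset, tokenizer):
    # Same first-subword extraction as the quoted code, but the word
    # list is chosen per dataset instead of being hard-coded.
    words = MEANING_WORDS[dataset]
    return [a[0] for a in tokenizer(words, add_special_tokens=False)["input_ids"]]

tok = StubTokenizer()
print(meaning_word_ids("tacred", tok))   # [101, 102, 103, 104, 105]
print(meaning_word_ids("semeval", tok))  # [201, 202, 203, 204, 205]
```

With a real tokenizer, the returned ids would then seed the type-word embeddings exactly as in the original code path, just from words that reflect the target dataset.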