
Can we use our own data set to train the models and predict our own test set? #14

Open
xinxu1018 opened this issue Oct 17, 2018 · 15 comments

Comments

@xinxu1018

Can we use our own data set to train the models and predict our own test set?

@Sshanu
Owner

Sshanu commented Oct 17, 2018 via email

@xinxu1018
Author

@Sshanu Thanks so much for your quick response! Please allow me to ask one more question.
Since I am using word embeddings trained on my own corpus instead of the GloVe embeddings you provide, how can I get my embeddings into the same format as the GloVe embedding file in the data folder and use them in your LSTM model?

All the best!
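
For reference, here is a minimal sketch (not code from this repo) of writing custom-trained embeddings out in the GloVe text format, i.e. one line per word with the token followed by its space-separated vector components. It assumes the embeddings are already available as a Python dict of word -> vector; the file name and dimension are placeholders.

```python
# Minimal sketch: save custom embeddings in GloVe-style text format.
# Assumes `embeddings` is a dict mapping each word to a 1-D vector.
import numpy as np

def save_glove_format(embeddings, path):
    """Write one 'word v1 v2 ... vn' line per vocabulary entry."""
    with open(path, "w", encoding="utf-8") as f:
        for word, vec in embeddings.items():
            f.write(word + " " + " ".join("%.6f" % v for v in vec) + "\n")

# Hypothetical usage with random 50-dimensional vectors.
emb = {w: np.random.randn(50) for w in ["system", "configuration", "network"]}
save_glove_format(emb, "my_embeddings.txt")
```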

@Sshanu
Owner

Sshanu commented Oct 17, 2018 via email

@xinxu1018
Author

@Sshanu That is so informative! Thanks a lot!

@xinxu1018
Author

@Sshanu It works! Thanks a lot! Can I ask another follow-up question?
If I want to classify relations between multi-word terms (in your case the pairs are single-word terms), how should I preprocess the sentences before the dependency path extraction step? Do you have any suggestions? One approach I am considering is to join the words within each multi-word term with underscores (e.g., "system configuration" becomes "system_configuration"), treat the result as a single-word term, and then follow your procedure. I am not sure whether that will work. Do you have any ideas?

Many thanks!
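
A rough sketch of the underscore idea described above (my own illustration, not code from this repo): collapse each known multi-word term into a single underscore-joined token before dependency parsing, so the parser treats it as one node.

```python
# Sketch: merge known multi-word terms into single tokens before parsing.
import re

def merge_terms(sentence, terms):
    # Replace longer terms first so they are not broken up by shorter ones.
    for term in sorted(terms, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b")
        sentence = pattern.sub(term.replace(" ", "_"), sentence)
    return sentence

print(merge_terms("The system configuration affects the network protocol.",
                  ["system configuration", "network protocol"]))
# -> The system_configuration affects the network_protocol.
```

Note that the underscore-joined token would also need its own entry in the embedding vocabulary (for example, a vector trained on a corpus preprocessed the same way, or an average of its parts); otherwise it will fall back to the unknown-word vector.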

@Sshanu
Owner

Sshanu commented Oct 18, 2018 via email

@xinxu1018
Author

@Sshanu Thanks a lot! I hope everything goes well for you!

@xinxu1018
Author

@Sshanu Could you please provide your word_embd_wiki file? I cannot find the embedding file in the data folder you provided. Thanks for your help!

Best,

@Sshanu
Owner

Sshanu commented Oct 18, 2018 via email

@xinxu1018
Author

@Sshanu How can I share a folder with you? What is your email address? Sorry, I am new here!

@xinxu1018
Author

@Sshanu Hi Sshanu, I just shared a Google Drive folder with the email address listed in your GitHub profile. Not sure if I did it right! Many thanks!

@Sshanu
Owner

Sshanu commented Oct 19, 2018 via email

@xinxu1018
Author

@Sshanu I have shared it with your Gmail. Please check, and many thanks!

@xinxu1018
Author

@Sshanu Hi Sshanu, I got your shared file! You have helped me a lot! I am just wondering whether you have the original code that was used to split the embedding file into the vocab and word_embedding arrays, so that I can convert my own trained embeddings into the format your method expects. Could you please share that code with me? Thanks again!
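
(Not the original script, which was shared by email and is not shown here; just a minimal sketch of one way to split a GloVe-style text file into a vocab list and a word-embedding matrix. File names are placeholders.)

```python
# Sketch: split a GloVe-format text file into a vocab list and an embedding matrix.
import pickle
import numpy as np

def load_embeddings(path):
    vocab, vectors = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vocab.append(parts[0])                        # first field: the word
            vectors.append([float(x) for x in parts[1:]]) # rest: its vector
    return vocab, np.asarray(vectors, dtype=np.float32)

vocab, word_embedding = load_embeddings("my_embeddings.txt")

with open("vocab.pkl", "wb") as f:             # save the vocabulary list
    pickle.dump(vocab, f)
np.save("word_embedding.npy", word_embedding)  # save the embedding matrix
```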

@Sshanu
Owner

Sshanu commented Oct 19, 2018 via email
