Custom preprocessing in Live Test #3

Closed
saurabhsbora opened this issue May 4, 2020 · 8 comments
Labels
enhancement New feature or request

Comments

@saurabhsbora
saurabhsbora commented May 4, 2020

@sergioburdisso
It would be a great feature to have custom preprocessing in the Live Test.
This will enable us to visually understand the words, sentences, and paragraphs that helped the model to classify a particular document after custom preprocessing.
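For illustration, a user-defined preprocessing function of the kind requested here could look like the minimal sketch below. The name custom_preprocess and the specific cleaning rules are assumptions made for this example; they are not part of PySS3.

```python
import re

def custom_preprocess(text):
    """Hypothetical user-defined preprocessing (illustration only)."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"([!?.])\1+", r"\1", text)   # squeeze runs like "?????" into "?"
    text = re.sub(r"\s+", " ", text)            # collapse any whitespace run
    return text.strip()

docs = ["Great  atmosphere then ?????", "Visit https://example.com NOW!!!"]
prepped = [custom_preprocess(d) for d in docs]
```

The point of the feature request is that the Live Test would then highlight words and sentences in the text as it looks after a function like this has been applied.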

@sergioburdisso sergioburdisso added the enhancement New feature or request label May 4, 2020
sergioburdisso added a commit that referenced this issue May 5, 2020
The method used to recognize/parse learned word n-grams was improved.
The new code is easier to follow, clearer, and semantically more
accurate. Furthermore, tokenization and visualization of single words
and word n-grams in the Live Test Tool were also improved by this new code.

Before this modification, there was a bug in the Live Test tool introduced
by the addition of the custom preprocessing support (#3). Some words
were visualized incorrectly, especially those ending with question
marks, for instance in the sentence "Great  atmosphere then ?????". In
addition, the user can now put an arbitrary number of spaces between
words and they will be ignored when recognizing word n-grams. For
instance, before this modification, if the user entered:
"machine     learning"
it was not recognized as a bigram due to those extra spaces.
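
The whitespace behaviour described in this commit message can be sketched as follows. This is a simplified illustration, not the code from the commit: the function find_ngrams, the greedy longest-match strategy, and the known_ngrams set are all assumptions made for the example.

```python
def find_ngrams(text, known_ngrams, max_n=3):
    """Greedy longest-match recognition of known word n-grams.

    Splitting on any run of whitespace means that extra spaces
    between words cannot prevent an n-gram from being recognized.
    """
    words = text.lower().split()
    tokens, i = [], 0
    while i < len(words):
        # Try the longest candidate first and fall back to a single word.
        for n in range(min(max_n, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + n])
            if n == 1 or candidate in known_ngrams:
                tokens.append(" ".join(candidate))
                i += n
                break
    return tokens

known = {("machine", "learning")}  # hypothetical set of learned bigrams
tokens = find_ngrams("I like machine     learning", known)
```

Here tokens ends up containing "machine learning" as a single item, regardless of how many spaces separate the two words in the input.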
sergioburdisso added a commit that referenced this issue May 5, 2020
Added:

  - The Live Test Tool now supports custom (user-defined) preprocessing methods (b50cfaf, resolved #3).

  - The tokenization process was improved (26fff88, 4af8e80).

  - The process for recognizing word n-grams during classification was improved (2ceb148).
sergioburdisso added a commit that referenced this issue May 5, 2020
Added:

  - The Live Test Tool now supports custom (user-defined) preprocessing
    methods (b50cfaf, resolved #3).

  - The tokenization process was improved (26fff88, 4af8e80).

  - The process for recognizing word n-grams during classification was
    improved (2ceb148).
@sergioburdisso
Owner

sergioburdisso commented May 5, 2020

Hi @enthussb!
Sure, I overlooked this option when first coding the Live Test tool. Thank you for your suggestion :)

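I've added this feature in the new version, and also took the opportunity to incorporate some other things that were pending. Namely, what's new in this version is:

- The Live Test Tool now supports custom (user-defined) preprocessing methods (b50cfaf, resolved #3).
- The tokenization process was improved (26fff88, 4af8e80).
- The process for recognizing word n-grams during classification was improved (2ceb148).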

Update your package to the new version (0.5.8) using pip install -U pyss3. To make things easier for you, I've created a new Jupyter Notebook in the examples folder that shows how to incorporate user-defined preprocessing functions into the Live Test tool visualizations, for you to follow if you want: using_custom_preprocessing.ipynb

Let me know if everything worked OK ☕

@sergioburdisso
Owner

@all-contributors would you add @enthussb for ideas to the README file? it helped to make this project better by suggesting this cool feature 👍

@allcontributors
Contributor

@sergioburdisso

I've put up a pull request to add @enthussb! 🎉

@saurabhsbora
Author

@sergioburdisso I updated the package and ran the code. Everything is working fine, although my accuracy has been reduced quite a bit. I guess it might be due to the latest n-gram and tokenization changes. Could you please have a look at that?

@sergioburdisso
Owner

sergioburdisso commented May 8, 2020

I was about to tell you to perform a hyperparameter optimization using the Evaluation.grid_search() function but then I realized that I didn't include a "prep" argument to disable the default preprocessing. As a consequence, users won't be able to perform any hyperparameter optimization using only their custom preprocessing method. I'll work on that and add the "prep" parameter to the grid_search(), test, and kfold_cross_validation functions of the Evaluation class. I'm sorry for forgetting to add this in the first place 😢. I'll notify you as soon as the new version is released.
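
To illustrate why a switch like "prep" matters, here is a minimal, library-independent sketch: documents are preprocessed once with the user's own function, and the search then scores every hyperparameter combination on that text as given, without any built-in preprocessing being applied again. The names custom_preprocess, toy_score, and prep_fn, as well as the scoring formula, are placeholders invented for this example; this is not how PySS3 implements grid_search().

```python
from itertools import product

def custom_preprocess(text):
    # Hypothetical user-defined preprocessing: lowercase, collapse whitespace.
    return " ".join(text.lower().split())

def toy_score(docs, s, l):
    # Placeholder for "train and evaluate with these hyperparameters".
    # The formula is arbitrary; it only gives the search something to maximize.
    return -abs(s - 0.4) - abs(l - 1.0)

def grid_search(docs, grid, score_fn, prep_fn=None):
    """Exhaustively score every combination of values in `grid`.

    If `prep_fn` is None the documents are used exactly as given,
    which is what disabling the default preprocessing amounts to.
    """
    if prep_fn is not None:
        docs = [prep_fn(d) for d in docs]
    names = list(grid)
    results = []
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        results.append((score_fn(docs, **params), params))
    return max(results, key=lambda r: r[0])

raw = ["Machine     Learning rocks", "Great  atmosphere then ?????"]
prepped = [custom_preprocess(d) for d in raw]   # custom preprocessing, done once
best_score, best_params = grid_search(
    prepped, {"s": [0.2, 0.4, 0.6], "l": [0.5, 1.0]}, toy_score
)
```

Without a way to leave prep_fn unset, the search would preprocess the already-cleaned documents a second time with its own defaults, which is the limitation described above.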

@saurabhsbora
Author

Okay no problem 👍, till then I can work on the previous version where I had achieved great accuracy!

@sergioburdisso
Owner

sergioburdisso commented May 8, 2020

I've just finished making those changes and released the new version (0.5.9). I've also updated the notebook adding a section for "Hyperparameter Optimization". Try performing hyperparameter optimization similar to what I did in that notebook and let me know. In case you are still getting bad accuracy, please share some more details, like part of the actual code, the actual accuracy before and after the changes, etc. It would be much easier to try to help that way. I hope you achieve great accuracy again 😢 🤞 🍀

@saurabhsbora
Author

Sure, I will check the updated package and get back to you.
