Custom preprocessing in Live Test #3

saurabhsbora · 2020-05-04T05:14:35Z

@sergioburdisso
It would be a great feature to have custom preprocessing in the Live Test.
This will enable us to visually understand the words, sentences, and paragraphs that helped the model to classify a particular document after custom preprocessing.

The method used to recognize/parse learned word n-grams was improved. The new code is easier to follow, clearer, and semantically more accurate. This new code also improved the tokenization and visualization of single words and word n-grams in the Live Test tool. Before this modification, adding custom preprocessing support (#3) had introduced a bug in the Live Test tool: some words were visualized incorrectly, especially those ending with question marks, for instance in the sentence "Great atmosphere then ?????". In addition, the user can now put an arbitrary number of spaces between words and they will be ignored when recognizing word n-grams. For instance, before this modification, if the user entered "machine learning" with several spaces between the two words, it was not recognized as a bigram.
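The whitespace and question-mark behavior described above can be illustrated with a minimal sketch. This is not pyss3's actual implementation: the `LEARNED_NGRAMS` set, the regex, and the greedy longest-match strategy are all assumptions chosen to show the idea. Splitting on runs of whitespace makes extra spaces irrelevant, and treating runs of `?`/`!` as their own tokens keeps them from being glued to the preceding word.

```python
import re

# Hypothetical learned word n-grams; in a real model these would come from training.
LEARNED_NGRAMS = {("machine", "learning"), ("great", "atmosphere")}
MAX_N = 2


def tokenize(text):
    # \w+ matches words; [?!.]+ keeps runs like "?????" as a separate token.
    # Any amount of whitespace between words is simply skipped by findall.
    return re.findall(r"\w+|[?!.]+", text.lower())


def recognize_ngrams(tokens, learned=LEARNED_NGRAMS, max_n=MAX_N):
    """Greedy left-to-right longest match: prefer a learned n-gram, else a single token."""
    result = []
    i = 0
    while i < len(tokens):
        for n in range(max_n, 0, -1):
            candidate = tuple(tokens[i:i + n])
            # A single token always matches; longer candidates must be complete and learned.
            if n == 1 or (len(candidate) == n and candidate in learned):
                result.append(" ".join(candidate))
                i += n
                break
    return result


print(recognize_ngrams(tokenize("I love machine     learning")))
print(recognize_ngrams(tokenize("Great atmosphere then ?????")))
```

Because the extra spaces never reach the n-gram matcher, "machine" and "learning" end up adjacent in the token list and the bigram is found.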

Added:
- The Live Test tool now supports custom (user-defined) preprocessing methods (b50cfaf, resolved #3).
- The tokenization process was improved (26fff88, 4af8e80).
- The process for recognizing word n-grams during classification was improved (2ceb148).

sergioburdisso · 2020-05-05T10:33:57Z

Hi @enthussb!
Sure, I overlooked this option when first coding the Live Test tool. Thank you for your suggestion :)

I've added this feature in the new version and also took the opportunity to incorporate some other pending changes. Here's what's new in this version:

- The Live Test tool now supports custom (user-defined) preprocessing methods (b50cfaf, resolved #3).
- The tokenization process was improved (26fff88, 4af8e80).
- The process for recognizing word n-grams during classification was improved (2ceb148).

Update the package to the new version (0.5.8) with `pip install -U pyss3`. To make things easier for you, I've added a new Jupyter Notebook to the examples folder, using_custom_preprocessing.ipynb. It shows how to incorporate user-defined preprocessing functions into the Live Test tool visualizations, so you can follow it if you want.
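As a rough idea of what a "user-defined preprocessing function" looks like, here is a hypothetical one. It is not the function used in the notebook, and the pyss3 call that registers it is intentionally not shown here. The assumed contract is simply "a string in, a string out", applied identically to training documents, test documents, and whatever is typed into the Live Test tool.

```python
import re


def my_preprocess(text):
    """Hypothetical user-defined preprocessing: takes a document string, returns a string."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " url ", text)  # replace links with a placeholder token
    text = re.sub(r"\d+", " num ", text)           # replace digit runs with a placeholder token
    text = re.sub(r"[^\w\s?!]", " ", text)         # replace other punctuation with spaces, keeping ? and !
    return " ".join(text.split())                  # collapse repeated whitespace


docs = ["Check https://example.com NOW!!", "Scored 42 points"]
print([my_preprocess(d) for d in docs])
```

Whatever the function does, the important design point is consistency: if the model was trained on text produced by `my_preprocess`, the Live Test visualization only makes sense when it sees text produced the same way.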

Let me know if everything worked OK ☕

sergioburdisso · 2020-05-05T10:45:02Z

@all-contributors could you add @enthussb for ideas to the README file? They helped make this project better by suggesting this cool feature 👍

allcontributors · 2020-05-05T10:45:12Z

@sergioburdisso

I've put up a pull request to add @enthussb! 🎉

saurabhsbora · 2020-05-07T16:19:43Z

@sergioburdisso I updated the package and ran the code. Everything is working fine, although my accuracy has been reduced quite a bit. I guess it might be due to the latest n-gram and tokenization changes. Could you please have a look at that?

sergioburdisso · 2020-05-08T13:06:37Z

I was about to suggest performing hyperparameter optimization with the Evaluation.grid_search() function, but then I realized that I didn't include a "prep" argument to disable the default preprocessing. As a consequence, users can't perform hyperparameter optimization using only their custom preprocessing method. I'll work on that and add the "prep" parameter to the grid_search(), test(), and kfold_cross_validation() functions of the Evaluation class. I'm sorry for not adding this in the first place 😢. I'll let you know as soon as the new version is released.
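To see why the missing "prep" switch matters, consider what happens when a built-in preprocessing step runs on top of text that was already custom-preprocessed. In this minimal sketch, `default_prep` is a hypothetical stand-in (lowercase, keep only letters and spaces), not pyss3's actual default, and `features` is an illustrative helper, not a library function.

```python
import re


def default_prep(text):
    """Hypothetical built-in preprocessing: lowercase and keep only letters and spaces."""
    return re.sub(r"[^a-z ]+", " ", text.lower())


def features(text, prep=True):
    # With prep=True the built-in step runs on top of whatever the user already did.
    if prep:
        text = default_prep(text)
    return text.split()


custom = "great atmosphere then ?????"  # output of some user-defined preprocessing
print(features(custom))                # the "?????" token the user chose to keep is lost
print(features(custom, prep=False))    # the custom tokens reach the model unchanged
```

If evaluation always applies the default step, the hyperparameters get tuned on different features than the ones the custom pipeline produces, which is why a way to turn it off is needed.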

saurabhsbora · 2020-05-08T14:32:09Z

Okay, no problem 👍. Until then, I can work with the previous version, where I had achieved great accuracy!

sergioburdisso · 2020-05-08T15:14:27Z

I've just finished making those changes and released the new version (0.5.9). I've also updated the notebook, adding a "Hyperparameter Optimization" section. Try performing hyperparameter optimization the way I did in that notebook and let me know. If you still get bad accuracy, please share more details, such as part of the actual code and the accuracy before and after the changes. That would make it much easier to help. I hope you achieve great accuracy again 😢 🤞 🍀
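Conceptually, a grid search with a "prep" flag just tries every hyperparameter combination and forwards the flag to each evaluation. The sketch below is not pyss3's `Evaluation.grid_search()`, only the shape of the loop. The names `s`, `l`, `p` follow SS3's usual hyperparameter naming, but the value ranges are arbitrary, and `evaluate`/`toy_evaluate` are hypothetical callables standing in for "train, test, and return accuracy".

```python
from itertools import product


def grid_search(evaluate, s_values, l_values, p_values, prep=True):
    """Try every (s, l, p) combination and return (best_score, best_params).

    `evaluate` is a user-supplied callable (hypothetical here) that trains/tests a
    model with the given hyperparameters and returns a score, e.g. accuracy.
    """
    best_score, best_params = float("-inf"), None
    for s, l, p in product(s_values, l_values, p_values):
        score = evaluate(s=s, l=l, p=p, prep=prep)  # the prep flag is forwarded unchanged
        if score > best_score:
            best_score, best_params = score, (s, l, p)
    return best_score, best_params


def toy_evaluate(s, l, p, prep):
    # Illustration only: refuses to re-preprocess custom text and peaks at s=0.5, l=1.0, p=1.0.
    if prep:
        raise ValueError("custom-preprocessed text should not be preprocessed again")
    return -((s - 0.5) ** 2 + (l - 1.0) ** 2 + (p - 1.0) ** 2)


print(grid_search(toy_evaluate, [0.3, 0.5], [0.5, 1.0], [1.0, 1.5], prep=False))
```

Exhaustive search is simple and easy to reason about, but its cost grows with the product of the number of values per hyperparameter, so coarse grids are usually tried first.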

saurabhsbora · 2020-05-09T04:06:49Z

Sure, I will check the updated package and get back to you.

sergioburdisso added the "enhancement" (New feature or request) label on May 4, 2020.

sergioburdisso closed this as completed in b50cfaf on May 5, 2020.

The allcontributors bot mentioned this issue on May 5, 2020 in "docs: add enthussb as a contributor #4" (merged).
