Added Noun phrase count #13

Closed · wants to merge 89 commits into from

Conversation

@ritikjain51 commented Oct 6, 2020

Added noun phrase counting.
Solved the counting issue #14.
Updated the logic with emoji decoding for more robust noun phrase extraction (see the sketch below).
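A rough sketch of the emoji-decoding idea (illustrative only, not the PR's actual code; the decode_emojis name and the use of the emoji package are assumptions):

    import emoji

    def decode_emojis(text: str) -> str:
        # Convert emoji to text aliases, e.g. "good job 👍" -> "good job :thumbs_up:",
        # so the tokenizer and POS tagger see plain words instead of raw symbols.
        return emoji.demojize(text)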

@neomatrix369 self-requested a review October 6, 2020 11:38
@neomatrix369 added the enhancement label Oct 6, 2020
@neomatrix369 added this to Doing in NLP Profiler via automation Oct 6, 2020
@neomatrix369 (Owner) left a comment

In general, LGTM. A very good and simple implementation, and I think we can build more on top of it.

Thank you for your first contribution; please review my comments and respond.

Have you also run the test coverage shell script? Could you please post a screenshot of the results so we can see whether coverage is at 100%?

I will soon have CI/CD in place for pull requests, but for now this would help.

Review threads (outdated, now resolved) on nlp_profiler/granular_features.py and nlp_profiler/noun_phase_count.py
@neomatrix369 (Owner) commented Oct 6, 2020

@ritikjain51 there are also acceptance tests and notebooks to update with your new change; the notebooks are under the notebook folder. Also, look for tests under the slow-tests folder. This is how you run the test coverage shell script:

cd [root of the project folder]
./test-coverage.sh "tests slow-tests"

This will generate a test run report and a test coverage report.
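Roughly, the script presumably wraps pytest with the pytest-cov plugin; a hedged Python equivalent (assuming pytest and pytest-cov are installed, and that these are the options the script uses):

    import pytest

    # Run both test folders with coverage measured over the nlp_profiler package,
    # reporting which lines are missing coverage.
    pytest.main(["--cov=nlp_profiler", "--cov-report=term-missing",
                 "tests", "slow-tests"])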

@neomatrix369 added the hacktoberfest-accepted and granular feature(s) labels Oct 6, 2020
@neomatrix369 (Owner)

@ritikjain51 please also add a bit of description explaining the background: what you are doing, what the outcome is, and what you are using to get the results. This will help me and others reading this PR.

As you can see from #15, you have opened up a potential improvement, something I knew about but hadn't prioritised.

@neomatrix369 added the hacktoberfest label and removed the hacktoberfest-accepted label Oct 6, 2020
@neomatrix369 (Owner)

@ritikjain51 could you please also add a description to the PR? This will help me and others who look at it later.

You may also have missed re-running the other notebook (nlp_profiler.ipynb); please take a look. You can skip nlp_profiler-large-dataset.ipynb for now.

Thanks

@neomatrix369 (Owner)

@ritikjain51 thanks for your patience and for following through with the code review.

I will write up the developer guide soon so there is a standard way to do things. There are also a couple of things to be automated, which will also help.

@neomatrix369 (Owner) commented Oct 8, 2020

@ritikjain51 your PR is out of sync with master; you may need to rebase onto master.

Are you using nbdime or another Jupyter notebook version-control plugin? It can help you understand the differences.

@ritikjain51 (Author)
[five screenshots of the test run attached]

Three cases are failing because I don't have the datasets to validate them.
Those three cases are in test_apply_text_profiling.py.

@neomatrix369 (Owner) commented Oct 8, 2020

> ...
>
> Three cases are failing because I don't have the datasets to validate them. Those three cases are in test_apply_text_profiling.py.

The dataset is generated in the test itself; have a read of the failing tests, there is a method that creates the dataset.

These failures are normal: if you look closely, the acceptance test is trying to compare the old results with your new results (which have one extra column), so it will fail. I suggest you generate the expected datasets and save them in https://github.com/neomatrix369/nlp_profiler/tree/master/tests/acceptance_tests/data.

It's easy to generate the new test data, but do it only when you know the new dataset is correct in every respect. For each of the test cases in https://github.com/neomatrix369/nlp_profiler/blob/master/tests/acceptance_tests/test_apply_text_profiling.py, save the results generated as actual_dataframe using this line:

    def test_case():  # for all the relevant test cases
        ...
        actual_dataframe.to_csv(csv_filename, index=False)  # remove this line once you have finished generating the csv file
        ...

Compare the old results with the new ones using git diff; in your case, only your new column should change, not any other aspect of the .csv files.

Let me know if you don't follow this step.

I hope to make this easier in future releases so that we can generate our test data more easily, but generating test data always involves some manual work.
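For illustration, a hedged sketch of the regeneration step described above (the input text and csv path are placeholders, and the import path may differ after refactoring):

    import pandas as pd
    from nlp_profiler.core import apply_text_profiling

    # Build a small source dataframe like the tests do (placeholder text).
    source_dataframe = pd.DataFrame({'text': ["Here is a sentence with a few noun phrases."]})

    # Profile it, then write the result out as the new expected dataset.
    actual_dataframe = apply_text_profiling(source_dataframe, 'text')
    actual_dataframe.to_csv('tests/acceptance_tests/data/expected_profiled_dataframe.csv',
                            index=False)  # remove once the csv has been generated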

@ritikjain51 (Author)
[five screenshots of the test run attached]
> Three cases are failing because I don't have the datasets to validate them. Those three cases are in test_apply_text_profiling.py.

> The dataset is generated in the test itself; have a read of the failing tests, there is a method that creates the dataset.

In the test code, we are fetching data from the expected data path location, which is not available to me.

@neomatrix369 (Owner) commented Oct 8, 2020

@ritikjain51

Not true; the failing tests in https://github.com/neomatrix369/nlp_profiler/blob/master/tests/acceptance_tests/test_apply_text_profiling.py create their own dataset and also use the data in the folder https://github.com/neomatrix369/nlp_profiler/tree/master/tests/acceptance_tests/data. Please look again.

There is no external data used here.

@ritikjain51 (Author)

> @ritikjain51
>
> Not true; the failing tests in https://github.com/neomatrix369/nlp_profiler/blob/master/tests/acceptance_tests/test_apply_text_profiling.py create their own dataset and also use the data in the folder https://github.com/neomatrix369/nlp_profiler/tree/master/tests/acceptance_tests/data. Please look again.
>
> There is no external data used here.

Uploading image.png…

@ritikjain51 (Author) left a comment

Merge conflict solved locally.

@ritikjain51 (Author)

These issues are there because you have made structural changes in your code.

@ritikjain51 (Author)

Thanks @neomatrix369 for your guidance. I am closing this pull request; it's getting difficult to understand the issues.
Hope you understand.
Regards,
Ritik Jain

@ritikjain51 deleted the addNounPhrase branch October 18, 2020 10:22
@neomatrix369 (Owner) commented Oct 18, 2020

@ritikjain51 the master branch did move on Friday as I was refactoring the code (implementation); sorry about that, but it had to be done for better readability and maintainability.

The merge conflicts were mainly due to changes in the data and notebook files.

It is normal in a project for a branch to fall out of step with master as development continues. I see you have also deleted your original changes; someone might have been able to help you fix them.

PyCharm and VSCode have good git functionality to help with the process, though walking you through it is outside the scope of this project.

You can always create a new fork and a new branch with the changes and regenerate the notebooks; it's up to you. Otherwise I, or someone else, might implement this functionality separately.

@neomatrix369 (Owner) commented Oct 18, 2020

Your tests were failing on CI/CD due to the absence of this line:

nltk.download('averaged_perceptron_tagger')

in the noun_phase_count.py module (see 08e54c4#diff-071796e721082b14e9eb7c22c771ca452fe1f83652dc4bcce6d1c599409a6c41R10)

The CI/CD logs did indicate that.
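For context, here is a minimal sketch of why that download matters; the chunk grammar and function name below are illustrative assumptions, not the actual contents of noun_phase_count.py:

    import nltk

    nltk.download('punkt')
    nltk.download('averaged_perceptron_tagger')  # the missing line

    # A simple noun-phrase chunk grammar: optional determiner,
    # any number of adjectives, then one or more nouns.
    chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")

    def count_noun_phrases(text: str) -> int:
        tagged = nltk.pos_tag(nltk.word_tokenize(text))  # needs the tagger model
        tree = chunker.parse(tagged)
        return sum(1 for subtree in tree.subtrees() if subtree.label() == "NP")

Without the download, nltk.pos_tag raises a LookupError on a fresh machine, which is exactly what a CI/CD runner is.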

After adding this line, the tests pass:

tests/acceptance_tests/test_apply_text_profiling.py .....                                                                [  2%]
tests/granular/test_alphanumeric.py ........                                                                             [  7%]
tests/granular/test_chars_and_spaces.py ......                                                                           [ 10%]
tests/granular/test_dates.py .........                                                                                   [ 15%]
tests/granular/test_duplicates.py .........                                                                              [ 21%]
tests/granular/test_emojis.py ........                                                                                   [ 25%]
tests/granular/test_non_alphanumeric.py ........                                                                         [ 30%]
tests/granular/test_nounphase.py .........                                                                               [ 35%]
tests/granular/test_numbers.py ........                                                                                  [ 39%]
tests/granular/test_punctuations.py ........                                                                             [ 44%]
tests/granular/test_sentences.py ..................                                                                      [ 54%]
tests/granular/test_stop_words.py ........                                                                               [ 59%]
tests/granular/test_words.py ........                                                                                    [ 63%]
tests/high_level/test_grammar_check.py ..........                                                                        [ 69%]
tests/high_level/test_sentiment_polarity.py ................                                                             [ 78%]
tests/high_level/test_sentiment_subjectivity.py ................                                                         [ 87%]
tests/high_level/test_spelling_check.py ..................                                                               [ 97%]
slow-tests/acceptance_tests/test_apply_text_profiling.py .                                                               [ 98%]
slow-tests/performance_tests/test_perf_grammar_check.py .                                                                [ 98%]
slow-tests/performance_tests/test_perf_noun_phase.py .                                                                   [ 99%]
slow-tests/performance_tests/test_perf_spelling_check.py .                                                               [100%]

I would also suggest using the PR page, i.e. #13, to read and respond to messages; a lot of the information on this page helps with the development process.

@neomatrix369 (Owner)

@ritikjain51
Since you put so much time and effort into this work, I have restored the branch for you; see https://github.com/neomatrix369/nlp_profiler/tree/addNounPhraseCount. Your commits are preserved, and I have added some fixes to it.

Also, please have a look at the CI/CD run: https://github.com/neomatrix369/nlp_profiler/runs/1271145643?check_suite_focus=true

Please do a thorough check to see if anything has been missed. I am happy for you to create a new PR, as the community values your hard work; let's coordinate this time and try to stay in sync with master.

Again, if you are unfamiliar with something, it's best to say so and ask for help; that's the only way for us to come forward and step in.

Labels: enhancement (New feature or request), granular feature(s) (Low-level/granular feature(s)), hacktoberfest (Part of Hacktoberfest 2020, https://hacktoberfest.digitalocean.com)
Projects: NLP Profiler (Done)
Development: successfully merging this pull request may close the issue "Add phrase counts or parts-of-speech token counts after extracting entities from a sentence"
3 participants