Ngram counts are sometimes wrong #476

firemuzzy · 2018-05-08T17:12:41Z

I have noticed that for some phrases the ngram counts are greater than the number of occurrences in the text.

Here is an example: https://runkit.com/firemuzzy/5af1d67875914d001263570c

If you look at the phrase "free for", the provided text has 2 occurrences, but nlp-compromise returns a count of 3. Coincidentally, the word "free" occurs 3 times, so I wonder if something is going wrong in the normalization.

Similarly, the phrase "just not user friendly" occurs only once, but nlp-compromise reports a count of 2. That discrepancy completely puzzles me.

Am I missing something about how ngrams work?
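
One way to check expected counts independently of the library is a naive counter over whitespace-separated words. The sketch below is only an illustration: `countNgrams` and the sample sentence are hypothetical, and the sample is not the text from the linked notebook.

```javascript
// Naive n-gram counter: lowercase, drop punctuation except apostrophes,
// split on whitespace, then slide a window of size n over the words.
function countNgrams(text, n) {
  const words = text
    .toLowerCase()
    .replace(/[^a-z0-9'\s]/g, ' ')
    .split(/\s+/)
    .filter(Boolean);
  const counts = new Map();
  for (let i = 0; i + n <= words.length; i++) {
    const gram = words.slice(i, i + n).join(' ');
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

// Hypothetical sample text (an assumption, not the reporter's input).
const sample = "It's free for students. The app is free for teachers too. Nothing else is free.";
const bigrams = countNgrams(sample, 2);
const unigrams = countNgrams(sample, 1);

console.log(bigrams.get('free for')); // 2
console.log(unigrams.get('free'));    // 3
```

With this baseline, "free for" occurs 2 times and "free" occurs 3 times in the sample, which matches the kind of mismatch described above when a library reports 3 for "free for".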


spencermountain · 2018-05-09T14:16:48Z

hey Michael, nice find.
I'll take a look at fixing this this week.
thanks

spencermountain · 2018-05-11T16:30:41Z

ha! oh wow, it's the contraction - "It's free for".
it's creating the gram "[is] free for".
this is a great bug, will fix it today.
cheers
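
For readers following along, here is a rough sketch of how an expanded contraction could inflate a count. It rests on assumptions that the thread does not confirm: that "It's" is split into a visible term plus a silent term standing for "is", and that grams are keyed by their visible text. The names `toTerms` and `countGrams` and the `skipSilentEdges` guard are hypothetical, and the guard is not necessarily the fix that shipped in 11.8.0.

```javascript
// Assumed term shape: { text, implicit }. The silent half of a contraction
// has empty visible text and carries the expanded word in `implicit`.
function toTerms(text) {
  const words = text
    .toLowerCase()
    .replace(/[^a-z0-9'\s]/g, ' ')
    .split(/\s+/)
    .filter(Boolean);
  const terms = [];
  for (const word of words) {
    terms.push({ text: word, implicit: null });
    if (word === "it's") {
      terms.push({ text: '', implicit: 'is' }); // silent "[is]"
    }
  }
  return terms;
}

// Count grams of every size up to maxSize, keyed by visible text only.
function countGrams(terms, maxSize, skipSilentEdges) {
  const counts = new Map();
  for (let n = 1; n <= maxSize; n++) {
    for (let i = 0; i + n <= terms.length; i++) {
      const gram = terms.slice(i, i + n);
      // Hypothetical guard: ignore grams that begin or end on a silent term.
      if (skipSilentEdges && (gram[0].text === '' || gram[n - 1].text === '')) continue;
      const key = gram.map((t) => t.text).filter(Boolean).join(' ');
      if (!key) continue;
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return counts;
}

// Hypothetical sample text (an assumption, not the reporter's input).
const terms = toTerms("It's free for students. The app is free for teachers.");
const buggy = countGrams(terms, 3, false);
const guarded = countGrams(terms, 3, true);

console.log(buggy.get('free for'));   // 3: the trigram "[is] free for" collapses to "free for"
console.log(guarded.get('free for')); // 2
```

In this sketch the extra count comes from the trigram made of the silent "[is]" plus "free for", whose visible text is identical to the bigram "free for".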

spencermountain · 2018-05-11T17:35:54Z

fixed in 11.8.0
thanks!

spencermountain added the bug and next-release labels May 9, 2018

spencermountain added a commit that referenced this issue May 11, 2018

fix for #476

e33a17e

spencermountain mentioned this issue May 11, 2018

Dev #477

Merged

spencermountain closed this as completed May 11, 2018

giorgio79 mentioned this issue Sep 16, 2018

How to pos tag and count ngrams with the new version of compromise? #519

Closed
