entropy_bits does not always calculate entropy correctly #16

Open
HacKanCuBa opened this issue Jan 10, 2019 · 0 comments
Labels
bug confirmed ToDo Kanban - Issue to be done in current sprint

@HacKanCuBa
Owner

calc::entropy_bits() does not calculate the entropy correctly when the input list contains repeated elements. Fortunately, it works for the current use case, but anyone using it to calculate the entropy of a list with repeated elements would get a wrong result.
Example:

>>> entropy_bits(list('abcabcabcabc'))  # repeated elements, problem
6.339850002884623  # should be 1.5849625007211559
>>> entropy_bits(list('abcdefghijkl'))  # no element repetition, ok
3.584962500721156  # correct
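
As a cross-check of the expected values in the comments above, here is a minimal sketch of Shannon entropy in bits that sums once per distinct value. The helper name shannon_entropy_bits is hypothetical and is not part of this project; it only uses collections.Counter and math.log2 from the standard library.

```python
from collections import Counter
from math import log2


def shannon_entropy_bits(items):
    """Shannon entropy in bits, one term per distinct value (hypothetical helper)."""
    total = len(items)
    if total == 0:
        return 0.0  # convention chosen for this sketch: empty input has zero entropy
    entropy = 0.0
    for count in Counter(items).values():
        prob = count / total  # relative frequency of this distinct value
        entropy -= prob * log2(prob)
    return entropy
```

For list('abcabcabcabc') there are three distinct values, each with probability 1/3, so the result is log2(3), about 1.585. For list('abcdefghijkl') there are twelve distinct values, so the result is log2(12), about 3.585.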

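The problem is that the function does not take into account how many times each element is repeated in the list. In the first example every letter appears 4 times and the result is exactly 4 times too large. The fix is quite easy: divide each term by the number of times its element appears.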

for prob, count in zip(probs, counts):
    # an element that appears `count` times adds `count` equal terms; dividing leaves one
    entropy -= prob * log2(prob) / count
    print(entropy)

Note that len(probs) == len(counts), and the two lists are in the same order, so probs[i] and counts[i] refer to the same element.
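
To make the fix concrete, here is a self-contained sketch. It assumes that probs and counts hold one entry per position of the input list, which is consistent with the 4x overcount in the first example, but the real internals of entropy_bits are not shown in this issue. The function name entropy_bits_fixed and the way the two lists are built are illustrative only.

```python
from collections import Counter
from math import log2


def entropy_bits_fixed(items):
    """Sketch of the proposed fix; the per-position layout of probs/counts is assumed."""
    freq = Counter(items)
    total = len(items)
    # One entry per list position, so both lists have len(items) entries
    # and counts[i] / probs[i] describe items[i].
    counts = [freq[item] for item in items]
    probs = [count / total for count in counts]

    entropy = 0.0
    for prob, count in zip(probs, counts):
        # A value occurring `count` times contributes `count` identical terms;
        # dividing each by `count` makes them add up to a single term.
        entropy -= prob * log2(prob) / count
    return entropy
```

With this layout, entropy_bits_fixed(list('abcabcabcabc')) sums twelve terms of (1/3) * log2(3) / 4, which gives log2(3), about 1.585. A list without repetitions has count == 1 everywhere, so the division changes nothing and the second example still gives about 3.585.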

@HacKanCuBa HacKanCuBa added this to the v2 milestone Jan 10, 2019
@HacKanCuBa HacKanCuBa added the ToDo Kanban - Issue to be done in current sprint label Jan 10, 2019