question: why is multi-hot encoding appropriate for sequences? #211

Open

mbutterick opened this issue Aug 25, 2022 · 0 comments

@mbutterick
In Deep Learning with Python 2e, @fchollet says:

Multi-hot encode your lists to turn them into vectors of 0s and 1s. This would mean, for instance, turning the sequence [8, 5] into a 10,000-dimensional vector that would be all 0s except for indices 8 and 5, which would be 1s.

Wouldn’t this encoding essentially reduce the sequence to a set, thereby losing information? For instance, how would the encoded representation of [8, 5] differ from that of [5, 8], or [8, 5, 8], or any longer sequence built from the elements 8 and 5?

(This example arises in a tutorial about classifying movie reviews as positive or negative. To make it concrete: if word 5 is “pretty” and word 8 is “awful”, then there’s going to be a classification difference between “pretty awful” and “awful pretty”!)
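To make the collapse explicit, here is a minimal sketch (the `multi_hot` helper below is my own simplified version of the chapter’s vectorization code, not a quote from the book) showing that all three lists encode to the same vector:

```python
import numpy as np

def multi_hot(sequence, dimension=10000):
    # Set index i to 1 for every token i in the sequence;
    # order and repeat counts are discarded.
    vec = np.zeros(dimension)
    vec[sequence] = 1.0
    return vec

a = multi_hot([8, 5])
b = multi_hot([5, 8])
c = multi_hot([8, 5, 8])
print(np.array_equal(a, b))  # True -- [8, 5] and [5, 8] encode identically
print(np.array_equal(a, c))  # True -- repetition is lost as well
```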
