sklearn/manifold/_t_sne.py: line 792 X.shape[0] causes AttributeError #136

Open
acxcv opened this issue Sep 27, 2022 · 2 comments
Labels
bug Something isn't working

acxcv commented Sep 27, 2022

🐛 Bug

Running examples/countries.py fails because of an incompatibility with scikit-learn's t-SNE implementation. It can be resolved with a few small changes.

Current Behavior

Part 1

Executing countries.py fails with:
File "/[env]/lib/python3.x/site-packages/sklearn/manifold/_t_sne.py", line 792, in _check_params_vs_input
    if self.perplexity >= X.shape[0]:
AttributeError: 'list' object has no attribute 'shape'

See Possible Solution for a fix

Part 2 (after having resolved Part 1)

File "/[env]/lib/python3.x/site-packages/sklearn/manifold/_t_sne.py", line 793, in _check_params_vs_input
raise ValueError("perplexity must be less than n_samples")
ValueError: perplexity must be less than n_samples

This happens because line 28 of countries.py calls fit_transform with a list of 22 entities, which t-SNE treats as n_samples, while scikit-learn's default perplexity is 30.
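The comparison quoted in both tracebacks can be paraphrased in a few lines. This is only a sketch, not scikit-learn's code: `check_perplexity` is a hypothetical stand-in that takes `n_samples` directly instead of reading it from `X`.

```python
def check_perplexity(perplexity: float, n_samples: int) -> None:
    """Hypothetical stand-in for the comparison quoted in the traceback."""
    if perplexity >= n_samples:
        raise ValueError("perplexity must be less than n_samples")


n_entities = 22  # size of the entity list passed by countries.py

# 30 is scikit-learn's default; 22 and 21.9 probe the boundary.
for perplexity in (30, 22, 21.9):
    try:
        check_perplexity(perplexity, n_entities)
        print(f"perplexity={perplexity}: accepted")
    except ValueError as err:
        print(f"perplexity={perplexity}: {err}")
```

Only the last value passes, because the comparison is `>=` and a perplexity equal to the number of samples is rejected as well.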

Steps to Reproduce

  1. Install rdf2vec and its dependencies
  2. Run examples/countries.py

Environment

  • Operating system: Fedora Linux 35
  • pyRDF2Vec version: 0.2.3
  • Python version: 3.8

Possible Solution

The issue in Part 1 can be resolved by modifying TSNE._check_params_vs_input in /[env]/lib/python3.x/site-packages/sklearn/manifold/_t_sne.py.
Changing X.shape[0] to len(X) solves this particular problem and the code continues executing.
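To illustrate why `len(X)` works where `X.shape[0]` does not: a plain Python list has no `shape` attribute, while array-like objects report their row count through both. A minimal sketch, using a hypothetical `FakeArray` class in place of a real NumPy array so the snippet has no dependencies:

```python
class FakeArray:
    """Hypothetical stand-in for an array: exposes .shape and len()."""

    def __init__(self, rows):
        self._rows = rows
        self.shape = (len(rows), len(rows[0]) if rows else 0)

    def __len__(self):
        return self.shape[0]


# A list of embedding vectors, like the one countries.py passes in.
entities = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

print(hasattr(entities, "shape"))  # False: lists have no .shape
print(len(entities))               # 3

arr = FakeArray(entities)
print(arr.shape[0] == len(arr))    # True: both give the row count
```

Converting the list with `numpy.asarray` before calling fit_transform would likely achieve the same result without editing files in site-packages.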

Part 2 can be resolved by setting the value for perplexity in sklearn/manifold/_t_sne.py: TSNE.__init__ to a value smaller than 22. Even 21.9 will work.

In the above example, we try to create embeddings for the 22 entities in samples/countries-cities/entities.tsv. TSNE throws an error because its perplexity value can't be higher than the number of entities.

Read this to understand the intuition behind perplexity in t-SNE.

Also, be cautious when using this modified version of t-SNE outside a dedicated environment for pyRDF2Vec as it'll likely cause problems.

GillesVandewiele (Collaborator) commented

min(len(X), default_perplexity) might be a cleaner solution!

acxcv commented Sep 30, 2022

> min(len(X), default_perplexity) might be a cleaner solution!

Good idea, but the value for perplexity has to be smaller than, not equal to, len(X).

The best workaround I can think of using your idea is to instantiate TSNE along these lines:
X_tsne = TSNE(perplexity=len(X) - 0.01 if len(X) <= 30 else 30). Kind of inelegant, but it'll do the job.
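One way to combine both suggestions is a small helper on the caller's side, so that scikit-learn itself stays untouched. The sketch below is an assumption rather than pyRDF2Vec code: `safe_perplexity`, `project_2d`, and the `margin` parameter are hypothetical names. Taking the minimum of the default and `n_samples - margin` keeps the value strictly below the sample count, including the case where the sample count equals the default.

```python
def safe_perplexity(n_samples: int, default: float = 30.0,
                    margin: float = 0.01) -> float:
    """Return a perplexity strictly smaller than n_samples (hypothetical helper)."""
    if n_samples < 2:
        # Illustrative guard: with fewer than two samples there is nothing to embed.
        raise ValueError("need at least 2 samples")
    return min(default, n_samples - margin)


def project_2d(embeddings):
    """Project a list of embedding vectors to 2D with t-SNE (sketch)."""
    # Imported lazily so safe_perplexity can be tried without scikit-learn.
    import numpy as np
    from sklearn.manifold import TSNE

    X = np.asarray(embeddings)  # gives X a .shape, avoiding the AttributeError
    return TSNE(perplexity=safe_perplexity(len(X))).fit_transform(X)
```

For the 22 entities in this issue, `safe_perplexity(22)` yields a value just under 22, while larger inputs keep the default of 30.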
