Why do test set transformations converge to outliers/circles as num_neighbours increases? #581

Answered by lmcinnes
dpattiso asked this question in Q&A

In the random case this comes down to the curse of dimensionality. Oddly enough, a spherical Gaussian in high-dimensional space has almost all of its data in a thin spherical shell, not in the middle of the ball. Likewise, uniform distributions end up with most of their points "in the corners" in high dimensions.
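
Both effects are easy to check numerically. The following is a minimal NumPy sketch, not anything taken from UMAP itself; the function names `shell_stats` and `fraction_in_inscribed_ball`, the sample size, and the seed are illustrative assumptions. It measures how tightly the norms of standard Gaussian samples cluster around `sqrt(dim)`, and what fraction of points drawn uniformly from the cube `[-1, 1]^dim` fall inside the inscribed unit ball (the rest are "in the corners").

```python
import numpy as np


def shell_stats(dim, n_samples=2000, seed=0):
    """Mean and std of ||x|| / sqrt(dim) for x drawn from N(0, I_dim).

    A mean near 1 with a small std means the samples sit in a thin
    shell of radius about sqrt(dim), not near the origin.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, dim))
    r = np.linalg.norm(x, axis=1) / np.sqrt(dim)
    return float(r.mean()), float(r.std())


def fraction_in_inscribed_ball(dim, n_samples=2000, seed=0):
    """Fraction of uniform samples from [-1, 1]^dim inside the unit ball.

    Points outside the ball lie in the "corners" of the cube.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=(n_samples, dim))
    return float(np.mean(np.linalg.norm(x, axis=1) <= 1.0))


for dim in (2, 10, 100, 1000):
    mean_r, std_r = shell_stats(dim)
    inside = fraction_in_inscribed_ball(dim)
    print(f"dim={dim:5d}  norm/sqrt(dim): mean={mean_r:.3f} std={std_r:.3f}  "
          f"uniform inside unit ball: {inside:.3f}")
```

As `dim` grows, the spread of the scaled norm should shrink while its mean stays close to 1, and the fraction of uniform points inside the ball should drop toward zero.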

UMAP does try to correct for these factors, but with a train/test split, where it is trained on one set of data, it learns distributions only from that training data. The new test data is transformed under that learned distribution, so it generally ends up on the "outside" because, in practice, that is where most of the data is. This gets rendered in low dimensions by having the new data tra…

Answer selected by dpattiso