Note added to annoytutorial.ipynb #1137

Merged
merged 5 commits into from
Feb 16, 2017

Conversation

greninja
Contributor

@greninja greninja commented Feb 5, 2017

Note explaining why gensim's 'most_similar' method uses multicore whereas annoy's 'most_similar' runs on a single core.

@piskvorky
Owner

The description seems incorrect; the parallelism has nothing to do with the GIL or Python, it happens at the level of BLAS.

Also, typos (space after full stop, space before brackets, GLobal).

@greninja
Contributor Author

greninja commented Feb 6, 2017

Okay. Another possible explanation is:

If numpy on your machine is using one of the BLAS libraries like ATLAS or LAPACK, it'll run on multiple cores if the machine has multicore support. And clearly gensim's most_similar method is using numpy's dot operation.

Does this description sound right? I'll make changes accordingly. I will also correct the typos.
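
For illustration, the brute-force search described in this comment can be sketched as a single matrix-vector product. This is a minimal sketch, not gensim's actual implementation: the function name `most_similar_bruteforce` and the random vectors are hypothetical, and whether the dot product runs on more than one core depends on the BLAS that NumPy was built against.

```python
import numpy as np


def most_similar_bruteforce(vectors, query, topn=5):
    """Return (row index, cosine similarity) pairs for the topn rows closest to query.

    Hypothetical helper for illustration only; not part of gensim or annoy.
    """
    # Normalize every row so that a plain dot product equals cosine similarity.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    # One matrix-vector product over all rows. NumPy hands this to the BLAS
    # it was built against, which may or may not use several threads.
    sims = unit.dot(q)
    # Sort by descending similarity and keep the best topn rows.
    best = np.argsort(-sims)[:topn]
    return [(int(i), float(sims[i])) for i in best]


rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 50))  # made-up data standing in for word vectors
result = most_similar_bruteforce(vectors, vectors[42], topn=3)
```

Querying with row 42 itself should return that row first, with a similarity of about 1.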

@tmylk
Contributor

tmylk commented Feb 6, 2017

That's correct. Please change the PR

@tmylk
Contributor

tmylk commented Feb 6, 2017

Hi, unfortunately using Gensim doesn't guarantee multiple cores. Would it be possible to make that clear?

@greninja
Contributor Author

greninja commented Feb 6, 2017

Should I just remove the initial note written in bold?

@tmylk tmylk merged commit 3e3e6dc into piskvorky:develop Feb 16, 2017
@@ -179,7 +179,7 @@
"\n",
">**Note**: Initialization time for the annoy indexer was not included in the times. The optimal knn algorithm for you to use will depend on how many queries you need to make and the size of the corpus. If you are making very few similarity queries, the time taken to initialize the annoy indexer will be longer than the time it would take the brute force method to retrieve results. If you are making many queries however, the time it takes to initialize the annoy indexer will be made up for by the incredibly fast retrieval times for queries once the indexer has been initialized\n",
"\n",
-">**Note** : **If you are using gensim, it'll run on multiple cores**. Gensim's 'most_similar' method is using numpy operations in the form of dot product whereas Annoy's method isnt. If 'numpy' on your machine is using one of the BLAS libraries like ATLAS or LAPACK, it'll run on multiple cores(only if your machine has multicore support ). "
+">**Note** : Gensim's 'most_similar' method is using numpy operations in the form of dot product whereas Annoy's method isnt. If 'numpy' on your machine is using one of the BLAS libraries like ATLAS or LAPACK, it'll run on multiple cores(only if your machine has multicore support ). "
Owner

@piskvorky piskvorky Mar 3, 2017

isnt => isn't

LAPACK is not BLAS.

cores(only => cores (only

support ). => support).

@tmylk , did you review before merging?

Contributor

Agree that there is a comma missing before "or LAPACK", CC @greninja

Owner

@piskvorky piskvorky Mar 3, 2017

What comma? LAPACK is not a BLAS library, and neither piece of software uses LAPACK.

Maybe you meant OpenBLAS?
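
To check which BLAS a given NumPy build is linked against (OpenBLAS, ATLAS, MKL, or something else), `numpy.show_config()` prints the build configuration. The sketch below also pins the thread count through environment variables. It rests on assumptions: `OMP_NUM_THREADS` is honoured by OpenMP-based builds and `OPENBLAS_NUM_THREADS` by OpenBLAS, other backends may read different variables, and the variables only take effect if they are set before NumPy is first imported in the process.

```python
import os

# Assumption: the BLAS backend reads these variables when it is loaded,
# so they must be set before the first "import numpy" in the process.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import numpy as np

# Prints the BLAS/LAPACK libraries this NumPy build was compiled against.
np.show_config()

# A dot product like this one is what gets dispatched to BLAS.
a = np.ones((200, 200))
product = a.dot(a)
```

Running the same script with and without the two environment variables, while watching CPU usage, is one way to see whether the installed BLAS is multithreaded at all.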
