

Unify SARPlus.recommend_k_items #1644

Merged — 6 commits into staging from simonz/unify-recommend-k-items, Feb 15, 2022
Conversation

simonzhaoms (Collaborator)

Description

This PR unifies SARPlus.recommend_k_items() and SARPlus.recommend_k_items_slow(), as suggested by @angusrtaylor in the comment.
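As an illustrative sketch of the unification pattern (this is not the SARPlus implementation; the class, data layout, and helper names below are hypothetical), the two former public methods can be merged behind a single entry point whose `use_cache` flag selects the fast, cache-backed path or the plain path:

```python
# Hypothetical sketch: one public recommend_k_items() dispatching to two
# internal paths, mirroring the merge of recommend_k_items (fast, cached)
# and recommend_k_items_slow into a single method.

class Recommender:
    def __init__(self, scores):
        # scores: dict mapping user -> list of (item, score) pairs
        self.scores = scores
        self._cache = None  # stands in for the on-disk cache used by the fast path

    def recommend_k_items(self, users, top_k=10, use_cache=False):
        """Single entry point; the former *_slow variant becomes the default path."""
        if use_cache:
            return self._recommend_fast(users, top_k)
        return self._recommend_slow(users, top_k)

    def _recommend_slow(self, users, top_k):
        # Plain path: sort each requested user's scores on every call.
        return {u: sorted(self.scores[u], key=lambda p: -p[1])[:top_k]
                for u in users}

    def _recommend_fast(self, users, top_k):
        # Fast path: build the sorted structure once and reuse it, loosely
        # analogous to caching an intermediate computation to disk.
        if self._cache is None:
            self._cache = {u: sorted(s, key=lambda p: -p[1])
                           for u, s in self.scores.items()}
        return {u: self._cache[u][:top_k] for u in users}
```

Both paths return the same recommendations; only the cost profile differs, which is the property the unified API preserves.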

Related Issues

Checklist:

  • I have followed the contribution guidelines and code style for this project.
  • I have added tests covering my contributions.
  • I have updated the documentation accordingly.
  • This PR is being made to staging branch and not to main branch.

@simonzhaoms simonzhaoms changed the title Simonz/unify recommend k items Unify SARPlus.recommend_k_items Feb 14, 2022
@angusrtaylor (Collaborator) left a comment


@simonzhaoms this looks great!

@simonzhaoms simonzhaoms merged commit d7a2d56 into staging Feb 15, 2022
@simonzhaoms simonzhaoms deleted the simonz/unify-recommend-k-items branch February 15, 2022 11:18
Comment on lines +449 to +453
test: test Spark dataframe
top_k: top n items to return
remove_seen: remove items test users have already seen in the past from the recommended set
use_cache: use specified local directory stored in self.cache_path as cache for C++ based fast predictions
n_user_prediction_partitions: prediction partitions
miguelgfierro (Collaborator)

Suggested change
test: test Spark dataframe
top_k: top n items to return
remove_seen: remove items test users have already seen in the past from the recommended set
use_cache: use specified local directory stored in self.cache_path as cache for C++ based fast predictions
n_user_prediction_partitions: prediction partitions
test (pySpark.DataFrame): test Spark dataframe.
top_k (int): top n items to return.
remove_seen (bool): remove items test users have already seen in the past from the recommended set.
use_cache (bool): use specified local directory stored in `self.cache_path` as cache for C++ based fast predictions.
n_user_prediction_partitions (int): prediction partitions.
Returns:
pySpark.DataFrame: Spark dataframe with recommended items.

miguelgfierro (Collaborator)

Apart from the suggestion (@simonzhaoms, please let me know if you can commit it; FYI @angusrtaylor):

One question: how much faster is the C++ version compared with the Spark version? It might be good to explain a little of what is happening. For example: the fast version caches an intermediate computation to disk using C++ and is around X% faster than the standard Spark version.

simonzhaoms (Collaborator, Author)

@miguelgfierro I'll make another PR to patch your suggestion.

@simonzhaoms simonzhaoms mentioned this pull request Feb 17, 2022