-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unify SARPlus.recommend_k_items #1644
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonzhaoms this looks great!
test: test Spark dataframe | ||
top_k: top n items to return | ||
remove_seen: remove items test users have already seen in the past from the recommended set | ||
use_cache: use specified local directory stored in self.cache_path as cache for C++ based fast predictions | ||
n_user_prediction_partitions: prediction partitions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test: test Spark dataframe | |
top_k: top n items to return | |
remove_seen: remove items test users have already seen in the past from the recommended set | |
use_cache: use specified local directory stored in self.cache_path as cache for C++ based fast predictions | |
n_user_prediction_partitions: prediction partitions | |
test (pySpark.DataFrame): test Spark dataframe. | |
top_k (int): top n items to return. | |
remove_seen (bool): remove items test users have already seen in the past from the recommended set. | |
use_cache (bool): use specified local directory stored in `self.cache_path` as cache for C++ based fast predictions. | |
n_user_prediction_partitions (int): prediction partitions. | |
Returns: | |
pySpark.DataFrame: Spark dataframe with recommended items. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the suggestion (@simonzhaoms please let me know if you can commit FYI @angusrtaylor).
One question, how much fast is the C++ version vs the Spark version. Maybe it would be good to explain what is happening a little bit. For example, the fast version caches to disk a intermediate computation using C++ and is around X% faster than the standard Spark version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miguelgfierro I'll make another PR to patch your suggestion
Description
This PR unifies SARPlus.recommend_k_items() and SARPlus.recommend_k_items_slow suggested by @angusrtaylor in the comment
Related Issues
Checklist:
staging branch
and not tomain branch
.