

Unify SARPlus.recommend_k_items #1644

Merged — 6 commits into staging from simonz/unify-recommend-k-items, Feb 15, 2022
Conversation

simonzhaoms (Collaborator)

Description

This PR unifies SARPlus.recommend_k_items() and SARPlus.recommend_k_items_slow(), as suggested by @angusrtaylor in the comment.
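As an illustrative sketch of the unification pattern (this is not the SARPlus implementation; the class, data layout, and helper names below are hypothetical), the two former public methods can be merged behind a single entry point whose `use_cache` flag selects the fast, cache-backed path or the plain path:

```python
# Hypothetical sketch: one public recommend_k_items() dispatching to two
# internal paths, mirroring the merge of recommend_k_items (fast, cached)
# and recommend_k_items_slow into a single method.

class Recommender:
    def __init__(self, scores):
        # scores: dict mapping user -> list of (item, score) pairs
        self.scores = scores
        self._cache = None  # stands in for the on-disk cache used by the fast path

    def recommend_k_items(self, users, top_k=10, use_cache=False):
        """Single entry point; the former *_slow variant becomes the default path."""
        if use_cache:
            return self._recommend_fast(users, top_k)
        return self._recommend_slow(users, top_k)

    def _recommend_slow(self, users, top_k):
        # Plain path: sort each requested user's scores on every call.
        return {u: sorted(self.scores[u], key=lambda p: -p[1])[:top_k]
                for u in users}

    def _recommend_fast(self, users, top_k):
        # Fast path: build the sorted structure once and reuse it, loosely
        # analogous to caching an intermediate computation to disk.
        if self._cache is None:
            self._cache = {u: sorted(s, key=lambda p: -p[1])
                           for u, s in self.scores.items()}
        return {u: self._cache[u][:top_k] for u in users}
```

Both paths return the same recommendations; only the cost profile differs, which is the property the unified API preserves.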

Related Issues

Checklist:

  • I have followed the contribution guidelines and code style for this project.
  • I have added tests covering my contributions.
  • I have updated the documentation accordingly.
  • This PR is being made to staging branch and not to main branch.

@simonzhaoms simonzhaoms changed the title Simonz/unify recommend k items Unify SARPlus.recommend_k_items Feb 14, 2022
@angusrtaylor (Collaborator) left a comment


@simonzhaoms this looks great!

@simonzhaoms simonzhaoms merged commit d7a2d56 into staging Feb 15, 2022
@simonzhaoms simonzhaoms deleted the simonz/unify-recommend-k-items branch February 15, 2022 11:18
Comment on lines +449 to +453
test: test Spark dataframe
top_k: top n items to return
remove_seen: remove items test users have already seen in the past from the recommended set
use_cache: use specified local directory stored in self.cache_path as cache for C++ based fast predictions
n_user_prediction_partitions: prediction partitions
miguelgfierro (Collaborator)

Suggested change
test: test Spark dataframe
top_k: top n items to return
remove_seen: remove items test users have already seen in the past from the recommended set
use_cache: use specified local directory stored in self.cache_path as cache for C++ based fast predictions
n_user_prediction_partitions: prediction partitions
test (pySpark.DataFrame): test Spark dataframe.
top_k (int): top n items to return.
remove_seen (bool): remove items test users have already seen in the past from the recommended set.
use_cache (bool): use specified local directory stored in `self.cache_path` as cache for C++ based fast predictions.
n_user_prediction_partitions (int): prediction partitions.
Returns:
pySpark.DataFrame: Spark dataframe with recommended items.

miguelgfierro (Collaborator)

Apart from the suggestion (@simonzhaoms, please let me know if you can commit it; FYI @angusrtaylor):

One question: how much faster is the C++ version compared with the Spark version? It might be good to explain a little of what is happening. For example: the fast version caches an intermediate computation to disk using C++ and is around X% faster than the standard Spark version.

simonzhaoms (Collaborator, Author)

@miguelgfierro I'll make another PR to patch your suggestion.

@simonzhaoms simonzhaoms mentioned this pull request Feb 17, 2022