Implemented Spark item to item recommenders #1809

Merged
merged 8 commits into staging from chuyang/spark_item2item on Sep 20, 2022


ChuyangKe
Contributor

Description

Implemented get_topk_most_similar_users() and get_popularity_based_topk() for SAR+.

Related Issues

Checklist:

  • I have followed the contribution guidelines and code style for this project.
  • I have added tests covering my contributions.
  • I have updated the documentation accordingly.
  • This PR is being made to staging branch and not to main branch.

@@ -488,3 +524,72 @@ def recommend_k_items(
            )
        else:
            raise ValueError("No cache_path specified")

    def get_topk_most_similar_users(self, test, user, top_k=10):
Collaborator

One question: have you checked that this version returns the same results as the CPU version?

Contributor Author

Yes, we are using the same test cases as for the CPU version.
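For readers comparing the two versions, the sketch below shows what "most similar users" plausibly means, assuming (as in the single-node SAR) that user-user similarity is the dot product of the two users' item-affinity vectors and that the query user is excluded from its own results. It uses plain Python dicts instead of Spark DataFrames; the function name `topk_most_similar_users` and the dict layout are hypothetical and not the code in this PR.

```python
from typing import Dict, List, Tuple

# Hypothetical layout (assumption): user id -> {item id: affinity score}
Affinity = Dict[str, Dict[str, float]]


def topk_most_similar_users(
    affinity: Affinity, user: str, top_k: int = 10
) -> List[Tuple[str, float]]:
    """Return up to top_k (user, score) pairs most similar to `user`.

    Assumption: similarity is the dot product of item-affinity vectors.
    """
    target = affinity[user]
    scores = []
    for other, items in affinity.items():
        if other == user:
            continue  # never return the query user as its own neighbour
        # Dot product over the items that both users have an affinity for
        score = sum(value * items[item] for item, value in target.items() if item in items)
        scores.append((other, score))
    # Highest score first; ties broken by user id so the output is deterministic
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_k]


affinity = {
    "u1": {"a": 1.0, "b": 2.0},
    "u2": {"a": 3.0},
    "u3": {"b": 1.0, "c": 5.0},
    "u4": {"c": 1.0},
}
print(topk_most_similar_users(affinity, "u1", top_k=2))  # [('u2', 3.0), ('u3', 2.0)]
```

Running the same small fixture through both implementations is one way to check that the CPU and Spark results agree.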

Comment on lines +191 to +195
        # compute item frequencies
        self.item_frequencies = item_cooccurrence.filter(
            F.col("i1") == F.col("i2")
        ).select(F.col("i1").alias("item_id"), F.col("value").alias("frequency"))

Collaborator

Question for @simonzhaoms: will the item-item function have any relationship with the work you are doing on the new similarities?
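The commented lines read item frequencies off the diagonal of the item co-occurrence matrix: entry (i, i) counts the users who interacted with item i. The plain-Python sketch below illustrates that idea, assuming binary user-item interactions (duplicate rows collapse to one). `item_cooccurrence` and `item_frequencies` here are hypothetical helpers, not pysarplus functions.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, Set, Tuple


def item_cooccurrence(interactions: Iterable[Tuple[str, str]]) -> Counter:
    """Count, for each ordered item pair (i1, i2), the users who interacted with both.

    Duplicate (user, item) rows are collapsed, i.e. interactions are treated as binary.
    """
    items_by_user: Dict[str, Set[str]] = {}
    for user, item in interactions:
        items_by_user.setdefault(user, set()).add(item)
    counts: Counter = Counter()
    for items in items_by_user.values():
        for i1, i2 in product(items, repeat=2):
            counts[(i1, i2)] += 1
    return counts


def item_frequencies(cooccurrence: Counter) -> Dict[str, int]:
    """Mirror of the Spark filter(i1 == i2): keep only the diagonal entries."""
    return {i1: value for (i1, i2), value in cooccurrence.items() if i1 == i2}


rows = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "a"), ("u3", "c")]
print(sorted(item_frequencies(item_cooccurrence(rows)).items()))  # [('a', 2), ('b', 1), ('c', 1)]
```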

contrib/sarplus/python/pysarplus/SARPlus.py (resolved conversation)
        if not items:
            raise ValueError("Not implemented")

        return self.item_frequencies.orderBy("frequency", ascending=False).limit(top_k)
Collaborator

I noticed that the sarplus tests are triggered, but the badge on the main page (https://github.com/microsoft/recommenders/tree/staging/contrib/sarplus) is failing. @simonzhaoms, do you know what the problem is?
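The popularity path in the diff orders `item_frequencies` by `frequency`, descending, and keeps the first `top_k` rows. The plain-Python equivalent below is a sketch under that reading; the name `popularity_based_topk` is hypothetical. Because Spark's `orderBy(...).limit(...)` does not guarantee any particular order among items with equal frequency, this sketch breaks ties by item id so its output is reproducible in tests.

```python
from typing import Dict, List, Tuple


def popularity_based_topk(frequencies: Dict[str, int], top_k: int = 10) -> List[Tuple[str, int]]:
    """Return the top_k most frequent items as (item, frequency) pairs."""
    if top_k < 0:
        raise ValueError("top_k must be non-negative")
    # Frequency descending, then item id ascending as a deterministic tie-break
    ranked = sorted(frequencies.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_k]


freqs = {"a": 2, "b": 1, "c": 1, "d": 5}
print(popularity_based_topk(freqs, top_k=3))  # [('d', 5), ('a', 2), ('b', 1)]
```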

@miguelgfierro merged commit 2409dc2 into staging on Sep 20, 2022
@miguelgfierro deleted the chuyang/spark_item2item branch on September 20, 2022 at 09:53
Labels
None yet
Projects
None yet
2 participants