Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrote get_top_k_items() to improve runtime #1748

Merged
merged 1 commit into from
Jun 16, 2022

Conversation

ChuyangKe
Copy link
Contributor

Description

Rewrote get_top_k_items().

Optimize the runtime of ranking metrics.

Related Issues

Checklist:

  • I have followed the contribution guidelines and code style for this project.
  • I have added tests covering my contributions.
  • I have updated the documentation accordingly.
  • This PR is being made to staging branch and not to main branch.

@gramhagen
Copy link
Collaborator

Do you have any timing results on this change?

@ChuyangKe
Copy link
Contributor Author

Do you have any timing results on this change?

I tested the old vs new function on the movielens dataset with 1m samples. I set k = 10.

Here are the results:
original_time: 5.363674836989958 seconds.
new_time: 0.12664664999465458 seconds.

@simonzhaoms
Copy link
Collaborator

@ChuyangKe Could you please also add a test case for get_top_k_items() in tests/unit/recommenders/evaluation/test_python_evaluation.py? I find there is no test for get_top_k_items(). I think you can use the data from rating_true for the test.

Copy link
Collaborator

@miguelgfierro miguelgfierro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good progress @ChuyangKe. I agree with Simon, it would be great to have the tests. This can be done in this PR or in a different one, depending on what you prefer. Please let Simon and me know.

In another PR you maybe can explore how to perform the same operation with numpy, hopefully this gives us a much better time.
Some code I found in the web:

@ChuyangKe
Copy link
Contributor Author

This is good progress @ChuyangKe. I agree with Simon, it would be great to have the tests. This can be done in this PR or in a different one, depending on what you prefer. Please let Simon and me know.

In another PR you maybe can explore how to perform the same operation with numpy, hopefully this gives us a much better time. Some code I found in the web:

I will create another PR for the test case @simonzhaoms @miguelgfierro.

Just wanted to make sure @miguelgfierro, do you mean implementing the same function get_top_k_items() using numpy (instead of current pandas operations), or is it another function?

@miguelgfierro
Copy link
Collaborator

do you mean implementing the same function get_top_k_items() using numpy (instead of current pandas operations), or is it another function?

get_top_k_items() using numpy

I'll merge this, super good @ChuyangKe! keep up the good work

@miguelgfierro miguelgfierro merged commit c4cd81f into staging Jun 16, 2022
@miguelgfierro miguelgfierro deleted the chuyang/ranking-metrics branch June 16, 2022 18:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants