Rewrote get_top_k_items() to improve runtime #1748

ChuyangKe · 2022-06-15T21:30:47Z

Description

Rewrote get_top_k_items() to optimize the runtime of the ranking metrics.

Related Issues

Checklist:

I have followed the contribution guidelines and code style for this project.
I have added tests covering my contributions.
I have updated the documentation accordingly.
This PR is being made to staging branch and not to main branch.

gramhagen · 2022-06-15T21:35:07Z

Do you have any timing results on this change?

ChuyangKe · 2022-06-15T21:38:24Z

> Do you have any timing results on this change?

I tested the old and new versions of the function on the MovieLens dataset with 1M samples, using k = 10.

Here are the results:
original_time: 5.363674836989958 seconds.
new_time: 0.12664664999465458 seconds.
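A minimal sketch of how a comparison like this could be timed. It does not reproduce the code in this PR: the two functions below (`top_k_nlargest`, `top_k_sorted`), the synthetic data generator, and the column names `userID`, `itemID`, and `rating` are all assumptions made for illustration, and the timings it prints will differ from the MovieLens numbers above.

```python
import time

import numpy as np
import pandas as pd


def make_ratings(n_users, n_items, seed=0):
    """Build a synthetic long-format ratings table (hypothetical stand-in data)."""
    rng = np.random.default_rng(seed)
    n = n_users * n_items
    return pd.DataFrame(
        {
            "userID": np.repeat(np.arange(n_users), n_items),
            "itemID": np.tile(np.arange(n_items), n_users),
            # A permutation guarantees unique ratings, so there are no ties.
            "rating": rng.permutation(n).astype(float),
        }
    )


def top_k_nlargest(df, k):
    """Hypothetical baseline: call nlargest separately for every user."""
    # The result is indexed by (userID, original row label).
    top = df.groupby("userID")["rating"].nlargest(k)
    rows = top.index.get_level_values(-1)
    return df.loc[rows].reset_index(drop=True)


def top_k_sorted(df, k):
    """Hypothetical alternative: one global sort, then the first k rows per user."""
    return (
        df.sort_values(["userID", "rating"], ascending=[True, False])
        .groupby("userID")
        .head(k)
        .reset_index(drop=True)
    )


def time_call(func, *args):
    """Return the function's result and its wall-clock duration in seconds."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


ratings = make_ratings(n_users=500, n_items=40)
slow_result, slow_time = time_call(top_k_nlargest, ratings, 10)
fast_result, fast_time = time_call(top_k_sorted, ratings, 10)
print(f"nlargest per group: {slow_time:.4f} s")
print(f"sort then head:     {fast_time:.4f} s")
```

Checking that both variants return identical rows before comparing their timings guards against measuring a faster function that silently computes something different.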

simonzhaoms · 2022-06-16T07:00:20Z

@ChuyangKe Could you please also add a test case for get_top_k_items() in tests/unit/recommenders/evaluation/test_python_evaluation.py? I find there is no test for get_top_k_items(). I think you can use the data from rating_true for the test.

miguelgfierro

This is good progress @ChuyangKe. I agree with Simon, it would be great to have the tests. This can be done in this PR or in a different one, depending on what you prefer. Please let Simon and me know.

In another PR, maybe you can explore how to perform the same operation with numpy; hopefully that gives us a much better time.
Some code I found on the web:

with argpartition: https://stackoverflow.com/questions/43386432/how-to-get-indexes-of-k-maximum-values-from-a-numpy-multidimensional-array
groupby with numpy (maybe the other method is better): https://cmdlinetips.com/2019/05/how-to-implement-pandas-groupby-operation-with-numpy/
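As a rough illustration of the argpartition idea, here is a sketch that assumes the scores are already arranged as a dense users-by-items matrix, which is not the long-format DataFrame that get_top_k_items() receives. The function name `top_k_indices` is hypothetical.

```python
import numpy as np


def top_k_indices(scores, k):
    """For each row, return the column indices of the k largest values, best first."""
    # argpartition moves the k largest entries of each row into the first k
    # positions, in arbitrary order, without fully sorting the row.
    part = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    # Sort only those k candidates by descending score.
    part_scores = np.take_along_axis(scores, part, axis=1)
    order = np.argsort(-part_scores, axis=1)
    return np.take_along_axis(part, order, axis=1)


scores = np.array(
    [
        [0.1, 0.9, 0.5, 0.3],
        [4.0, 1.0, 3.0, 2.0],
    ]
)
print(top_k_indices(scores, k=2))
# [[1 2]
#  [0 2]]
```

The partition step is linear in the number of items per row, so only the k selected entries need a full sort. The sketch requires k to be no larger than the number of columns.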

ChuyangKe · 2022-06-16T15:35:31Z

> This is good progress @ChuyangKe. I agree with Simon, it would be great to have the tests. This can be done in this PR or in a different one, depending on what you prefer. Please let Simon and me know.
>
> In another PR, maybe you can explore how to perform the same operation with numpy; hopefully that gives us a much better time. Some code I found on the web:
>
> with argpartition: https://stackoverflow.com/questions/43386432/how-to-get-indexes-of-k-maximum-values-from-a-numpy-multidimensional-array
>
> groupby with numpy (maybe the other method is better): https://cmdlinetips.com/2019/05/how-to-implement-pandas-groupby-operation-with-numpy/

I will create another PR for the test case @simonzhaoms @miguelgfierro.

Just to make sure, @miguelgfierro: do you mean implementing the same function, get_top_k_items(), using numpy instead of the current pandas operations, or is it another function?

miguelgfierro · 2022-06-16T18:44:50Z

> do you mean implementing the same function get_top_k_items() using numpy (instead of current pandas operations), or is it another function?

get_top_k_items() using numpy

I'll merge this. Super good, @ChuyangKe! Keep up the good work.
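For reference, one possible shape of a numpy-only top-k over long-format data is sketched below. It is an assumption about how such a follow-up could look, not code from this repository: the function name `top_k_long_format` is hypothetical, user IDs are assumed to be sortable, and ratings are assumed to be numeric.

```python
import numpy as np


def top_k_long_format(users, ratings, k):
    """Return row positions of the top-k ratings per user and their 1-based ranks."""
    users = np.asarray(users)
    ratings = np.asarray(ratings)
    # lexsort uses the last key as the primary key: user ascending,
    # then rating descending within each user.
    order = np.lexsort((-ratings, users))
    sorted_users = users[order]
    # Mark the first row of each user's block in the sorted order.
    is_start = np.ones(len(order), dtype=bool)
    is_start[1:] = sorted_users[1:] != sorted_users[:-1]
    # Rank of each row inside its user's block, starting at 1.
    start_pos = np.flatnonzero(is_start)
    block_id = np.cumsum(is_start) - 1
    rank = np.arange(len(order)) - start_pos[block_id] + 1
    keep = rank <= k
    return order[keep], rank[keep]


users = [2, 1, 1, 2, 1]
ratings = [5.0, 3.0, 4.0, 1.0, 2.0]
rows, ranks = top_k_long_format(users, ratings, k=2)
print(rows)   # [2 1 0 3]
print(ranks)  # [1 2 1 2]
```

The returned row positions can be used to index the original arrays or a DataFrame with `iloc`, and the ranks correspond to a per-user rank column.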

Rewrite get_top_k_items()

abed9b6

ChuyangKe requested a review from SahitiCheguru June 15, 2022 21:30

ChuyangKe requested review from miguelgfierro, gramhagen, anargyri, loomlike and wutaomsft as code owners June 15, 2022 21:30

simonzhaoms approved these changes Jun 16, 2022

simonzhaoms mentioned this pull request Jun 16, 2022

[FEATURE] Add time performance benchmark for functions in unit tests #1750

Closed

miguelgfierro approved these changes Jun 16, 2022

miguelgfierro merged commit c4cd81f into staging Jun 16, 2022

miguelgfierro deleted the chuyang/ranking-metrics branch June 16, 2022 18:45

ChuyangKe mentioned this pull request Jun 22, 2022

Added tests for ranking function get_top_k_items() #1757

Merged

4 tasks
