Update SASRec Notebook & List of Authors #1621

Merged: 13 commits from abir/sasrec into staging on Feb 4, 2022
Conversation

@aeroabir (Collaborator)

Description

The SASRec notebook uses a function data_process_with_time that can be expressed in terms of the existing function min_rating_filter_pandas. The notebook is updated to remove data_process_with_time and use only min_rating_filter_pandas.
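For illustration only (this snippet is not part of the PR diff): a minimal sketch of the kind of call the notebook now relies on. The import path, the signature of min_rating_filter_pandas, and the column names below are assumptions based on the recommenders data utilities rather than on this PR.

```python
import pandas as pd

# Assumed import path; min_rating_filter_pandas may live in a different module.
from recommenders.datasets.split_utils import min_rating_filter_pandas

# Hypothetical raw interactions with timestamps; the column names are
# illustrative, not necessarily the ones used in the SASRec notebook.
raw = pd.DataFrame(
    {
        "userID": [1, 1, 1, 2, 2, 3],
        "itemID": [10, 11, 12, 10, 13, 14],
        "timestamp": [5, 1, 3, 2, 4, 6],
    }
)

# Keep only users with at least `min_rating` interactions, which covers the
# filtering that data_process_with_time previously implemented by hand.
filtered = min_rating_filter_pandas(
    raw, min_rating=2, filter_by="user", col_user="userID", col_item="itemID"
)

# Sort each user's history by time, as sequential models such as SASRec expect.
filtered = filtered.sort_values(["userID", "timestamp"])
```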

Related Issues

Fixing the issue here

Checklist:

  • I have followed the contribution guidelines and code style for this project.
  • I have added tests covering my contributions.
  • I have updated the documentation accordingly.
  • This PR is being made to staging branch and not to main branch.

@review-notebook-app (bot)

Check out this pull request on ReviewNB to see visual diffs and provide feedback on the Jupyter notebooks.

Three review threads on examples/00_quick_start/sasrec_amazon.ipynb (outdated, resolved).
@miguelgfierro (Collaborator) left a comment

hey @aeroabir I did another pass. Maybe the problem we are having with the long tests in the nightly builds is related to the hardcoded epoch values in the tests.
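(Aside, to illustrate the point about hardcoded epochs: the tests could inject a small epoch count through papermill parameters instead of relying on values fixed inside the notebook. The parameter names and output path below are assumptions for illustration, not the repo's actual test code.)

```python
import papermill as pm

# Illustrative paths; the real tests use the repo's own notebook locations
# and output directories.
notebook = "examples/00_quick_start/sasrec_amazon.ipynb"
output = "sasrec_amazon_output.ipynb"

# A small epoch count keeps the test run short, while the notebook's
# parameters cell can keep a larger default for interactive use.
# The parameter names (num_epochs, batch_size) are assumptions.
pm.execute_notebook(
    notebook,
    output,
    kernel_name="python3",
    parameters=dict(num_epochs=1, batch_size=128),
)
```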

@miguelgfierro (Collaborator)

we are getting weird OOM errors in other parts of the code, see https://github.com/microsoft/recommenders/runs/5035292456?check_suite_focus=true

FAILED tests/unit/examples/test_notebooks_gpu.py::test_fastai - papermill.exc...
FAILED tests/unit/examples/test_notebooks_gpu.py::test_xdeepfm - papermill.ex...
FAILED tests/unit/examples/test_notebooks_gpu.py::test_dkn_quickstart - paper...

Not sure if this is related to this PR or something we already have in staging.

@anargyri (Collaborator) commented Feb 2, 2022

> we are getting weird OOM errors in other parts of the code, see https://github.com/microsoft/recommenders/runs/5035292456?check_suite_focus=true
>
> FAILED tests/unit/examples/test_notebooks_gpu.py::test_fastai - papermill.exc...
> FAILED tests/unit/examples/test_notebooks_gpu.py::test_xdeepfm - papermill.ex...
> FAILED tests/unit/examples/test_notebooks_gpu.py::test_dkn_quickstart - paper...
>
> Not sure if this is related to this PR or something we already have in staging.

This appears in other PRs too. I am not sure why, but the GPU hosts get stuck sometimes and some tox processes remain. I cleaned them up. Try to rerun the jobs.
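(For reference, a quick way to spot such leftover tox processes on a self-hosted runner could look like the sketch below; using psutil for this is a suggestion, not necessarily how the hosts were actually cleaned up.)

```python
import psutil

# Print the PID and command line of any process that mentions tox, so stale
# test runs can be identified (and then killed manually if appropriate).
for proc in psutil.process_iter(attrs=["pid", "cmdline"]):
    cmdline = " ".join(proc.info["cmdline"] or [])
    if "tox" in cmdline:
        print(proc.info["pid"], cmdline)
```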

@codecov-commenter commented Feb 2, 2022

Codecov Report

Merging #1621 (8485e97) into staging (7425d21) will increase coverage by 57.97%.
The diff coverage is 20.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##           staging    #1621       +/-   ##
============================================
+ Coverage     0.00%   57.97%   +57.97%     
============================================
  Files           88       88               
  Lines         9091     9096        +5     
============================================
+ Hits             0     5273     +5273     
- Misses           0     3823     +3823     
Flag      Coverage Δ
nightly   ?
pr-gate   57.97% <20.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
recommenders/datasets/amazon_reviews.py 82.17% <20.00%> (+82.17%) ⬆️
recommenders/datasets/mind.py 0.00% <0.00%> (ø)
recommenders/datasets/movielens.py 69.46% <0.00%> (+69.46%) ⬆️
recommenders/models/sasrec/model.py 26.28% <0.00%> (+26.28%) ⬆️
recommenders/datasets/download_utils.py 90.00% <0.00%> (+90.00%) ⬆️
recommenders/models/newsrec/models/npa.py 95.58% <0.00%> (+95.58%) ⬆️
recommenders/models/newsrec/models/naml.py 92.43% <0.00%> (+92.43%) ⬆️
recommenders/models/newsrec/models/nrms.py 91.37% <0.00%> (+91.37%) ⬆️
recommenders/models/newsrec/models/lstur.py 87.14% <0.00%> (+87.14%) ⬆️
recommenders/evaluation/python_evaluation.py 93.68% <0.00%> (+93.68%) ⬆️
... and 11 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7425d21...8485e97.

@miguelgfierro (Collaborator) left a comment

LGTM, let's see if the nightly builds take less time after this change.

@anargyri (Collaborator) commented Feb 3, 2022

The Spark pipeline keeps failing at the tuning notebook. However, it passed when I ran it on a VM. It may be because the GitHub hosts are not well resourced.

@miguelgfierro (Collaborator)

> The Spark pipeline keeps failing at the tuning notebook. However, it passed when I ran it on a VM. It may be because the GitHub hosts are not well resourced.

Yeah, I think this is expected because the GitHub-hosted runners have low memory; however, there is an OOM in the GPU ones. This is annoying because the ADO pipeline for the notebook and unit tests passes. What are your thoughts here? Do you want to debug the GPU issue further, or do you think we can merge based on the ADO tests?

@anargyri (Collaborator) commented Feb 4, 2022

> The Spark pipeline keeps failing at the tuning notebook. However, it passed when I ran it on a VM. It may be because the GitHub hosts are not well resourced.
>
> Yeah, I think this is expected because the GitHub-hosted runners have low memory; however, there is an OOM in the GPU ones. This is annoying because the ADO pipeline for the notebook and unit tests passes. What are your thoughts here? Do you want to debug the GPU issue further, or do you think we can merge based on the ADO tests?

The GPU failures are due to our hosts, whose memory is occupied by tox processes left over from previous tests. Once you kill these processes, the pipeline can run again. However, this happens frequently and needs some investigation into what is going wrong.

About Spark, I am seeing the tox command failing on a VM that has more than enough memory. Meanwhile, manually installing a conda env and running the tests with pytest works fine (which is consistent with the ADO tests passing). I'd like to debug this a little further and will post any updates here.

@anargyri (Collaborator) commented Feb 4, 2022

> About Spark, I am seeing the tox command failing on a VM that has more than enough memory. […] I'd like to debug this a little further and will post any updates here.

Also pytest needs 5 reruns before passing.

@anargyri (Collaborator) commented Feb 4, 2022

OK, there was a bad configuration in the Spark cross-validation test, which I fixed. The tests now pass, although there are still a couple of reruns.

@anargyri anargyri merged commit 328322e into staging Feb 4, 2022
@miguelgfierro miguelgfierro deleted the abir/sasrec branch February 7, 2022 10:04