Update xDeepFM to work with random seed #748

loomlike · 2019-04-18T19:48:14Z

Description

Changes include:

Refactor xDeepFM to fully use the seed
Update xdeepfm smoke test
Add xdeepfm unit and integration tests
Change xDeepFM quickstart notebook name to xdeepfm_criteo
Update the notebook contents

Related Issues

Checklist:

My code follows the code style of this project, as detailed in our contribution guidelines.
I have added tests.
I have updated the documentation accordingly.

review-notebook-app · 2019-04-18T19:48:16Z

Check out this pull request on ReviewNB: https://app.reviewnb.com/Microsoft/Recommenders/pull/748

miguelgfierro · 2019-04-18T21:00:59Z

reco_utils/recommender/deeprec/models/base_model.py

@@ -14,7 +12,7 @@


 class BaseModel(object):
-    def __init__(self, hparams, iterator_creator, graph=None, seed=42):
+    def __init__(self, hparams, iterator_creator, graph=None, seed=None):


why None here? in the rest of the notebooks we have 42

The default seed should match the default of the functions we use in the module, which is None. See, e.g., tensorflow.

This follows the rationale that if you don't pass the seed explicitly, the behavior should be 'random' (produce different results every time you run it).
E.g., Python's random behavior:

random.seed(a=None, version=2): If a is omitted or None, the current system time is used.

If this is the default behavior we want to follow, then we should apply the same idea to all the other algos, which would need to be changed. Do we want to follow this idea instead of fixing the seed? @anargyri @yueguoguo @gramhagen

I think it is a good practice to use seed=None for algo modules, and set seed explicitly from example notebooks.

This sounds reasonable. I guess right now you would have to set the seed to None to actually get random values, which may be unexpected. I think we should adopt this, as it matches the approach of other libraries, and create a separate ticket to change the notebooks / code / tests as needed.

in numpy the default is None: https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.seed.html#numpy.random.seed, so I think it makes sense to change all to None

@loomlike do you want to homogenize the default to None in all the utilities and then set 42 explicitly in the notebooks? Not sure if on this PR or in a different one.

@miguelgfierro Created a separate issue to handle them (#753)
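
The convention discussed in this thread (seed=None by default in the module, an explicit seed passed from the notebook) can be sketched roughly as follows. This is a minimal illustration using only Python's standard `random` module; `SeededModel` and `init_weights` are hypothetical names for the example and are not the actual `BaseModel` or xDeepFM code.

```python
import random


class SeededModel:
    """Hypothetical stand-in for a model class; not the real BaseModel."""

    def __init__(self, seed=None):
        # seed=None mirrors random.seed(): the generator is seeded from a
        # system source, so separate runs produce different values.
        self.rng = random.Random(seed)

    def init_weights(self, n=3):
        # Draw n pseudo-random "weights" from this instance's own generator.
        return [self.rng.random() for _ in range(n)]


# Explicit seed, as an example notebook would pass it: results are repeatable.
first = SeededModel(seed=42).init_weights()
second = SeededModel(seed=42).init_weights()
print(first == second)

# Default seed=None: two instances are expected to produce different draws.
print(SeededModel().init_weights() != SeededModel().init_weights())
```

Keeping the generator on the instance, rather than seeding the global state, is one possible design choice here; it keeps a seed passed to one model from affecting other code that uses random numbers.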

gramhagen · 2019-04-19T15:34:34Z

tests/smoke/test_notebooks_gpu.py

 @pytest.mark.gpu
-def test_ncf_deep_dive(notebooks):
+def test_ncf_deep_dive_smoke(notebooks):


Why change all the test names? They're already in a separate folder and marked as smoke.

Also, here we want to keep @pytest.mark.notebooks as well, right? Hmm, odd that we don't have notebooks marked on the rest of the tests.

also, can we reduce the tolerance levels now?

Changed the name to be the same as the rest of the tests in the module.
Don't know who first started putting "_smoke" there. haha
I kinda like it, because when a test fails, it is easy to read from the pytest log what's failing: unit vs smoke vs integration.

Regarding the tolerance levels, I believe the NN models have been using their own hard-coded numbers (correct me if I'm wrong), while the others use the TOL and ABS_TOL constants. So we can leave the numbers as they are. But, actually, I want to use TOL only, not ABS_TOL, since different metrics can have different ranges and thus using an absolute tolerance doesn't make sense (TOL is relative).

If we agree to not use ABS_TOL, we can create a separate issue to take care of that.

ok, makes sense as long as they're consistently named

Yeah, whatever tolerance is appropriate is fine, but now that the seed is fixed, it would be good for it to be small enough to detect a potential error rather than random noise. Fine with creating a separate issue for that.

Also, I was wondering about notebooks mark too. Integration and smoke tests don't have that mark. Maybe because we don't use notebooks mark when we run smoke and integration tests?

Yes, that's true. For nightly smoke and integration tests we run 3 versions: spark, gpu, base. There's no differentiation for notebooks, so I think it's fine to remove it.

We have to be careful with ABS_TOL. It was added because rel_tol was not enough when testing very small metric results (on the order of 0.001 or so). I wouldn't remove it unless we are sure that these problems are gone.

Ah, yeah. Make sense.
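
The trade-off between relative and absolute tolerance raised above can be illustrated with a small sketch built on the standard library's `math.isclose`. The threshold values and helper names below are assumptions chosen for the example; they are not the actual TOL / ABS_TOL constants or helpers from the test suite.

```python
import math

# Illustrative thresholds only (assumed values, not the project's constants).
REL_TOL = 0.05
ABS_TOL = 0.005


def close_rel(actual, expected):
    # Relative-only check: the allowed error scales with the metric's size.
    return math.isclose(actual, expected, rel_tol=REL_TOL)


def close_rel_or_abs(actual, expected):
    # Passes if either the relative bound or the absolute bound is met.
    return math.isclose(actual, expected, rel_tol=REL_TOL, abs_tol=ABS_TOL)


# A metric around 0.8: a deviation of 0.008 is within the 5% relative bound.
print(close_rel(0.808, 0.8))            # True

# A metric around 0.001: a deviation of 0.0005 fails the 5% relative bound...
print(close_rel(0.0015, 0.001))         # False

# ...but passes once an absolute tolerance is also allowed.
print(close_rel_or_abs(0.0015, 0.001))  # True
```

The same absolute tolerance would be far too loose for a metric whose whole range is around 0.001, which is the concern behind preferring a relative tolerance where the metric values are large enough.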

* Update xDeepFM to work with random seed
  - Refactor xDeepFM to fully use the seed
  - Update xdeepfm smoke test
  - Add xdeepfm unit and integration tests
  - Remove pytest mark 'deeprec'
* Change xDeepFM quickstart notebook name to xdeepfm_criteo
  - Update README xDeepFM notebook path
  - Update README xDeepFM path and description
* Update the notebook contents

Update xDeepFM to work with random seed

3d55023

loomlike requested review from miguelgfierro and anargyri April 18, 2019 19:48

loomlike requested a review from yueguoguo as a code owner April 18, 2019 19:48

loomlike added 2 commits April 18, 2019 15:49

Update README xDeepFM notebook path

17e9f3a

Update README xDeepFM path and description

4364f7b

miguelgfierro reviewed Apr 18, 2019

loomlike requested a review from gramhagen April 19, 2019 14:51

gramhagen reviewed Apr 19, 2019

loomlike requested review from gramhagen and miguelgfierro April 19, 2019 16:54

Remove pytest mark 'deeprec'

ba9ccc6

Since we don't use that mark.

gramhagen approved these changes Apr 19, 2019

loomlike merged commit 8feba89 into staging Apr 23, 2019

loomlike deleted the jumin/xdeepfm-random-seed branch April 23, 2019 19:04
