-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wide-deep hyperdrive notebook Azureml API update #847
Conversation
Check out this pull request on ReviewNB: https://app.reviewnb.com/microsoft/recommenders/pull/847 You'll be able to see visual diffs and write comments on notebook cells. Powered by ReviewNB. |
@loomlike I re-run the unit-test build and it passed. Can you trigger it again? |
@@ -7,7 +7,7 @@ | |||
"<i>Copyright (c) Microsoft Corporation. All rights reserved.<br>\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any thoughts on why these values dropped so much?
also, i think using getattr is a bit obscure, it would be clearer to just call the metrics code directly
Reply via ReviewNB
@@ -7,7 +7,7 @@ | |||
"<i>Copyright (c) Microsoft Corporation. All rights reserved.<br>\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also what changed here to impact the metrics? are we not setting a random seed when training the model?
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The w&d notebook does use the default seed unless we explicitly set None
.
Will test again and see if it reports the same number again (should be).
Use Steps instead of Epochs as most TF examples do. Add model export fn and input-fn-for-saved-model fn. Update notebooks and tests accordingly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, just one question on steps vs epochs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question on path creation in tests
tests/unit/test_wide_deep_utils.py
Outdated
assert wide_columns[2].hash_bucket_size == 10 | ||
# Check model type | ||
model = build_model( | ||
str(tmp_path / ('wide_' + MODEL_DIR)), wide_columns=wide_columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what this does?
str(tmp_path / ('wide_' + MODEL_DIR))
why is there a division operator here? should it be os.path.join(str(tmp_path), 'wide_' + MODEL_DIR), would this be causing the problem with windows path cleanup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gramhagen when I use our fixture (tmp) for temporary directory, it causes error (only) on Windows DSVM. So I end up using pytest's fixture tmp_path witch returns Path
object. And the codes use Path object's syntax /
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, the error is -- somehow Windows' DSVM holds TF model checkpoints folder and thus TemporaryDirectory cannot cleanup the folder by the time the test has done.
pytest's fixture tmp_path keeps the folder for last three test sessions instead of removing it right away. That's how it avoids the cleanup error. Not a clean solution but at least it is working for now for Windows testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see, I wasn't familiar with that syntax. Do you think there is something that needs to be done to close the tf session or something in order to release the tmp path for cleanup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the best thing to do is keep it as is and file an issue for using the normal yield fixture with TF
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried del model
and no luck. Kind of hard to debug because that is happening only from Win-DSVM. @roalexan Any idea about this? FYI no issues when use local Windows machines (both laptop and workstation). Does Win-DSVM handle processes and folder read/write permission differently from local Windows'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should just be Windows Server 2016 - don't see how it would be different than local. Link to product page. Maybe we could use one of these email aliases to ask a question about this: dsvm-product-team@ or Dsvm-interest@.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhat related issue found from Tensorflow github issues: tensorflow/tensorflow#20606
I will try to force close tf's FileWriter and see if that solves this issue on Windows DSVM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok it does solve. I'm working on adding those changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great, feel free to merge if everything passes
* AzureML api update * Wide-deep and tf-utils update - Use Steps instead of Epochs as most TF examples do. - Add model export fn and input-fn-for-saved-model fn. - Update notebooks and tests accordingly * Simplify wd test * Simplify deeprec and ncf unit tests * Safely close tf event file in tests
Description
AzureML API update fix
Related Issues
#842
Checklist: