Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pytest tmp_dir fixture #706

Merged
merged 3 commits into from
Apr 4, 2019
Merged

pytest tmp_dir fixture #706

merged 3 commits into from
Apr 4, 2019

Conversation

loomlike
Copy link
Collaborator

@loomlike loomlike commented Apr 3, 2019

Description

  • Introduce tmp_dir fixture
  • Refactor tests to use the fixture
  • Fix a bug in movielens load_spark_df - could throw an error when item_df is None. Now it handles the case.
  • Make movielens loader require at least two columns in the header, user and item

Related Issues

#701

Checklist:

  • My code follows the code style of this project, as detailed in our contribution guidelines.
  • I have added tests.
  • I have updated the documentation accordingly.

Refactor tests to use the fixture
Fix movielens bug
@@ -23,6 +23,15 @@
pass # so the environment without spark doesn't break


@pytest.fixture
def tmp_dir(tmp_path_factory):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a note of naming, the pytest temporary dir is called tmpdir which is almost the same as ours. Maybe we should rename it to avoid users mistake pytest fixture with our fixture. Maybe temporary_dir?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good suggestion. maybe even more specific, something like fn_scope_tmpdir?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or does it sound like some kind of function because of some name convention?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm missing where tmp_path_factory is coming from?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name changed to just tmp, named after linux tmp dir. I picked that for simplicity, but if you think it is not very descriptive, feel free to let me know or give any suggestions. temporary_dir looks (visually) unbalanced to me...haha

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one question, do we foresee any case where we want to have a directory per (pytest) module or even per (pytest) session?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make fixtures for them when we need, like sess_tmp and module_tmp.

@@ -50,53 +49,50 @@ def test_load_pandas_df(
title_example,
genres_example,
year_example,
tmp_dir,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no need to import this since it is a fixture that it is in conftest, see this: https://github.com/Microsoft/Recommenders/blob/master/tests/conftest.py#L4

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohhhh good! will update

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, nvm. it means you don't need to do import your_fixture_module. You still need to pass the fixture object as an argument. I think the comments contain misleading contents.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, yeah, what you said is more detailed. It would be nice to have it in the code

@@ -23,6 +23,15 @@
pass # so the environment without spark doesn't break


@pytest.fixture
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also like the context manager option, maybe it can be implemented as we are discussing here: #701 (comment)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me try out that. looks promising.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checked and turns out normal fixture does the same thing since newer version of pytest.

Since pytest-3.0, fixtures using the normal fixture decorator can use a yield statement to provide fixture values and execute teardown code, exactly like yield_fixture in previous versions. Marking functions as yield_fixture is still supported, but deprecated and should not be used in new code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good, so if we use the fixture as you program it using: with temp_dir: is the folder erased after the test function is finished?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. Even try/finally does remove the dir too, but using the context manager made the code simpler and give us peace of mind :-)

Change to use context manager
Change fixture name from tmp_dir to tmp
WARNING_HAVE_SCHEMA_AND_HEADER = """Both schema and header are provided.
The header argument will be ignored."""
ERROR_MOVIE_LENS_SIZE = "Invalid data size. Should be one of {100k, 1m, 10m, or 20m}"
ERROR_NO_HEADER = "No header (schema) information"
ERROR_NO_HEADER = "No header (schema) information. At least user and movie column names should be provided"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really like these constant string of error/warning message. Do we later want to consolidate this into somewhere in common/constants? Or, at least, adopt this practice in other utility codes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need, sure why not. We can even generalize further, e.g. use obj.__name__ or to accept names as args.

if movie_col is not None:
item_header.append(movie_col)
usecols.append(0)
if title_col is None and genres_col is None and year_col is None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the check now is different from before? The previous checks the movie_col and return None if it does not exist, whilst now it checks other three columns.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, made movie_col as a default (must-have) column, checking earlier as: if header is None or len(header) < 2: (2 is the first two columns, user and movie), because there is no point to load only user ids.

@@ -23,6 +23,12 @@
pass # so the environment without spark doesn't break


@pytest.fixture
def tmp(tmp_path_factory):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks nice and clean ❤️

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@loomlike loomlike merged commit e421c00 into staging Apr 4, 2019
@loomlike loomlike deleted the jumin/pytest_tmp_dir branch April 4, 2019 19:38
yueguoguo pushed a commit that referenced this pull request Sep 9, 2019
* pytest tmp_dir fixture
Refactor tests to use the fixture
Fix movielens bug

* Update conftest description
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants