[testing utils] get_auto_remove_tmp_dir more intuitive behavior #8401

Merged
merged 11 commits into from
Nov 10, 2020
Changes from 8 commits
32 changes: 16 additions & 16 deletions docs/source/testing.rst
@@ -720,32 +720,32 @@ Here is an example of its usage:

This code creates a unique temporary directory, and sets :obj:`tmp_dir` to its location.

In this and all the following scenarios the temporary directory will be auto-removed at the end of test, unless
``after=False`` is passed to the helper function.

* Create a temporary directory of my choice and delete it at the end - useful for debugging when you want to monitor a
specific directory:
* Create a unique temporary dir:

.. code-block:: python

def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test")
tmp_dir = self.get_auto_remove_tmp_dir()

``tmp_dir`` will contain the path to the created temp dir. It will be automatically removed at the end of the test.

* Create a temporary directory of my choice and do not delete it at the end---useful for when you want to look at the
temp results:
* Create a temporary dir of my choice, ensure it's empty before the test starts and don't empty it after the test.

.. code-block:: python

def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", after=False)
tmp_dir = self.get_auto_remove_tmp_dir("./xxx")

* Create a temporary directory of my choice and ensure to delete it right away---useful for when you disabled deletion
in the previous test run and want to make sure the that temporary directory is empty before the new test is run:
This is useful for debug when you want to monitor a specific directory and want to make sure the previous tests didn't
leave any data in there.

.. code-block:: python
* You can override the default behavior by directly overriding the ``before`` and ``after`` args, leading to one of the
following behaviors:

def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", before=True)
- ``before=True``: the temporary dir will always be cleared at the beginning of the test.
- ``before=False``: if the temporary dir already existed, any existing files will remain there.
- ``after=True``: the temporary dir will always be deleted at the end of the test.
- ``after=False``: the temporary dir will always be left intact at the end of the test.

.. note::
In order to run the equivalent of ``rm -r`` safely, only subdirs of the project repository checkout are allowed if
@@ -799,7 +799,7 @@ or the ``xfail`` way:
@pytest.mark.xfail
def test_feature_x():

Here is how to skip a test based on some internal check inside the test:
- Here is how to skip a test based on some internal check inside the test:

.. code-block:: python

@@ -822,7 +822,7 @@ or the ``xfail`` way:
def test_feature_x():
pytest.xfail("expected to fail until bug XYZ is fixed")

Here is how to skip all tests in a module if some import is missing:
- Here is how to skip all tests in a module if some import is missing:

.. code-block:: python

84 changes: 58 additions & 26 deletions src/transformers/testing_utils.py
@@ -518,41 +518,41 @@ class solves this problem by sorting out all the basic paths and provides easy a

Feature 2: Flexible auto-removable temp dirs which are guaranteed to get removed at the end of test.

In all the following scenarios the temp dir will be auto-removed at the end of test, unless `after=False`.

# 1. create a unique temp dir, `tmp_dir` will contain the path to the created temp dir
1. Create a unique temporary dir:

::

def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir()

# 2. create a temp dir of my choice and delete it at the end - useful for debug when you want to # monitor a
specific directory
``tmp_dir`` will contain the path to the created temp dir. It will be automatically removed at the end of the test.


2. Create a temporary dir of my choice, ensure it's empty before the test starts and don't
empty it after the test.

::

def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test")
tmp_dir = self.get_auto_remove_tmp_dir("./xxx")

# 3. create a temp dir of my choice and do not delete it at the end - useful for when you want # to look at the
temp results
This is useful for debug when you want to monitor a specific directory and want to make sure the previous tests
didn't leave any data in there.

::
def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", after=False)
3. You can override the first two options by directly overriding the ``before`` and ``after`` args, leading to the
following behavior:

# 4. create a temp dir of my choice and ensure to delete it right away - useful for when you # disabled deletion in
the previous test run and want to make sure the that tmp dir is empty # before the new test is run
``before=True``: the temporary dir will always be cleared at the beginning of the test.

::
``before=False``: if the temporary dir already existed, any existing files will remain there.

def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", before=True)
``after=True``: the temporary dir will always be deleted at the end of the test.

``after=False``: the temporary dir will always be left intact at the end of the test.

Note 1: In order to run the equivalent of `rm -r` safely, only subdirs of the project repository checkout are
allowed if an explicit `tmp_dir` is used, so that by mistake no `/tmp` or similar important part of the filesystem
will get nuked. i.e. please always pass paths that start with `./`
Note 1: In order to run the equivalent of ``rm -r`` safely, only subdirs of the project repository checkout are
allowed if an explicit ``tmp_dir`` is used, so that by mistake no ``/tmp`` or similar important part of the
filesystem will get nuked. i.e. please always pass paths that start with ``./``

Note 2: Each test can register multiple temp dirs and they all will get auto-removed, unless requested otherwise.

@@ -567,6 +567,7 @@ def test_whatever(self):
"""

def setUp(self):
# get_auto_remove_tmp_dir feature:
self.teardown_tmp_dirs = []

# figure out the resolved paths for repo_root, tests, examples, etc.
@@ -654,21 +655,42 @@ def get_env(self):
env["PYTHONPATH"] = ":".join(paths)
return env

def get_auto_remove_tmp_dir(self, tmp_dir=None, after=True, before=False):
def get_auto_remove_tmp_dir(self, tmp_dir=None, before=None, after=None):
"""
Args:
tmp_dir (:obj:`string`, `optional`):
use this path, if None a unique path will be assigned
before (:obj:`bool`, `optional`, defaults to :obj:`False`):
if `True` and tmp dir already exists make sure to empty it right away
after (:obj:`bool`, `optional`, defaults to :obj:`True`):
delete the tmp dir at the end of the test
if :obj:`None`:
Contributor Author

@sgugger, is there a reason why utils/style_doc.py changes the docstring in this strange way? I meant to have no new lines there - why do we need them?

Contributor Author

@stas00 stas00 Nov 8, 2020

Also should it perhaps say which file(s) should be restyled when it fails?

Traceback (most recent call last):
  File "utils/style_doc.py", line 465, in <module>
    main(*args.files, max_len=args.max_len, check_only=args.check_only)
  File "utils/style_doc.py", line 453, in main
    raise ValueError(f"{len(changed)} files should be restyled!")
ValueError: 1 files should be restyled!

Perhaps it doesn't matter, since it's automatic anyway... just a thought.

Collaborator

Lists should always begin with a new line before them, otherwise sphinx (sometimes) throws warnings. That's why the script wants to add them.

As for the warnings, I copied what black does (and it does not say which file should be restyled) :-)
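For illustration only, here is a minimal sketch of what naming the offending files in the error could look like. The helper name `check_restyled` and the message layout are assumptions made up for this example; this is not the actual code of `utils/style_doc.py`, which only reports the count, as the traceback above shows.

```python
def check_restyled(changed):
    """Raise if any file needs restyling, listing each one (hypothetical helper)."""
    if changed:
        # One file per line, so the failing paths can be copied from CI logs.
        listing = "\n".join(f"- {name}" for name in changed)
        raise ValueError(f"{len(changed)} files should be restyled:\n{listing}")
```

Whether the extra detail is worth it is a judgment call, since the restyling itself is automatic.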


- a unique tmp path will be created
Collaborator

Suggestion to avoid making a list:

If unset, a unique tmp path will be created, :obj:`before` and :obj:`after` will default to :obj:`True`. Otherwise a unique tmp path will be chosen and created, :obj:`before` will default to :obj:`True` and :obj:`after` will default to :obj:`False`.

I copied the beginning of the "else" part (a unique tmp path will be chosen and created) but I don't understand it: how is the argument used in this case? Also, abbreviations like tmp path should be avoided in the documentation; we have space to use proper English :-)

Contributor Author

@stas00 stas00 Nov 9, 2020

That works, but would you agree that the proposed rewrite is much more difficult to grasp? Making docs more difficult to understand because of formatters is strange to me. I'd rather keep the weird new line. I understand why it is needed.

In my experience if you live in a unix world tmp is sort of a term and it's heavily used in the tech documentation, and therefore it's much quicker to grasp than "temporary" because our brains have been trained on it - but point taken - I have expanded all tmp/temps to temporary (I was just trying to share that to some of us tmp is English).

Contributor Author

Thank you very much for the improvements you have been making, @sgugger!
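To make the defaults discussed in this thread concrete, the following is a small sketch of how the ``before`` and ``after`` values appear to be resolved, based on the code in this diff. The function name `resolve_tmp_dir_flags` is hypothetical and exists only for this illustration; the real logic lives inline in `get_auto_remove_tmp_dir`.

```python
def resolve_tmp_dir_flags(tmp_dir=None, before=None, after=None):
    """Return the effective (before, after) pair (hypothetical helper)."""
    if tmp_dir is not None:
        # Custom path: most likely a debug session, so clear the directory
        # first and leave the results in place afterwards.
        default_before, default_after = True, False
    else:
        # Auto-generated unique path: it starts empty and is removed at the end.
        default_before, default_after = True, True
    # An explicit True/False from the caller always wins over the default.
    before = default_before if before is None else before
    after = default_after if after is None else after
    return before, after


print(resolve_tmp_dir_flags())                     # (True, True)
print(resolve_tmp_dir_flags("./xxx"))              # (True, False)
print(resolve_tmp_dir_flags("./xxx", after=True))  # (True, True)
```

Using `None` as the sentinel is what lets the helper tell "not specified" apart from an explicit `False`.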

- sets ``before=True`` if ``before`` is :obj:`None`
- sets ``after=True`` if ``after`` is :obj:`None`
else:

- a unique tmp path will be chosen and created
- sets ``before=True`` if ``before`` is :obj:`None`
- sets ``after=False`` if ``after`` is :obj:`None`
before (:obj:`bool`, `optional`):
if :obj:`True` and the tmp dir already exists, make sure to empty it right away if :obj:`False` and the
tmp dir already exists, any existing files will remain there.
after (:obj:`bool`, `optional`):
if :obj:`True`, delete the tmp dir at the end of the test if :obj:`False`, leave the tmp dir and its
contents intact at the end of the test

Returns:
tmp_dir(:obj:`string`): either the same value as passed via `tmp_dir` or the path to the auto-created tmp
dir
"""
if tmp_dir is not None:

# defining the most likely desired behavior for when a custom path is provided.
# this most likely indicates the debug mode where we want an easily locatable dir that:
# 1. gets cleared out before the test (if it already exists)
# 2. is left intact after the test
if before is None:
before = True
if after is None:
after = False

# using provided path
path = Path(tmp_dir).resolve()

@@ -685,6 +707,15 @@ def get_auto_remove_tmp_dir(self, tmp_dir=None, after=True, before=False):
path.mkdir(parents=True, exist_ok=True)

else:
# defining the most likely desired behavior for when a unique tmp path is auto generated
# (not a debug mode), here we require a unique tmp dir that:
# 1. is empty before the test (it will be empty in this situation anyway)
# 2. gets fully removed after the test
if before is None:
before = True
if after is None:
after = True

# using unique tmp dir (always empty, regardless of `before`)
tmp_dir = tempfile.mkdtemp()

@@ -695,7 +726,8 @@ def get_auto_remove_tmp_dir(self, tmp_dir=None, after=True, before=False):
return tmp_dir

def tearDown(self):
# remove registered temp dirs

# get_auto_remove_tmp_dir feature: remove registered temp dirs
for path in self.teardown_tmp_dirs:
shutil.rmtree(path, ignore_errors=True)
self.teardown_tmp_dirs = []
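
Because the hunks above show the method only in pieces, here is a self-contained sketch of how they might fit together. Several parts are assumptions rather than the code from this pull request: the class name `TmpDirTestCase` is hypothetical, the ``./`` prefix check is a simplified stand-in for the elided safety check described in Note 1, and registering the directory for removal only when ``after`` is true is inferred from the `tearDown` hunk.

```python
import shutil
import tempfile
import unittest
from pathlib import Path


class TmpDirTestCase(unittest.TestCase):
    """Hypothetical stand-in for the test-case class touched by this diff."""

    def setUp(self):
        # Directories registered here are removed in tearDown.
        self.teardown_tmp_dirs = []

    def get_auto_remove_tmp_dir(self, tmp_dir=None, before=None, after=None):
        if tmp_dir is not None:
            # Custom path (debug mode): clear it first, keep it afterwards.
            before = True if before is None else before
            after = False if after is None else after

            # ASSUMPTION: simplified safety check; the real check is elided
            # in the diff and may differ.
            if not str(tmp_dir).startswith("./"):
                raise ValueError(f"`tmp_dir` must start with './', got {tmp_dir}")

            path = Path(tmp_dir).resolve()
            if before and path.exists():
                shutil.rmtree(path, ignore_errors=True)
            path.mkdir(parents=True, exist_ok=True)
        else:
            # Unique path: always empty at creation, removed at the end.
            before = True if before is None else before
            after = True if after is None else after
            tmp_dir = tempfile.mkdtemp()

        # ASSUMPTION: only directories with after=True are registered.
        if after:
            self.teardown_tmp_dirs.append(tmp_dir)
        return tmp_dir

    def tearDown(self):
        for path in self.teardown_tmp_dirs:
            shutil.rmtree(path, ignore_errors=True)
        self.teardown_tmp_dirs = []
```

A test would call `self.get_auto_remove_tmp_dir()` for a throwaway directory, or pass a path such as `"./xxx"` when the results should survive the run for inspection.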