Automatically saving run files in pt.Experiment() #163

Closed
cmacdonald opened this issue May 19, 2021 · 2 comments
Labels
enhancement New feature or request

Comments

@cmacdonald
Contributor

cmacdonald commented May 19, 2021

Is your feature request related to a problem? Please describe.

I want to save results to disk as I evaluate them.

Describe the solution you'd like

Add a files= kwarg to pt.Experiment() (or filenames=)

pt.Experiment(
  systems=[bm25, dph],
  ...
  files=['bm25.res.gz', 'dph.res.gz']
)

Describe alternatives you've considered

  1. Current neatest solution - appending a transformer to save results:
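import pyterrier as pt
from pyterrier.transformer import TransformerBase

def save(filename) -> TransformerBase:
  # Build a transformer that writes the results it receives to filename
  def _save(df):
    pt.io.write_results(df, filename)
    return df  # pass the results through unchanged
  return pt.apply.generic(_save)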

pt.Experiment(
  systems=[bm25 >> save('bm25.res.gz'), dph >> save('dph.res.gz')],
  ...
)

Pros:

  • Can be done now without needing any change

Cons:

  • Clunky to repeat for each system
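
The repetition could be reduced by attaching the saving step in a loop. The sketch below is only an illustration under stated assumptions: `attach_savers` is a hypothetical helper (not part of PyTerrier), and `Stage` and `fake_save` are stand-ins so the block runs without PyTerrier installed. With the real library, the `save` factory from the snippet above would presumably be passed as `make_saver`.

```python
from typing import Callable, List, Sequence


class Stage:
    """Stand-in for a pipeline stage that composes with >> (hypothetical, not PyTerrier)."""

    def __init__(self, label: str):
        self.label = label

    def __rshift__(self, other: "Stage") -> "Stage":
        # Composition only records the chain so the result can be inspected.
        return Stage(f"{self.label} >> {other.label}")


def attach_savers(systems: Sequence, filenames: Sequence[str],
                  make_saver: Callable[[str], object]) -> List:
    """Append make_saver(filename) to each system, pairing them by position."""
    if len(systems) != len(filenames):
        raise ValueError("need exactly one filename per system")
    return [system >> make_saver(name) for system, name in zip(systems, filenames)]


def fake_save(filename: str) -> Stage:
    # Placeholder for the save() factory; it only names the step.
    return Stage(f"save({filename!r})")


wrapped = attach_savers([Stage("bm25"), Stage("dph")],
                        ["bm25.res.gz", "dph.res.gz"], fake_save)
labels = [w.label for w in wrapped]
```

This is roughly what a `files=` kwarg would have to do internally: pair systems and filenames by position and reject mismatched lengths.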
  2. Save to directory
pt.Experiment(
  systems=[bm25, dph],
  ...
  save_dir='/dir/to/save/',
  names=['bm25', 'dph']
)

would create files named bm25.res.gz and dph.res.gz in /dir/to/save/
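
As a rough sketch of the naming rule just described, the helper below derives one path per system from `save_dir` and `names`. The function name `run_paths`, the `suffix` parameter, and the uniqueness check are assumptions made for illustration, not behaviour confirmed by this issue.

```python
from pathlib import PurePosixPath
from typing import List, Sequence


def run_paths(save_dir: str, names: Sequence[str], suffix: str = ".res.gz") -> List[str]:
    """Build one output path per system name inside save_dir."""
    if len(set(names)) != len(names):
        # Duplicate names would make two systems write to the same file.
        raise ValueError("system names must be unique")
    return [str(PurePosixPath(save_dir) / f"{name}{suffix}") for name in names]


paths = run_paths("/dir/to/save/", ["bm25", "dph"])
```

Deriving filenames from `names` means the names double as identifiers on disk, which is why the sketch rejects duplicates.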

Pros:

  • Don't have to construct filenames manually

Cons:

  • Seems inelegant

Suggested by @albertoueda

@cmacdonald
Contributor Author

I think we should revisit this. For reproducibility purposes, e.g. in papers, it's important to save runs as they are created.
My current favourite is:

  1. Save to directory
pt.Experiment(
  systems=[bm25, dph],
  ...
  save_dir='/dir/to/save/',
  names=['bm25', 'dph']
)

A further kwarg could define the saving/caching behaviour, e.g.

  • save='overwrite' - always replace the files
  • save='reuse' - always use files that already exist

(What would be the default?)
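
To make the two modes concrete, here is a minimal sketch of how they might behave for a single run file. It is not the implementation that was eventually merged: `load_or_run` is a hypothetical name, the run is modelled as a list of text lines written with gzip, and running the system when `save='reuse'` finds no file is an assumption.

```python
import gzip
import os
import tempfile
from typing import Callable, List


def load_or_run(path: str, run: Callable[[], List[str]], save: str = "overwrite") -> List[str]:
    """Return result lines, honouring the (hypothetical) save mode."""
    if save not in ("overwrite", "reuse"):
        raise ValueError(f"unknown save mode: {save!r}")
    if save == "reuse" and os.path.exists(path):
        # Reuse: skip the system entirely and read the stored run.
        with gzip.open(path, "rt", encoding="utf-8") as f:
            return f.read().splitlines()
    # Overwrite, or reuse with no file yet: execute the system and store its output.
    lines = run()
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write("".join(line + "\n" for line in lines))
    return lines


calls = []  # records how many times the fake system actually ran


def fake_system() -> List[str]:
    calls.append(1)
    return ["q1 Q0 d7 0 12.5 bm25"]


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "bm25.res.gz")
    first = load_or_run(target, fake_system, save="reuse")      # no file yet, so the system runs
    second = load_or_run(target, fake_system, save="reuse")     # file exists, so it is read back
    third = load_or_run(target, fake_system, save="overwrite")  # always reruns and rewrites
```

The trade-off behind the default question is visible here: 'reuse' saves computation but can silently return stale results if the system changed, while 'overwrite' is always fresh but discards earlier runs.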

cmacdonald added a commit that referenced this issue Dec 8, 2021
cmacdonald added a commit that referenced this issue Dec 9, 2021
@cmacdonald
Contributor Author

Now addressed in #247
