Allow saving models directly to binary stream #9789

Merged: 2 commits merged into keras-team:master on Mar 31, 2018


@obi1kenobi (Contributor) commented Mar 29, 2018

This is a minor change in the contract of save_model() and load_model(): the filepath argument may now also be an h5py.File object. This allows saving models to memory-mapped files that have no physical presence on disk, and extracting the serialized model data as a raw binary stream. I believe this resolves the following issues:
#9343
#6794

I added test cases for both the "vanilla" use of an h5py.File object and the memory-mapping + binary stream use case.
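A minimal sketch of the binary-stream use case described above, written against plain h5py rather than Keras so it stays self-contained. It assumes h5py's in-memory `core` driver with `backing_store=False` (so nothing is written to disk) and the low-level `FileID.get_file_image()` call to copy out the raw bytes. `to_hdf5_bytes` and `from_hdf5_bytes` are hypothetical helper names, and reading the bytes back through `io.BytesIO` assumes h5py 2.9 or newer, which accepts Python file-like objects.

```python
import io

import h5py
import numpy as np


def to_hdf5_bytes(arrays):
    """Hypothetical helper: serialize a dict of arrays to raw HDF5 bytes in memory."""
    # driver='core' keeps the whole file in RAM; backing_store=False means
    # no file named 'in_memory.h5' is ever written to disk.
    with h5py.File('in_memory.h5', 'w', driver='core', backing_store=False) as f:
        for name, value in arrays.items():
            f.create_dataset(name, data=value)
        f.flush()
        # Copy the in-memory file image out as a bytes object before closing.
        return f.id.get_file_image()


def from_hdf5_bytes(data):
    """Hypothetical helper: read every top-level dataset back from HDF5 bytes."""
    with h5py.File(io.BytesIO(data), 'r') as f:
        return {name: f[name][()] for name in f}


blob = to_hdf5_bytes({'kernel': np.arange(6.0).reshape(2, 3)})
restored = from_hdf5_bytes(blob)
print(len(blob), restored['kernel'].shape)
```

With this PR in place, the open `f` could in principle be handed to Keras as the `filepath` argument of `save_model(model, f)` instead of the `create_dataset` loop, and `get_file_image()` would then return the serialized model bytes.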

This is my first contribution to this project, so I apologize in advance if I missed something in the contributing guidelines; I'm happy to change things as directed.

This PR is joint work with @mikeyshulman.

@obi1kenobi changed the title from "Allow model saving/loading code to accept h5py.File objects." to "Allow saving models directly to binary stream" on Mar 29, 2018
This change allows saving models to memory-mapped files that do not have a physical presence on disk, and extraction of the serialized model data as a raw binary byte stream.
@fchollet (Member) commented:

There is also a similar, older PR, now stale, although it is much larger and seems to do more. Please take a look: #7546

Review comment on keras/models.py (outdated):

```python
        proceed = ask_to_proceed_with_overwrite(filepath)
        if not proceed:
            return
    opened_new_file = not isinstance(filepath, h5py.File)
```
Member:

I suggest only setting this flag to True after actually creating the file (since it is used to call f.close() at the end).

You can start with if not isinstance(...)

Contributor Author:

Good idea, I'll do that.
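The reviewer's suggestion can be sketched as follows: ownership is recorded only after `h5py.File(...)` has succeeded, so the `finally` clause never tries to close a file that was never opened, or one the caller still owns. `save_arrays` is a hypothetical helper standing in for `save_model()`, and the sketch assumes h5py and NumPy are installed.

```python
import os
import tempfile

import h5py
import numpy as np


def save_arrays(filepath, arrays):
    """Hypothetical helper: `filepath` is either a path or an already-open h5py.File."""
    if not isinstance(filepath, h5py.File):
        # If this call raises, the try/finally below is never entered,
        # so there is nothing to close.
        f = h5py.File(filepath, mode='w')
        opened_new_file = True  # set only after the file was actually created
    else:
        f = filepath
        opened_new_file = False  # the caller owns this file and closes it
    try:
        for name, value in arrays.items():
            f.create_dataset(name, data=value)
        f.flush()
    finally:
        # Only close files that this function opened itself.
        if opened_new_file:
            f.close()


# Case 1: pass a path; the helper opens and closes the file.
path = os.path.join(tempfile.mkdtemp(), 'weights.h5')
save_arrays(path, {'bias': np.zeros(3)})

# Case 2: pass an open in-memory file; it stays open for the caller.
mem = h5py.File('scratch.h5', 'w', driver='core', backing_store=False)
save_arrays(mem, {'bias': np.ones(3)})
still_open = bool(mem.id.valid)
mem.close()
```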

@@ -179,13 +187,18 @@ def get_json_type(obj):

```python
                    else:
                        param_dset[:] = val
        f.flush()
    finally:
        if opened_new_file:
            f.close()
```
Member:

In the original __exit__ method, close() is only called if f.id is valid: https://github.com/h5py/h5py/blob/4b5a901fb297f6ae5a51ff992aa8a626a7f3c3a2/h5py/_hl/files.py#L359

Does this check matter here?

Contributor Author:

To the best of my understanding of the h5py code and documentation, checking f.id is simply a test of whether the file is still open. The close() method performs the same validity check on f.id: https://github.com/h5py/h5py/blob/4b5a901fb297f6ae5a51ff992aa8a626a7f3c3a2/h5py/_hl/files.py#L325

Since I'm going to implement your suggestion and only set opened_new_file = True after the file has actually been opened, I believe an additional f.id check is unnecessary: opened_new_file is already checked immediately before close().
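The reasoning about f.id can be checked directly. This sketch assumes a recent h5py, in which the low-level `FileID.valid` property reports whether the file identifier is still open and calling `File.close()` on an already-closed file is a no-op. The in-memory `core` driver and the name `probe.h5` are used only to avoid touching the disk.

```python
import h5py

f = h5py.File('probe.h5', 'w', driver='core', backing_store=False)
valid_while_open = bool(f.id.valid)  # identifier is live while the file is open

f.close()
valid_after_close = bool(f.id.valid)  # close() invalidates the identifier

# close() checks f.id itself, so a second call does not raise.
f.close()

print(valid_while_open, valid_after_close)
```

Because close() already guards on the identifier's validity, an extra `if f.id:` in the `finally` block adds nothing once opened_new_file guarantees the function opened the file itself.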

@obi1kenobi (Contributor Author) commented Mar 29, 2018

@fchollet thank you for the suggestions and the pointer to the prior PR.

I was trying to keep this PR's scope as small as possible. I think the refactoring and new functionality that #7546 proposes would also be valuable, and I am in favor of getting that PR merged as well. However, I think there is value in merging this quickly to unblock the people who opened issues asking for this functionality, and then landing the refactoring and new functionality of #7546 separately.

I'll apply the changes you suggested in the next few minutes!

@obi1kenobi (Contributor Author) commented Mar 29, 2018

I think the build finished, but Travis never posted the result; let me trigger it again.

@fchollet (Member) left a comment:

LGTM, thanks!

@fchollet merged commit 1ee31ee into keras-team:master on Mar 31, 2018
@obi1kenobi (Contributor Author) commented Apr 2, 2018

Thank you! #9343 is likely resolved now; you might be able to close it.

Looking forward to the next release!

dschwertfeger added a commit to dschwertfeger/keras that referenced this pull request Apr 6, 2018
…ack-embeddings-from-layer-outputs

* upstream/master: (68 commits)
  fit/evaluate_generator supporting native tensors (keras-team#9816)
  keras-team#9642 Add kwarg and documentation for dilation_rate to SeparableConvs (keras-team#9844)
  Document that "same" is inconsistent across backends with strides!=1 (keras-team#9629)
  Improve tests by designating dtype of sample data (keras-team#9834)
  Add documentation for 'subset' and interpolation' arguments (ImageDataGenerator) (keras-team#9817)
  Revert default theme to readthedocs
  Various docs fixes.
  Fix conflict
  Add support for class methods documentation (keras-team#9751)
  Add missing verbose opt for evaluate_generator (keras-team#9811)
  Added `data_format` to flatten layer. (keras-team#9696)
  Allow saving models directly to binary stream (keras-team#9789)
  Fix ctc_batch_cost() error when batch_size = 1 (keras-team#9775)
  Fix keras-team#9802 (keras-team#9803)
  Fix error in ImageDataGenerator documentation (keras-team#9798)
  fix typo (keras-team#9792)
  keras-team#9733: Extend RemoteMonitor to send data as application/json (keras-team#9734)
  Fixed inconsistencies regarding ReduceLROnPlateau (keras-team#9723)
  Fix doc issue.
  General stateful metrics fixes (keras-team#9446)
  ...
Vijayabhaskar96 pushed a commit to Vijayabhaskar96/keras that referenced this pull request May 3, 2018
* Allow model saving/loading code to accept h5py.File objects.

This change allows saving models to memory-mapped files that do not have a physical presence on disk, and extraction of the serialized model data as a raw binary byte stream.

* Record file opening only after successfully opening the file.