Allow saving models directly to binary stream #9789

obi1kenobi · 2018-03-29T02:23:25Z

This is a minor change in the contract of save_model() and load_model(), allowing the filepath argument to accept h5py.File objects as well. The change allows saving models to memory-mapped files that do not have a physical presence on disk, and extraction of the serialized model data as a raw binary stream. I believe that resolves the following issues:
#9343
#6794

I added test cases for both the "vanilla" utilization of a h5py.File object, and the memory-mapping + binary stream use case.

This is my first contribution to this project, so I apologize in advance if I missed something in the contributing guidelines -- happy to change things as directed.

This PR is joint work with @mikeyshulman.

This change allows saving models to memory-mapped files that do not have a physical presence on disk, and extraction of the serialized model data as a raw binary byte stream.

fchollet · 2018-03-29T17:46:40Z

There was also this similar older PR, now stale (although it seems much larger and seems to do more). Please take a look: #7546

fchollet · 2018-03-29T17:40:42Z

keras/models.py

-        proceed = ask_to_proceed_with_overwrite(filepath)
-        if not proceed:
-            return
+    opened_new_file = not isinstance(filepath, h5py.File)


I suggest only setting this flag to True after actually creating the file (since it is used to call f.close() at the end).

You can start with if not isinstance(...)

Good idea, I'll do that.

fchollet · 2018-03-29T17:44:59Z

keras/models.py

@@ -179,13 +187,18 @@ def get_json_type(obj):
                        else:
                            param_dset[:] = val
        f.flush()
+    finally:
+        if opened_new_file:
+            f.close()


In the original exit method, close() is only done if f.id is set. https://github.com/h5py/h5py/blob/4b5a901fb297f6ae5a51ff992aa8a626a7f3c3a2/h5py/_hl/files.py#L359

Is this something important?

To the best of my understanding given the h5py code and documentation, f.id is simply a check for whether the file is still open or not. This is based on the close() method also checking the validity of f.id: https://github.com/h5py/h5py/blob/4b5a901fb297f6ae5a51ff992aa8a626a7f3c3a2/h5py/_hl/files.py#L325

As I'm going to implement your suggestion and only set opened_new_file = True after the file is already opened, I believe that checking f.id as well is unnecessary when we've already checked opened_new_file right before.

obi1kenobi · 2018-03-29T19:24:15Z

@fchollet thank you for the suggestions and the pointer to the prior PR.

I was trying to keep this PR's scope as small and as limited as possible. I think the refactoring and new functionality that #7546 proposes would also be valuable to have, and I am in favor of getting that PR merged as well. However, I think there is value in merging this quickly to unblock the use cases of the people who opened issues asking for this functionality, and then doing the refactoring and new functionality of #7546 separately and in addition.

I'll apply the changes you suggested in the next few minutes!

obi1kenobi · 2018-03-29T21:56:47Z

I think the build finished but Travis never posted the result -- let me trigger it again.

fchollet

LGTM, thanks!

obi1kenobi · 2018-04-02T14:21:54Z

Thank you! #9343 is likely resolved now, you might be able to close it.

Looking forward to the next release!

…ack-embeddings-from-layer-outputs * upstream/master: (68 commits) fit/evaluate_generator supporting native tensors (keras-team#9816) keras-team#9642 Add kwarg and documentation for dilation_rate to SeparableConvs (keras-team#9844) Document that "same" is inconsistent across backends with strides!=1 (keras-team#9629) Improve tests by designating dtype of sample data (keras-team#9834) Add documentation for 'subset' and interpolation' arguments (ImageDataGenerator) (keras-team#9817) Revert default theme to readthedocs Various docs fixes. Fix conflict Add support for class methods documentation (keras-team#9751) Add missing verbose opt for evaluate_generator (keras-team#9811) Added `data_format` to flatten layer. (keras-team#9696) Allow saving models directly to binary stream (keras-team#9789) Fix ctc_batch_cost() error when batch_size = 1 (keras-team#9775) Fix keras-team#9802 (keras-team#9803) Fix error in ImageDataGenerator documentation (keras-team#9798) fix typo (keras-team#9792) keras-team#9733: Extend RemoteMonitor to send data as application/json (keras-team#9734) Fixed inconsistencies regarding ReduceLROnPlateau (keras-team#9723) Fix doc issue. General stateful metrics fixes (keras-team#9446) ...

* Allow model saving/loading code to accept h5py.File objects. This change allows saving models to memory-mapped files that do not have a physical presence on disk, and extraction of the serialized model data as a raw binary byte stream. * Record file opening only after successfully opening the file.

obi1kenobi changed the title ~~Allow model saving/loading code to accept h5py.File objects.~~ Allow saving models directly to binary stream Mar 29, 2018

Allow model saving/loading code to accept h5py.File objects.

68096a9

This change allows saving models to memory-mapped files that do not have a physical presence on disk, and extraction of the serialized model data as a raw binary byte stream.

obi1kenobi force-pushed the save_to_stream branch from 09ffa5a to 68096a9 Compare March 29, 2018 13:42

This was referenced Mar 29, 2018

FeatureRequest: Save/Load model to/from iostream #9343

Closed

Need a way to serialize Keras models abstractly -- not only to disk #6794

Closed

fchollet reviewed Mar 29, 2018

View reviewed changes

obi1kenobi force-pushed the save_to_stream branch from a22e107 to 226fddb Compare March 29, 2018 20:00

Record file opening only after successfully opening the file.

c55a12d

obi1kenobi force-pushed the save_to_stream branch from 226fddb to c55a12d Compare March 29, 2018 21:56

fchollet approved these changes Mar 31, 2018

View reviewed changes

fchollet merged commit 1ee31ee into keras-team:master Mar 31, 2018

xFlux mentioned this pull request Nov 8, 2018

allow saving weights directly to a binary stream #11608

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow saving models directly to binary stream #9789

Allow saving models directly to binary stream #9789

obi1kenobi commented Mar 29, 2018 •

edited

Loading

fchollet commented Mar 29, 2018

fchollet Mar 29, 2018

obi1kenobi Mar 29, 2018

fchollet Mar 29, 2018

obi1kenobi Mar 29, 2018

obi1kenobi commented Mar 29, 2018 •

edited

Loading

obi1kenobi commented Mar 29, 2018 •

edited

Loading

fchollet left a comment

obi1kenobi commented Apr 2, 2018 •

edited

Loading

Allow saving models directly to binary stream #9789

Allow saving models directly to binary stream #9789

Conversation

obi1kenobi commented Mar 29, 2018 • edited Loading

fchollet commented Mar 29, 2018

fchollet Mar 29, 2018

Choose a reason for hiding this comment

obi1kenobi Mar 29, 2018

Choose a reason for hiding this comment

fchollet Mar 29, 2018

Choose a reason for hiding this comment

obi1kenobi Mar 29, 2018

Choose a reason for hiding this comment

obi1kenobi commented Mar 29, 2018 • edited Loading

obi1kenobi commented Mar 29, 2018 • edited Loading

fchollet left a comment

Choose a reason for hiding this comment

obi1kenobi commented Apr 2, 2018 • edited Loading

obi1kenobi commented Mar 29, 2018 •

edited

Loading

obi1kenobi commented Mar 29, 2018 •

edited

Loading

obi1kenobi commented Mar 29, 2018 •

edited

Loading

obi1kenobi commented Apr 2, 2018 •

edited

Loading