-
Notifications
You must be signed in to change notification settings - Fork 19.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow saving models directly to binary stream #9789
Conversation
This change allows saving models to memory-mapped files that do not have a physical presence on disk, and extraction of the serialized model data as a raw binary byte stream.
09ffa5a
to
68096a9
Compare
There was also this similar older PR, now stale (although it seems much larger and seems to do more). Please take a look: #7546 |
keras/models.py
Outdated
proceed = ask_to_proceed_with_overwrite(filepath) | ||
if not proceed: | ||
return | ||
opened_new_file = not isinstance(filepath, h5py.File) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest only setting this flag to True
after actually creating the file (since it is used to call f.close()
at the end).
You can start with if not isinstance(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, I'll do that.
@@ -179,13 +187,18 @@ def get_json_type(obj): | |||
else: | |||
param_dset[:] = val | |||
f.flush() | |||
finally: | |||
if opened_new_file: | |||
f.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the original exit method, close()
is only done if f.id
is set. https://github.com/h5py/h5py/blob/4b5a901fb297f6ae5a51ff992aa8a626a7f3c3a2/h5py/_hl/files.py#L359
Is this something important?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To the best of my understanding given the h5py
code and documentation, f.id
is simply a check for whether the file is still open or not. This is based on the close()
method also checking the validity of f.id
: https://github.com/h5py/h5py/blob/4b5a901fb297f6ae5a51ff992aa8a626a7f3c3a2/h5py/_hl/files.py#L325
As I'm going to implement your suggestion and only set opened_new_file = True
after the file is already opened, I believe that checking f.id
as well is unnecessary when we've already checked opened_new_file
right before.
@fchollet thank you for the suggestions and the pointer to the prior PR. I was trying to keep this PR's scope as small and as limited as possible. I think the refactoring and new functionality that #7546 proposes would also be valuable to have, and I am in favor of getting that PR merged as well. However, I think there is value in merging this quickly to unblock the use cases of the people who opened issues asking for this functionality, and then doing the refactoring and new functionality of #7546 separately and in addition. I'll apply the changes you suggested in the next few minutes! |
a22e107
to
226fddb
Compare
I think the build finished but Travis never posted the result -- let me trigger it again. |
226fddb
to
c55a12d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Thank you! #9343 is likely resolved now, you might be able to close it. Looking forward to the next release! |
…ack-embeddings-from-layer-outputs * upstream/master: (68 commits) fit/evaluate_generator supporting native tensors (keras-team#9816) keras-team#9642 Add kwarg and documentation for dilation_rate to SeparableConvs (keras-team#9844) Document that "same" is inconsistent across backends with strides!=1 (keras-team#9629) Improve tests by designating dtype of sample data (keras-team#9834) Add documentation for 'subset' and interpolation' arguments (ImageDataGenerator) (keras-team#9817) Revert default theme to readthedocs Various docs fixes. Fix conflict Add support for class methods documentation (keras-team#9751) Add missing verbose opt for evaluate_generator (keras-team#9811) Added `data_format` to flatten layer. (keras-team#9696) Allow saving models directly to binary stream (keras-team#9789) Fix ctc_batch_cost() error when batch_size = 1 (keras-team#9775) Fix keras-team#9802 (keras-team#9803) Fix error in ImageDataGenerator documentation (keras-team#9798) fix typo (keras-team#9792) keras-team#9733: Extend RemoteMonitor to send data as application/json (keras-team#9734) Fixed inconsistencies regarding ReduceLROnPlateau (keras-team#9723) Fix doc issue. General stateful metrics fixes (keras-team#9446) ...
* Allow model saving/loading code to accept h5py.File objects. This change allows saving models to memory-mapped files that do not have a physical presence on disk, and extraction of the serialized model data as a raw binary byte stream. * Record file opening only after successfully opening the file.
This is a minor change in the contract of
save_model()
andload_model()
, allowing thefilepath
argument to accepth5py.File
objects as well. The change allows saving models to memory-mapped files that do not have a physical presence on disk, and extraction of the serialized model data as a raw binary stream. I believe that resolves the following issues:#9343
#6794
I added test cases for both the "vanilla" utilization of a
h5py.File
object, and the memory-mapping + binary stream use case.This is my first contribution to this project, so I apologize in advance if I missed something in the contributing guidelines -- happy to change things as directed.
This PR is joint work with @mikeyshulman.