
Allow saving weights of a very deep model into a HDF5 file. #7508

Closed
wants to merge 3 commits into from

Conversation

@tunystom commented Aug 3, 2017

I encountered a problem while saving the weights of very deep Keras models into an HDF5 file. The culprit is the HDF5 file format's inability to store object headers larger than 64 kB (see https://support.hdfgroup.org/HDF5/faq/limits.html).
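For context, here is a minimal, hypothetical repro of the failure mode (the file path and layer count are made up for illustration; the exact error wording depends on the HDF5 version):

import h5py
import numpy as np

# Hypothetical repro: a single attribute holding ~7000 fixed-width
# 64-byte names is ~438 kB, well past HDF5's 64 kB header limit.
with h5py.File('/tmp/demo.h5', 'w') as f:
    names = np.array(['layer_%04d' % i for i in range(7000)], dtype='S64')
    f.attrs['layer_names'] = names
    # -> RuntimeError: Unable to create attribute
    #    (object header message is too large)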

This pull request should solve the related issues, without the need to implement any custom saving/loading methods as proposed in some of the comments.

I also answered a related question on StackOverflow.

The fix is really simple. If a data array being saved as an HDF5 group attribute (layer_names and weight_names, specifically) is too large, it is chunked into several pieces (until each piece fits under the header limit), and the pieces are saved individually under the original attribute name with a chunk number appended to it. Loading from an HDF5 file is implemented correspondingly.
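A minimal sketch of that chunking scheme (the function names and the 64 512-byte threshold are illustrative assumptions, not necessarily the PR's exact code):

import numpy as np

HDF5_OBJECT_HEADER_LIMIT = 64512  # bytes; assumed safety margin under 64 kB

def save_attributes_to_hdf5_group(group, name, data):
    # Save a list of byte strings as one attribute, or as several
    # chunked attributes (name0, name1, ...) if the array is too large.
    bad = [x for x in data if len(x) > HDF5_OBJECT_HEADER_LIMIT]
    if bad:
        # A single item larger than the limit cannot be chunked at all.
        raise RuntimeError('Attribute item too large to save: %s' % bad)
    data_npy = np.asarray(data)
    num_chunks = 1
    chunked_data = np.array_split(data_npy, num_chunks)
    # Keep splitting until every chunk fits under the header limit.
    while any(chunk.nbytes > HDF5_OBJECT_HEADER_LIMIT
              for chunk in chunked_data):
        num_chunks += 1
        chunked_data = np.array_split(data_npy, num_chunks)
    if num_chunks > 1:
        for chunk_id, chunk in enumerate(chunked_data):
            group.attrs['%s%d' % (name, chunk_id)] = chunk
    else:
        group.attrs[name] = data_npy

def load_attributes_from_hdf5_group(group, name):
    # Load an attribute that may have been saved in chunks.
    if name in group.attrs:
        return list(group.attrs[name])
    data = []
    chunk_id = 0
    while '%s%d' % (name, chunk_id) in group.attrs:
        data.extend(group.attrs['%s%d' % (name, chunk_id)])
        chunk_id += 1
    return data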

…_group() methods that can deal with the problem of saving models with a lot of layers (or with nested deep models) into HDF5 files
@fchollet (Member) left a comment

Please add a unit test that covers the case where header chunking occurs.

What's the number of layers past which you need this?

@tunystom (Author) commented

@fchollet FYI, I added the unit tests you asked for. I am not sure whether it is necessary to check the internals of the created HDF5 file, but the current tests do so.

The model I could not save had over 1K layers. But the issue was not with the layer_names attribute inside the HDF5 file (which suffers from the same size constraint and is fixed similarly), but with a different attribute array, weight_names. This happened as a result of the way I designed the whole model. I actually have 2 models: one is used as a nested model (a feature extractor), and this is the huge one; the other serves as a backend built on top of it. If you take a look at the test method test_saving_model_with_long_weights_names you will see what I mean.

So, when you try to save a model containing a nested model, the names of the weights of the nested model are all saved within a single weight_names attribute in the HDF5 file. Since h5py converts the list of names to a numpy array, in which every string occupies as much space as the longest string, the attribute requires a lot of memory even for models that are not very deep.
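To illustrate the padding effect (the weight names below are made-up examples):

import numpy as np

# numpy string arrays use a fixed-width dtype, so every element is
# padded to the length of the longest string in the array.
names = [b'dense_1/kernel:0',
         b'a_very/deeply/nested/model/dense_999/kernel:0']
arr = np.asarray(names)
print(arr.dtype)   # |S45: every element occupies 45 bytes
print(arr.nbytes)  # 90 bytes for just 2 names; thousands of long,
                   # nested names quickly exceed the 64 kB limit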

stale bot commented Nov 16, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

@stale stale bot added the stale label Nov 16, 2017
@stale stale bot removed the stale label Nov 20, 2017
@fchollet (Member) commented

Sorry for not reviewing this earlier. It got forgotten in the PR queue. But it seems like it would still be a useful fix.

This PR has merge conflicts and will need a review (at a glance there will be at least small style issues to fix). Are you still interested in working on it?

@qieaaa commented Feb 15, 2018

Thanks to @tunystom!
I hit this h5py error when running NASNet-Large.
I replaced topology.py in my keras folder with your version, but there was still an error in training.py under the engine folder:
AttributeError: 'Model' object has no attribute '_internal_output_shapes'
I added one line at line 608 in training.py:
self._internal_output_shapes = self.internal_output_shapes
and it works well.

@qieaaa left a comment

I tried this and it works well 👍

@floydium commented

I was able to rebase this PR against the current master. This required resolving a minor conflict in a unit test file in the tests folder (which, frankly, I resolved somewhat arbitrarily; it was only a few lines of code). I then copied the newly rebased topology.py into my keras library folder, and it worked without any issues: I was able to save the NASNetMobile weights.

ahundt added a commit to ahundt/keras that referenced this pull request Feb 16, 2018
Allow saving weights of a very deep model into a HDF5 file.

# Conflicts:
#	tests/test_model_saving.py
@ahundt (Contributor) commented Feb 16, 2018

Merged and created an updated PR in #9398.

@ahundt (Contributor) commented Feb 16, 2018

@floydium @qieaaa could you review the new PR #9398? Is an additional change needed for self._internal_output_shapes = self.internal_output_shapes?

@qieaaa commented Feb 16, 2018

@ahundt Thank you so much! I just tried it, and it works.
I replaced topology.py with your new version and deleted my additional self._internal_output_shapes = self.internal_output_shapes line. I ran it and everything works well, thanks! I can review the new PR.

@fchollet (Member) commented

Merged updated PR.

@fchollet fchollet closed this Feb 17, 2018
@Anurag27031994 commented
# Assuming this is your model architecture. You may use whatever
# architecture you want (big or small).
import pickle

from keras.models import Sequential
from keras.layers import Conv2D, Activation, Flatten, Dense

def mymodel():
    inputShape = (28, 28, 3)
    model = Sequential()
    model.add(Conv2D(20, 5, padding="same", input_shape=inputShape))
    model.add(Activation('relu'))
    model.add(Flatten())
    model.add(Dense(500))
    model.add(Activation('relu'))
    model.add(Dense(2, activation="softmax"))
    return model

model = mymodel()
model.fit(....)  # parameters to start training your model

################################################################################
################################################################################
# Once your model has been trained, save it to disk.
# Use get_weights() to get the model weights as a list of numpy arrays.
weigh = model.get_weights()

# Now use pickle to save the model weights instead of .h5;
# for very deep model architectures, the .h5 file is unsupported.
pklfile = "D:/modelweights.pkl"
with open(pklfile, 'wb') as fpkl:  # binary mode works on Python 2 and 3
    pickle.dump(weigh, fpkl, protocol=pickle.HIGHEST_PROTOCOL)

################################################################################
################################################################################
# In the future, you may want to load your model back.
# Use pickle to load the model weights.
pklfile = "D:/modelweights.pkl"
with open(pklfile, 'rb') as f:
    weigh = pickle.load(f)

# Rebuild the architecture and use set_weights() to load the
# saved weights into it.
restoredmodel = mymodel()
restoredmodel.set_weights(weigh)

################################################################################
################################################################################
# Now you can do your testing and evaluation/predictions.
y_pred = restoredmodel.predict(X)
