Saving to disk an in-memory file #807

Open
huard opened this issue Jun 15, 2018 · 7 comments

Comments

Contributor

huard commented Jun 15, 2018

My use case is a Python console requesting a netCDF file from an HTTP server. I can save the file to disk, but I'd like to give users the option of keeping everything in memory. I can do this using

import netCDF4 as nc

data = get_bytes_from_server()  # placeholder: returns the raw bytes of the netCDF file
D = nc.Dataset(filepath, memory=data)  # filepath is only a label when memory is given

but then D is read-only and can't be written to disk.

  • Is it possible to write the resulting file D to disk?
  • Is it possible to open the in-memory file in append mode and eventually save the modified version to disk?

Apparently the HDF5 library allows writing a memory file to disk: https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetFaplCore

On the other hand, modifying an in-memory file is a feature still being developed: Unidata/netcdf-c#879

Contributor

shoyer commented Jul 10, 2018

It might make sense to use a higher level abstraction like xarray for this. You could load your data in memory into an xarray.Dataset, and use xarray to modify the data and/or save it to disk as another netCDF file.

Collaborator

jswhit commented Jul 13, 2018

There is a 'persist' kwarg to nc.Dataset which is supposed to write 'diskless' files to disk upon close. However, it doesn't appear to work when the file contents are taken from the memory kwarg. I would suggest asking on the netcdf-c list whether this is supported in the C lib. If it is, we can make it work from Python.

Contributor Author

huard commented Jul 23, 2018

@jswhit This thread suggests that data opened for read-only access can be stored to a file. From what I've gathered, modifying an in-memory file requires further development in the HDF5 library (see Unidata/netcdf-c#708 and Unidata/netcdf-c#879). The good news is that there seems to be recent activity on this.

@shoyer Good idea, but xarray.open_dataset does not seem to support reading in-memory files out of the box, or at least, I can't find it.
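
On the first point (storing the unmodified, read-only data), the bytes returned by the server should already be a complete netCDF file, so ordinary file I/O can write them out without going through netCDF4 at all. A minimal sketch, where `save_bytes` is a hypothetical helper name:

```python
from pathlib import Path


def save_bytes(data: bytes, path: str) -> int:
    """Write the raw netCDF bytes to disk unchanged and return the byte count."""
    return Path(path).write_bytes(data)
```

This only covers the unmodified file; it does not help once the in-memory dataset needs to be changed.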

Contributor

shoyer commented Jul 23, 2018

This isn't well documented in xarray at present, but you can pass an open netCDF4.Dataset object to xarray.backends.NetCDF4DataStore: pydata/xarray#1508

Contributor Author

huard commented Jul 23, 2018

Ah yes, that works, but xarray is still using netCDF4-python behind the scenes. So if I pass xarray a read-only nc.Dataset object and try to set an attribute through xarray, I still get a "Write to read-only" error. Sorry if I'm dense and missing something obvious.

Contributor

shoyer commented Jul 23, 2018

@huard yes, that's correct.

If you're OK using netCDF3, scipy (also supported via xarray) can easily read/write netCDF files in memory. That's possibly the best option at this point.
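
As an illustration of the netCDF3 route, here is a small sketch that uses scipy.io.netcdf_file directly (rather than through xarray) to write a file into a BytesIO buffer and read it back. The dimension and variable names are arbitrary.

```python
import io

import numpy as np
from scipy.io import netcdf_file

# Write a small netCDF3 file into an in-memory buffer.
buf = io.BytesIO()
f = netcdf_file(buf, mode="w")
f.createDimension("x", 3)
v = f.createVariable("v", "f8", ("x",))
v[:] = np.arange(3.0)
f.flush()              # serialize the file into the buffer
data = buf.getvalue()  # take the bytes before close() closes the buffer
f.close()

# Read the bytes back, again without touching the disk.
g = netcdf_file(io.BytesIO(data), mode="r")
values = g.variables["v"][:].copy()
g.close()
```

Note that this format has no netCDF4-style compression.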

Contributor Author

huard commented Jul 23, 2018

Good to know, but we rely heavily on the compression offered by netCDF4, so that's probably a no-go. Thanks for the suggestions!
