Memory leak when opening Dataset #986

Closed
jblarsen opened this issue Dec 12, 2019 · 12 comments

@jblarsen

jblarsen commented Dec 12, 2019

It seems like there is a memory leak when we open and close a dataset. A minimal example script triggering the issue on my computer can be seen below:

import gc
import netCDF4


def create_netcdf_file(filename):
    dataset = netCDF4.Dataset(filename, "w", format="NETCDF4_CLASSIC")
    dataset.createDimension("time", None)
    # The memory issue is only present if we create a variable
    dataset.createVariable("time", "f8", ("time", ))

    # Make sure the dataset is written to disk
    dataset.sync()


filename = 'test.nc'
create_netcdf_file(filename)

i = 0
while True:
    print(i)
    dataset = netCDF4.Dataset(filename, 'r')
    dataset.close()
    del dataset
    gc.collect()
    i += 1

The memory increase for this example is quite slow, but we have seen much faster increases for realistic datasets.

I am using netCDF4 1.5.3 on an Ubuntu 18.04 machine.

@jswhit2
Contributor

jswhit2 commented Dec 12, 2019

Confirmed. Easier to see using the psutil module:

import netCDF4, psutil, os

filename = 'test.nc'
dataset = netCDF4.Dataset(filename, "w")
dataset.createDimension("time", 10)
# The memory issue is only present if we create a variable
v = dataset.createVariable("time", "f8", ("time", ))
dataset.close()

proc = psutil.Process(os.getpid())
i = 0
while True:
    dataset = netCDF4.Dataset(filename, 'r')
    dataset.close()
    mem = proc.memory_info().rss
    print("\t Loop: {}\t mem: {}".format(i, mem))
    i += 1

@jswhit2
Contributor

jswhit2 commented Dec 12, 2019

Memory seems to grow linearly with the loop index, at a rate of 1024 bytes per iteration (regardless of how large the variable is).
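As a rough sketch of how a per-iteration figure like this could be estimated, the (loop index, RSS) pairs printed by the loop above can be fitted with a least-squares line. The helper name `leak_rate_bytes_per_iter` and the synthetic samples are assumptions made for illustration; neither is part of netCDF4 or psutil. RSS is reported at page granularity, so a fit over many iterations is more informative than the difference between two consecutive readings.

```python
def leak_rate_bytes_per_iter(samples):
    """Least-squares slope of RSS (bytes) against loop index.

    `samples` is a list of (iteration, rss_bytes) pairs, for example the
    values printed by the psutil loop above.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a slope")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("all samples share the same iteration index")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return sxy / sxx


# Synthetic data (an assumption, not a measurement): a 50 MB baseline that
# grows by exactly 1024 bytes on every iteration.
synthetic = [(i, 50_000_000 + 1024 * i) for i in range(1000)]
print(leak_rate_bytes_per_iter(synthetic))
```

A slope near zero over a long run would suggest no leak, while a stable positive slope matches the linear growth described here.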

@jswhit2
Contributor

jswhit2 commented Dec 12, 2019

I suspect this may be coming from the C library - will have to write a C program to open and close the dataset to be sure.

@jblarsen
Author

By the way, the memory increase also happens with MFDataset, but Dataset and MFDataset probably share a lot of code.

@jswhit
Collaborator

jswhit commented Dec 15, 2019

Here's a C program that reproduces the memory leak for me. Running it in a terminal and monitoring the RSS in top, I see a linear increase pretty similar to the Python program. Note that the memory usage does not increase over time if the format is changed from NC_NETCDF4 to NC_64BIT_OFFSET.

#include <netcdf.h>
#include <stdio.h>

int main() {
    int dataset_id, time_id, dummyvar_id, ret, idx;
    size_t start[1] = {0};
    size_t count[1] = {100};
    double data[100];

    /* Fill the write buffer with a dummy value. */
    for (idx = 0; idx < 100; idx++) {
        data[idx] = -99;
    }

    /* Create a NetCDF-4 file with one variable on an unlimited dimension. */
    ret = nc_create("test.nc", NC_CLOBBER | NC_NETCDF4, &dataset_id);
    if (ret != NC_NOERR) { printf("%s\n", nc_strerror(ret)); return 1; }
    ret = nc_def_dim(dataset_id, "time", NC_UNLIMITED, &time_id);
    if (ret != NC_NOERR) { printf("%s\n", nc_strerror(ret)); return 1; }
    ret = nc_def_var(dataset_id, "dummy", NC_DOUBLE, 1, &time_id, &dummyvar_id);
    if (ret != NC_NOERR) { printf("%s\n", nc_strerror(ret)); return 1; }
    ret = nc_put_vara(dataset_id, dummyvar_id, start, count, data);
    if (ret != NC_NOERR) { printf("%s\n", nc_strerror(ret)); return 1; }
    ret = nc_close(dataset_id);
    if (ret != NC_NOERR) { printf("%s\n", nc_strerror(ret)); return 1; }

    /* Open and close the file repeatedly while watching the process RSS. */
    for (idx = 0; idx < 100000; idx++) {
        ret = nc_open("test.nc", NC_NOWRITE, &dataset_id);
        if (ret != NC_NOERR) { printf("%s\n", nc_strerror(ret)); return 1; }
        ret = nc_close(dataset_id);
        if (ret != NC_NOERR) { printf("%s\n", nc_strerror(ret)); return 1; }
    }
    return 0;
}
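The format comparison mentioned above could also be scripted from Python. The following is only a sketch under stated assumptions: `rss_growth` and `compare_formats` are hypothetical helper names, the file names are made up, and the idea that the netCDF4-python format strings "NETCDF4" and "NETCDF3_64BIT_OFFSET" exercise the same code paths as NC_NETCDF4 and NC_64BIT_OFFSET in the C program is an assumption rather than something verified in this thread.

```python
def rss_growth(open_close, read_rss, iterations=1000):
    """Return the change in RSS (bytes) across `iterations` calls.

    `open_close` performs one open/close cycle and `read_rss` returns the
    current resident set size in bytes. Both are passed in so the
    measurement logic does not depend on any particular library.
    """
    open_close()  # warm-up call so one-time allocations are not counted
    before = read_rss()
    for _ in range(iterations):
        open_close()
    return read_rss() - before


def compare_formats(iterations=1000):
    """Measure RSS growth for a NetCDF-4 file and a 64-bit-offset file.

    Requires the third-party netCDF4 and psutil packages; they are imported
    here so that rss_growth stays usable without them.
    """
    import os
    import netCDF4
    import psutil

    proc = psutil.Process(os.getpid())
    results = {}
    for fmt in ("NETCDF4", "NETCDF3_64BIT_OFFSET"):
        filename = "leak_test_{}.nc".format(fmt)  # hypothetical file name
        with netCDF4.Dataset(filename, "w", format=fmt) as ds:
            ds.createDimension("time", None)
            ds.createVariable("dummy", "f8", ("time",))

        def open_close(path=filename):
            netCDF4.Dataset(path, "r").close()

        results[fmt] = rss_growth(
            open_close, lambda: proc.memory_info().rss, iterations
        )
    return results
```

Calling `compare_formats()` returns a dict mapping each format string to its RSS growth in bytes. If the observation above carries over, the "NETCDF4" entry would be expected to grow with the iteration count while the other stays roughly flat, though that is not verified here.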

@jswhit
Collaborator

jswhit commented Dec 15, 2019

Possibly related to Unidata/netcdf-c#1571

@DennisHeimbigner
Collaborator

Let me try the C program using e.g. valgrind.

@jblarsen
Author

@DennisHeimbigner did you have any luck running the C program with valgrind? :-)

@DennisHeimbigner
Collaborator

DennisHeimbigner commented Jan 10, 2020

I think this is being addressed in issue Unidata/netcdf-c#1575

@jblarsen
Author

Thanks for the update, Dennis :-)

@DennisHeimbigner
Collaborator

What version of HDF5 was being used?

@jswhit
Collaborator

jswhit commented Feb 26, 2020

Fixed by Unidata/netcdf-c#1634
