Garbled data when reading large cdf-5 files #453

Closed
pastewka opened this issue Aug 8, 2017 · 11 comments

pastewka commented Aug 8, 2017

Please provide as much of the following information as you can, as applicable to the issue being reported. Naturally, not all information is relevant to every issue, but the more information we have to start, the better!

Environment Information

Feel free to skip this if the issue is related to documentation, a feature request, or general discussion.

  • What platform are you using? (please provide specific distribution/version in summary)
    • Linux - BlueGene/Q and CentOS 7
    • Windows
    • OSX
    • Other
    • NA
  • 32 and/or 64 bit?
    • 32-bit
    • 64-bit
  • What build system are you using?
    • autotools (configure)
    • cmake - CFLAGS: -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -g -Wall -Wconversion
  • Can you provide a sample netCDF file or C code to recreate the issue?
    • Yes (please attach to this issue, thank you!)
    • No - file too large
    • Not at this time

Summary of Issue

I am having a problem reading cdf-5 files that were written with Parallel NetCDF 1.8.1 on an IBM BlueGene/Q. All unlimited frames except the first one contain garbled data. I believe this problem only appears when a single frame of an unlimited dimension becomes large (probably >2 GB, but I have not systematically tested this).

The example file under consideration has a size of 82 GB and 4 unlimited frames. The output of ncdump -h follows:

dimensions:
	frame = UNLIMITED ; // (4 currently)
	spatial = 3 ;
	Voigt = 6 ;
	atom = 388558757 ;
	cell_spatial = 3 ;
	cell_angular = 3 ;
	label = 10 ;
variables:
	char spatial(spatial) ;
	char cell_spatial(spatial) ;
	char cell_angular(spatial, label) ;
	double time(frame) ;
		time:units = "picosecond" ;
		time:scale_factor = 0.005 ;
	double cell_origin(frame, cell_spatial) ;
		cell_origin:units = "Angstrom" ;
		cell_origin:scale_factor = 1. ;
	double cell_lengths(frame, cell_spatial) ;
		cell_lengths:units = "Angstrom" ;
		cell_lengths:scale_factor = 1. ;
	double cell_angles(frame, cell_angular) ;
		cell_angles:units = "degree" ;
	int id(frame, atom) ;
	int type(frame, atom) ;
	double coordinates(frame, atom, spatial) ;
	double velocities(frame, atom, spatial) ;

// global attributes:
		:Conventions = "AMBER" ;
		:ConventionVersion = "1.0" ;
		:program = "LAMMPS" ;
		:programVersion = "16 Feb 2016" ;
}
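
(For scale: a single record of the coordinates variable alone is 388558757 atoms × 3 components × 8 bytes ≈ 9.3 GB, and velocities is the same size again, so each unlimited frame is well beyond the ~2 GB threshold suspected above.)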

Inspecting, for example, the cell_angles variable with ncdump -v cell_angles yields:

 cell_angles =
  90, 90, 90,
  -8.82627132007748e+25, -9.87029617490031e+237, -3.51891053229443e-179,
  2.12199579145934e-314, 2.12199579145934e-314, 2.12199579145934e-314,
  2.40165011224084e+232, 1.86795957454656e-176, 1.24868215843219e-11 ;

The dump command from parallel NetCDF is able to read this file. ncmpidump -v cell_angles yields:

 cell_angles =
  90, 90, 90,
  90, 90, 90,
  90, 90, 90,
  90, 90, 90 ;

which is the correct information.

The error occurs in netCDF 4.4.1.1 and in the latest GitHub master.

WardF commented Aug 8, 2017

This looks vaguely familiar; thanks for the bug report, we definitely want to correct this. Does this issue occur if the file is generated serially? Or is that impractical? Perhaps it is impractical and it shouldn't matter, but I need to start narrowing things down. The fact that ncmpidump works is reassuring; the issue is in ncdump reading the data, not in how netCDF writes the data (which would be the worst-case scenario). Is it possible to put this large file somewhere where I can download it? I will start trying to diagnose this.

Oh yes, I said this looked vaguely familiar. I just released 4.5.0-rc2 yesterday, and it contains a lot of fixes which have not made their way back into master yet. Can you try ncdump from that branch/release and see if the issue persists?

Thanks!

pastewka commented Aug 9, 2017

4.5.0-rc2 has the same problem. I'll upload the file and post a link to it here later today.

pastewka commented Aug 9, 2017

I fixed it myself; it was an easy one. A check for NC_64BIT_DATA was missing in v1hpg.c. I have just issued pull request #457, which should fix this.
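
For context, an illustrative sketch of the kind of check that was missing follows; this is not the actual patch from pull request #457, and the hdr_cursor type and the read_be32 / read_be64 / read_size_field helpers are hypothetical stand-ins for the internals of v1hpg.c. The point is only that CDF-5 (NC_64BIT_DATA) widens several classic-format header fields to 64 bits, so every place in the header parser that chooses a field width has to test that flag:

/* Illustrative sketch only -- not the actual change in v1hpg.c. */
#include <stdint.h>
#include <netcdf.h>   /* defines the NC_64BIT_DATA (CDF-5) mode flag */

typedef struct { const unsigned char *p; } hdr_cursor;   /* hypothetical cursor */

static uint32_t read_be32(hdr_cursor *c) {
    uint32_t v = ((uint32_t)c->p[0] << 24) | ((uint32_t)c->p[1] << 16)
               | ((uint32_t)c->p[2] << 8)  |  (uint32_t)c->p[3];
    c->p += 4;
    return v;
}

static uint64_t read_be64(hdr_cursor *c) {
    uint64_t hi = read_be32(c);
    uint64_t lo = read_be32(c);
    return (hi << 32) | lo;
}

/* Read a header size field whose on-disk width depends on the format:
 * 64 bits for a CDF-5 file, 32 bits for the classic formats. */
static long long read_size_field(hdr_cursor *c, int format_flags) {
    if (format_flags & NC_64BIT_DATA)      /* the kind of check that was missing */
        return (long long)read_be64(c);
    return (long long)read_be32(c);
}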

WardF commented Aug 9, 2017

Thank you, that is a fantastic help; I will get it worked into the 4.5.0 release branch. If possible, can we still get the file? Or perhaps it would be easy to capture the output from 'ncdump -h [filename]' and attach it here? I'd like to characterize the file so that I can add a test to ensure this doesn't regress in the future.

pastewka (Contributor Author) commented:

Let me try to generate a file of minimal size that shows this error; I will get that one to you.
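
A minimal generator could look roughly like the sketch below. This is a hypothetical reproducer, not the actual file that was shared: the file name big_cdf5.nc, the variable name coordinates, and the 300,000,000-element atom dimension are made up. It simply writes a CDF-5 (NC_64BIT_DATA) file whose single record of doubles exceeds 2 GiB, with two records of a constant value so that record 1 can be checked for garbling:

/* Hypothetical minimal reproducer for a CDF-5 file with a >2 GiB record. */
#include <stdio.h>
#include <stdlib.h>
#include <netcdf.h>

#define CHECK(e) do { int _s = (e); if (_s != NC_NOERR) { \
    fprintf(stderr, "netCDF error: %s\n", nc_strerror(_s)); exit(1); } } while (0)

int main(void)
{
    const size_t natom = 300000000;   /* 300e6 doubles = ~2.4 GB per record */
    const size_t chunk = 1000000;     /* write in 1e6-value slabs */
    int ncid, dim_frame, dim_atom, varid, dimids[2];

    CHECK(nc_create("big_cdf5.nc", NC_CLOBBER | NC_64BIT_DATA, &ncid));
    CHECK(nc_def_dim(ncid, "frame", NC_UNLIMITED, &dim_frame));
    CHECK(nc_def_dim(ncid, "atom", natom, &dim_atom));
    dimids[0] = dim_frame;
    dimids[1] = dim_atom;
    CHECK(nc_def_var(ncid, "coordinates", NC_DOUBLE, 2, dimids, &varid));
    CHECK(nc_enddef(ncid));

    double *buf = malloc(chunk * sizeof *buf);
    if (buf == NULL) { perror("malloc"); return 1; }
    for (size_t i = 0; i < chunk; i++)
        buf[i] = 90.0;                /* constant value, easy to verify */

    for (size_t rec = 0; rec < 2; rec++) {
        for (size_t off = 0; off < natom; off += chunk) {
            size_t start[2] = { rec, off };
            size_t count[2] = { 1, (natom - off < chunk) ? natom - off : chunk };
            CHECK(nc_put_vara_double(ncid, varid, start, count, buf));
        }
    }
    free(buf);
    CHECK(nc_close(ncid));
    return 0;
}

The resulting file is about 4.8 GB; with an unpatched library, dumping record 1 should show garbage rather than the constant 90 if the bug is present.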

WardF commented Aug 10, 2017

Thank you @pastewka; I appreciate that, and please, take your time.

pastewka (Contributor Author) commented:

You can download the example file here. The file size is 20 GB.

DennisHeimbigner (Collaborator) commented:

Given the fix, it should be possible to reengineer a minimal test case.

pastewka commented Aug 11, 2017 via email

WardF commented Aug 14, 2017

The fix has been merged into master, thank you!

WardF closed this as completed Aug 14, 2017

pastewka (Contributor Author) commented:

Perfect, thank you!
