Granular BitGroom feature for netcdf-c #2130
@czender do you have cites for the two papers you reference?
@edwardhartnett If you mean the following two, then yes, and they are also on the CCR homepage:
Delaunay, X., A. Courtois, and F. Gouillon (2019), Evaluation of […]
Kouznetsov, R. (2021), A note on precision-preserving compression of scientific data, Geosci. Model Dev., 14(1), 377-389, https://doi.org/10.5194/gmd-14-377-2021
Is the number of algorithms likely to grow significantly,
I believe no more than a few algorithms are envisioned, right, @czender? But I wonder why we don't just take the best algorithm and use that. Do we need the old one once an improvement has come along?
@DennisHeimbigner I do not intend to add any more quantization algorithms. After BitGroom was published in 2016, it inspired others to optimize quantization algorithms even further, within the constraints of guaranteeing the user-specified NSD and keeping the IEEE on-disk format so that no decoder is necessary. This GBG algorithm incorporates that progress, which significantly improves the compression ratio (CR): something like 20% better than BG for NSD=3 (followed by DEFLATE). The "low-hanging fruit" is in GBG, so I think further algorithmic improvements cannot improve the GBG CR by more than 5% without violating the above constraints. I thought it important that netcdf-c not be limited to BG given the known improvements that were possible, so once Ed put in BG, it prompted me to develop and submit a "best of" algorithm to netcdf-c rather than see it languish in NCO or CCR.
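For readers unfamiliar with the constraint mentioned above (quantize to a requested NSD while leaving the data as ordinary IEEE floats), here is a minimal sketch of the general BitGroom-style idea: alternately zero ("shave") and set the trailing mantissa bits of consecutive values. It is not the netcdf-c implementation and not the GBG algorithm. The function names, the one guard bit, and the integer approximation of log2(10) are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: explicit mantissa bits to keep for nsd decimal digits.
 * Uses ceil(nsd * log2(10)) + 1 guard bit, with log2(10) approximated as 3.322
 * in integer arithmetic. The guard bit is an assumption of this sketch. */
static int sketch_keep_bits(int nsd)
{
    assert(nsd >= 1 && nsd <= 7);          /* float32 carries about 7 digits */
    int bits = (nsd * 3322 + 999) / 1000 + 1;
    return bits > 23 ? 23 : bits;          /* float32 has 23 explicit mantissa bits */
}

/* Hypothetical sketch: shave trailing mantissa bits of even-indexed values and
 * set them for odd-indexed values, so the rounding bias roughly cancels.
 * Zeros, infinities, and NaNs are left untouched. */
static void sketch_bitgroom(float *data, size_t n, int nsd)
{
    int drop = 23 - sketch_keep_bits(nsd);
    uint32_t mask_zero = 0xFFFFFFFFu << drop;  /* keeps sign, exponent, leading mantissa */
    uint32_t mask_one = ~mask_zero;            /* the trailing bits being discarded */

    for (size_t i = 0; i < n; i++) {
        uint32_t u;
        memcpy(&u, &data[i], sizeof u);        /* type-pun safely via memcpy */
        if ((u & 0x7FFFFFFFu) == 0u) continue;         /* +0 or -0 */
        if (((u >> 23) & 0xFFu) == 0xFFu) continue;    /* Inf or NaN */
        if (i % 2 == 0) u &= mask_zero;        /* shave: magnitude rounds toward zero */
        else            u |= mask_one;         /* set: magnitude rounds away from zero */
        memcpy(&data[i], &u, sizeof u);
    }
}
```

The output is still a plain array of IEEE floats, which is why no decoder is needed on read; the long runs of identical trailing bits are what a lossless compressor such as DEFLATE can then exploit.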
While clearing out the PR backlog, I see that the quantize test is giving an 'unexpected error'. The autotools-based test is silent; I will adjust it to provide additional information in the case of failure. The cmake test is at least a little more verbose about it.
Thanks, @WardF. I see why the test fails and it's an easy fix. It fails because it expects an error if NC_QUANTIZE_BITGROOM is NOT the last quantize mode defined. Now NC_QUANTIZE_GRANULARBG is the last quantize mode defined, so the test that previously passed should now fail, as observed. Not sure why this was silently passing in the autoconf-based testing I did earlier, though. I think the simplest fix is to change the test so it tries to access an undefined quantize mode one greater than NC_QUANTIZE_GRANULARBG, and I will submit a patch for that soon. Another route would be to add another token, e.g., NC_QUANTIZE_MODE_MAX defined as the greatest valid enumerated value of QUANTIZE modes. I hesitate to do that because my sense is that adding tokens is frowned upon.
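To make the two options above concrete, here is a self-contained sketch of the range check such a test exercises. It does not call netcdf-c. Every `SKETCH_` name is hypothetical, the numeric mode values and the error code are assumptions chosen for illustration, and `SKETCH_QUANTIZE_MODE_MAX` stands in for the proposed (not adopted) NC_QUANTIZE_MODE_MAX token.

```c
#include <assert.h>

/* Illustrative stand-ins for the quantize-mode tokens. The real tokens live in
 * netcdf.h; the values here are assumptions of this sketch. */
enum {
    SKETCH_NOQUANTIZE          = 0,
    SKETCH_QUANTIZE_BITGROOM   = 1,
    SKETCH_QUANTIZE_GRANULARBG = 2,
    /* Hypothetical sentinel: always equal to the greatest valid mode. */
    SKETCH_QUANTIZE_MODE_MAX   = SKETCH_QUANTIZE_GRANULARBG
};

#define SKETCH_NOERR  0
#define SKETCH_EINVAL (-36)   /* stand-in for an "invalid argument" error code */

/* Hypothetical validator: accept only modes in [NOQUANTIZE, MODE_MAX]. */
static int sketch_check_quantize_mode(int mode)
{
    if (mode < SKETCH_NOQUANTIZE || mode > SKETCH_QUANTIZE_MODE_MAX)
        return SKETCH_EINVAL;
    return SKETCH_NOERR;
}

/* Mirrors the shape of the test fix: probe one past the last valid mode. */
static void sketch_test_invalid_mode(void)
{
    /* Option 1: hardcode the current last mode. Breaks again when a mode is added. */
    assert(sketch_check_quantize_mode(SKETCH_QUANTIZE_GRANULARBG + 1) == SKETCH_EINVAL);
    /* Option 2: use the sentinel. Keeps working when a mode is added. */
    assert(sketch_check_quantize_mode(SKETCH_QUANTIZE_MODE_MAX + 1) == SKETCH_EINVAL);
}
```

The trade-off is the one raised in the comment: the sentinel keeps the test stable as modes are added, at the cost of one more public token.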
…ANULARBG (instead of NC_QUANTIZE_BITGROOM) fails.
re: PR Unidata#2088
re: PR Unidata#2130
replaces: Unidata#2140

Changes:
* Add NCZarr-specific quantize functions to the dispatch table.
* Copy (modified) quantize code from libhdf5 to NCZarr
* Add quantize invocation to zvar.c
* Add support for _QuantizeBitgroomNumberOfSignificantDigits and _QuantizeGranularBitgroomNumberOfSignificantDigits to ncgen.
* Modify nc_test4/tst_quantize.c to allow it to be used both for hdf5 and for nczarr.
* Make dap4 properly handle quantize functions in dispatch table.
* Add quantize attribute support to ncgen.

Other changes:
* Caught and fixed some S3 problems
* Fixed some nczarr fillvalue problems.
* Fixed some nczarr cache problems.
* Cleanup some flaws in libdispatch/dinfermodel.c
* Allow byterange requests to S3 be readable by dinfermodel.c/check_file_type
* Remove the libnczarr ztracedispatch code (big change).
Granular BitGroom (GBG) combines features of BitGroom, BitRound by Kouznetsov (2020), and DigitRound by Delaunay et al. (2019). GBG improves compression ratios by ~20% relative to BitGroom for NSD=3 on our benchmark 1 GB climate model output dataset. Its invocation is identical to BitGroom, so this patchset mainly utilizes a new enumerated value of the quantize_mode flag to invoke the new algorithm. No tests (yet) in this patchset. For correctness, GBG can be compared to current implementations in NCO and CCR.