# Lossy Compression with Quantize

## Introduction {#quantize}

The quantize feature was initially developed as part of the Community
Codec Repository (CCR) [2]. The CCR project allows netCDF users to
make use of HDF5 plugins (a.k.a. “filters”), which can add new
compression and other algorithms to the HDF5 library. As part of CCR,
the quantization algorithms were implemented as HDF5 filters.

However, one drawback of implementing quantization as a filter is
that the filter is also required when reading the data [1]. Although
this makes sense for compression/decompression algorithms, the
quantize algorithms are only needed when data are written. Requiring
readers of the data to install the filters as well places an
unnecessary burden on them. Furthermore, using the quantize filter
results in data that cannot be read by netCDF-Java or by versions of
netcdf-c before 4.8.0, when support for multiple HDF5 filters was
added. For these reasons, it was decided to merge the quantize
algorithms into the netcdf-c library [5].

As part of the netcdf-c library, the quantize algorithms are available
for netCDF/HDF5 files and for the new ncZarr format. They produce data
files that are fully backward compatible with all versions of netcdf-c
since 4.0, and fully compatible with netCDF-Java.

## The Quantize Feature

The quantize algorithms assist with lossy compression by setting
excess bits to all zeros or all ones (in alternate array values). This
allows a subsequent compression algorithm, like the zlib-based
deflation built into netCDF-4, to better compress the data.

The quantize feature is applied to a variable in a netCDF file, and
may only be used with single or double precision floating point
variables (netCDF types NC_FLOAT and NC_DOUBLE). Attempting to turn on
quantize for any other type of netCDF variable will result in an error.

Turning on quantize does not, by itself, reduce the size of the data.
Quantization leads to smaller files only when a subsequent compression
step is also applied.

Figure 1: The value of Pi expressed as a 32-bit floating point number,
with different levels of quantization applied, from Number of
Significant Digits equal to 8 (no quantization), to 1 (maximum
quantization). The least significant bits of the significand are
replaced with zeros, to the extent possible, while preserving the
desired number of significant digits. In this example the Bit Grooming
quantization algorithm is used.

## Quantization Algorithms

Three different quantization algorithms are provided in the netcdf-c
quantize feature. Each does a somewhat different calculation to
determine the number of bits that can be set to zeros (or ones), while
preserving the number of significant digits specified by the user.

Two of the algorithms, Bit Groom and Granular Bit Round, accept the
number of decimal digits to be preserved in the data. The third,
Bit Round, accepts the number of binary bits to preserve.

### Bit Groom

The Bit Grooming algorithm determines the number of bits which
are necessary for the required number of significant decimal
digits. This determination is made at the beginning of processing and
is applied to all values.

Bit Grooming then sets excess bits of the first array value to zero,
then excess bits of the next array value to one, and continues
alternating between zero and one for the excess bits of every other
array value. In this way, the average value of the array is preserved.

For the Bit Grooming algorithm, the NSD (number of significant digits)
parameter refers to the number of significant decimal digits that will
be preserved. The number of significant digits may be 1-7 for single
precision floating point, or 1-15 for double precision floating point.

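The two steps described for Bit Grooming (choose a bit count once, then alternate between clearing and setting the excess bits) can be sketched in C as follows. This is a minimal illustration for single precision only, not the netcdf-c implementation. The names `bits_for_nsd()` and `bit_groom()` are hypothetical, and the bit count is assumed to be the ceiling of NSD × log2(10) (about 3.32 bits per decimal digit); the library may keep additional guard bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed rule: explicit significand bits needed for nsd decimal
 * digits, ceil(nsd * log2(10)), computed in integer arithmetic
 * with log2(10) approximated as 3.322. */
int bits_for_nsd(int nsd)
{
    return (nsd * 3322 + 999) / 1000;
}

/* Hypothetical helper: groom a float array in place. Even-indexed
 * values have their excess bits cleared, odd-indexed values have
 * them set, so the errors tend to cancel in the array mean. */
void bit_groom(float *data, size_t len, int nsd)
{
    int keep = bits_for_nsd(nsd);
    uint32_t excess;
    size_t i;

    assert(data != NULL || len == 0);
    if (nsd < 1 || keep >= 23)
        return; /* all 23 explicit bits are needed: nothing to groom */

    excess = (UINT32_C(1) << (23 - keep)) - 1; /* mask of excess bits */
    for (i = 0; i < len; i++) {
        uint32_t bits;
        memcpy(&bits, &data[i], sizeof bits); /* safe type punning */
        /* Leave zero, infinity, and NaN untouched in this sketch. */
        if ((bits & UINT32_C(0x7FFFFFFF)) == 0 ||
            (bits & UINT32_C(0x7F800000)) == UINT32_C(0x7F800000))
            continue;
        if (i % 2 == 0)
            bits &= ~excess; /* clear the excess bits */
        else
            bits |= excess;  /* set the excess bits */
        memcpy(&data[i], &bits, sizeof bits);
    }
}
```

With `nsd = 1` the sketch keeps 4 explicit bits, so two consecutive copies of Pi become 3.125 and a value just below 3.25, which straddle the original value.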
### Granular Bit Round

Granular Bit Round determines the number of required bits for each
value in the array, and uses IEEE rounding to change the data
value. It achieves a better overall compression ratio by more
aggressively determining the minimum number of bits required to
preserve the specified number of significant decimal digits.

For the Granular Bit Round algorithm, the NSD parameter refers to the
number of significant decimal digits that will be preserved (as with
the Bit Grooming algorithm). The number of significant digits may be
1-7 for single precision floating point, or 1-15 for double precision
floating point.

### Bit Round

The Bit Round algorithm allows the user to directly specify the number
of bits of the significand which will be preserved, and then sets
excess bits to zero or one for alternate array values.

For the Bit Round algorithm, the NSD parameter refers to the number of
significant binary digits that will be preserved. The number of
significant digits may be 1-23 for single precision floating point, or
1-52 for double precision floating point.

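To make the idea of keeping a given number of significand bits concrete, the sketch below keeps `nsb` explicit bits of a single precision value. It is an illustration only: the function name `bit_round()` is hypothetical, and the excess bits are handled here by rounding to nearest (add half of the last kept bit, then mask). That choice is an assumption made for this example and is not taken from the library source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: keep nsb explicit significand bits of a
 * float (1 <= nsb <= 23), rounding the magnitude to nearest. */
float bit_round(float value, int nsb)
{
    uint32_t bits, half, mask;
    int drop;

    assert(nsb >= 1 && nsb <= 23);
    drop = 23 - nsb; /* number of excess bits */
    if (drop == 0)
        return value; /* every explicit bit is kept */

    memcpy(&bits, &value, sizeof bits);
    if ((bits & UINT32_C(0x7F800000)) == UINT32_C(0x7F800000))
        return value; /* leave infinity and NaN untouched */

    half = UINT32_C(1) << (drop - 1);      /* half of the last kept bit */
    mask = ~((UINT32_C(1) << drop) - 1);   /* zeros over the excess bits */
    bits = (bits + half) & mask;           /* a carry may bump the exponent */
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

For example, Pi rounded this way becomes 3.0 with `nsb = 2` and 3.25 with `nsb = 3`.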
## Quantize Attribute

When the quantize feature is used, an integer attribute is added to
the variable which contains the NSD setting. Without this attribute it
would be impossible for readers to know that quantize had been applied
to the data. The name of the attribute reflects the quantize algorithm
used. In accordance with the conventions established by the NetCDF
Users Guide, these attribute names begin with an underscore,
indicating that they are added by the library and should not be
modified or deleted by users [6].

Algorithm | Attribute Name
----------|---------------
Bit Groom | _QuantizeBitGroomNumberOfSignificantDigits
Granular Bit Round | _QuantizeGranularBitRoundNumberOfSignificantDigits
Bit Round | _QuantizeBitRoundNumberOfSignificantBits

Figure 2: Table showing the names of the attribute added to a variable
after the quantize feature has been applied. The name of the attribute
indicates the algorithm used, and the integer value represents the
number of significant decimal digits (for Bit Groom and Granular Bit
Round), or the number of significand bits retained (for Bit Round).

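A reader that wants to know which algorithm was applied can match the attribute name against the three names in the table. The sketch below does this in plain C without calling the netCDF library. The enum `quantize_algorithm` and the function `algorithm_from_attribute()` are hypothetical names introduced for this example; the netCDF headers define their own constants.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical local enum, used only in this sketch. */
enum quantize_algorithm {
    QUANTIZE_NONE,
    QUANTIZE_BIT_GROOM,
    QUANTIZE_GRANULAR_BIT_ROUND,
    QUANTIZE_BIT_ROUND
};

/* Attribute names from the table, indexed by the enum above. */
static const char *const quantize_att_names[] = {
    NULL,
    "_QuantizeBitGroomNumberOfSignificantDigits",
    "_QuantizeGranularBitRoundNumberOfSignificantDigits",
    "_QuantizeBitRoundNumberOfSignificantBits"
};

/* Return the algorithm implied by an attribute name, or
 * QUANTIZE_NONE if the name is not a quantize attribute. */
enum quantize_algorithm algorithm_from_attribute(const char *att_name)
{
    int alg;

    assert(att_name != NULL);
    for (alg = QUANTIZE_BIT_GROOM; alg <= QUANTIZE_BIT_ROUND; alg++)
        if (strcmp(att_name, quantize_att_names[alg]) == 0)
            return (enum quantize_algorithm)alg;
    return QUANTIZE_NONE;
}
```

In a real program the attribute names of a variable would come from the netCDF attribute inquiry functions; here they are passed in as plain strings.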
## Handling of Fill Values

In a netCDF file, fill values refer to the value used for elements of
the data not written by the user. For example, if a variable contains
an array of 10 values, and the user only writes 8 of them, the other
two values will be set to the fill value for that variable.

The fill value of a variable may be set by the user by adding an
attribute of the same type as the variable with the name
“_FillValue”. If present, the value of this attribute will be used as
the fill value for that variable. If not specified, a default value
for each type is used as the fill value. The default fill values may
be found in the netcdf.h file.

When using the quantize feature, any fill values will remain
unquantized. That is, the excess bits of any array element will not be
changed, if that element is the fill value. This is necessary if the
fill value is to retain its purpose as an indicator of values that
have not been written.

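The rule that fill values stay unquantized can be sketched as a guard inside the quantization loop. The example below is a simplified illustration, not the library code: the function `shave_except_fill()` is a hypothetical name, it only clears excess bits (no alternation), and it works on single precision data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: clear the excess significand bits of every
 * element except those equal to the fill value. keep_bits is the
 * number of explicit significand bits to retain (1..23). */
void shave_except_fill(float *data, size_t len, float fill_value,
                       int keep_bits)
{
    uint32_t fill_bits, mask;
    size_t i;

    assert(keep_bits >= 1 && keep_bits <= 23);
    memcpy(&fill_bits, &fill_value, sizeof fill_bits);
    mask = ~((UINT32_C(1) << (23 - keep_bits)) - 1);

    for (i = 0; i < len; i++) {
        uint32_t bits;
        memcpy(&bits, &data[i], sizeof bits);
        /* Compare bit patterns so that a NaN fill value also matches. */
        if (bits == fill_bits)
            continue; /* fill values are left exactly as they are */
        bits &= mask;
        memcpy(&data[i], &bits, sizeof bits);
    }
}
```

With a user-defined fill value of -999.9 and 4 retained bits, an element holding Pi becomes 3.125 while an element holding -999.9 is returned bit for bit.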
## Distortions Introduced by Lossy Compression

Any lossy compression introduces distortions to data.

The BitGroom algorithms implemented in netcdf-c introduce a distortion
that can be quantified in terms of a _relative_ error. The magnitude
of the distortion introduced to any single value V is guaranteed to be
within a certain fraction of V, expressed as 0.5 * V * 2**(-NSB),
where NSB is the number of significant bits retained: i.e. it is
0.5V for NSB=0, 0.25V for NSB=1, 0.125V for NSB=2, etc.

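The bound 0.5 * V * 2**(-NSB) is easy to evaluate and to check against a quantized value. The sketch below uses two hypothetical helper names, `relative_error_margin()` and `within_margin()`, and avoids the math library so that it compiles without extra link flags.

```c
#include <assert.h>

/* Relative error margin 0.5 * 2**(-nsb) for nsb retained bits. */
double relative_error_margin(int nsb)
{
    double margin = 0.5;

    assert(nsb >= 0);
    while (nsb-- > 0)
        margin /= 2.0; /* exact: each step halves the margin */
    return margin;
}

/* Return 1 if the quantized value lies within the margin of the
 * original value for the given nsb, otherwise 0. */
int within_margin(double original, double quantized, int nsb)
{
    double err = original - quantized;
    double mag = original < 0 ? -original : original;

    if (err < 0)
        err = -err;
    return err <= mag * relative_error_margin(nsb);
}
```

For instance, replacing Pi by 3.0 is within the margin for NSB=2 (0.125 of the value) but not for NSB=5 (about 0.016 of the value).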

Two quantize algorithms use different definitions of _decimal
precision_, though both are guaranteed to reproduce NSD decimals when
printed.

The relative error margins introduced by the two methods are
summarised in the table:

NSD | 1 | 2 | 3 | 4 | 5 | 6 | 7
----|---|---|---|---|---|---|---
BitGroom Error Margin | 3.1e-2 | 3.9e-3 | 4.9e-4 | 3.1e-5 | 3.8e-6 | 4.7e-7 | -
Granular BitRound Error Margin | 1.4e-1 | 1.9e-2 | 2.2e-3 | 1.4e-4 | 1.8e-5 | 2.2e-6 | -

If one defines decimal precision as in BitGroom, i.e. the introduced
relative error must not exceed half of the unit at the decimal place
NSD in the worst-case scenario, the following values of NSB should be
used for BitRound:

NSD | 1 | 2 | 3 | 4 | 5 | 6 | 7
----|---|---|---|---|---|---|---
NSB | 3 | 6 | 9 | 13 | 16 | 19 | 23

The resulting application of BitRound is as fast as BitGroom, and is
free from artifacts in multipoint statistics introduced by BitGroom
(see https://doi.org/10.5194/gmd-14-377-2021).

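The NSB values listed above for BitRound can be wrapped in a small lookup. The sketch assumes that the seven NSB entries (3, 6, 9, 13, 16, 19, 23) correspond to NSD = 1 through 7, the single precision range; the function name `bitround_nsb_for_nsd()` is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: NSB to pass to BitRound so that it meets the
 * BitGroom definition of nsd decimal digits (single precision,
 * nsd in 1..7). Values are copied from the table in the text. */
int bitround_nsb_for_nsd(int nsd)
{
    static const int nsb_table[7] = {3, 6, 9, 13, 16, 19, 23};

    assert(nsd >= 1 && nsd <= 7);
    return nsb_table[nsd - 1];
}
```

A caller that thinks in decimal digits could then request, say, three digits and pass the returned value of 9 as the NSD argument when selecting the Bit Round algorithm.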
## Using the Quantize Feature

Turning on the quantize feature must be done on a per-variable basis,
after the variable has been defined, and before nc_enddef() (or its
Fortran equivalent) has been called. (Recall that for netCDF/HDF5
files, nc_enddef() is called automatically when data are written to or
read from a variable.)

In accordance with the usual NetCDF API practice, an inquiry function
is also provided which may be called to check if quantize has been
turned on for a variable. Calling the inquiry function is not required
when reading the data; it is provided for user convenience.

### Using Quantize with the NetCDF C API

Quantize is available in the main branch of the netcdf-c library, and
will be part of the next release (netcdf-c-4.9.0).

To turn on the quantize feature, call the nc_def_var_quantize()
function. To inquire about whether quantize has been turned on for a
variable, use the nc_inq_var_quantize() function.

```c
    /* Fragment: ncid, dimid, NDIM1, NSD_3, and the ERR macro are
     * defined elsewhere in the program. */

    /* Create two variables, one float, one double. Quantization
     * may only be applied to floating point data. */
    if (nc_def_var(ncid, "var1", NC_FLOAT, NDIM1, &dimid, &varid1)) ERR;
    if (nc_def_var(ncid, "var2", NC_DOUBLE, NDIM1, &dimid, &varid2)) ERR;

    /* Set up quantization. This will not make the data any
     * smaller, unless compression is also turned on. In this
     * case, we will set 3 significant digits. */
    if (nc_def_var_quantize(ncid, varid1, NC_QUANTIZE_BITGROOM, NSD_3)) ERR;
    if (nc_def_var_quantize(ncid, varid2, NC_QUANTIZE_BITGROOM, NSD_3)) ERR;

    /* Set up zlib compression. This will work better because the
     * data are quantized, yielding a smaller output file. We will
     * set compression level to 1, which is usually the best
     * choice. */
    if (nc_def_var_deflate(ncid, varid1, 0, 1, 1)) ERR;
    if (nc_def_var_deflate(ncid, varid2, 0, 1, 1)) ERR;
```

Figure 3: Example of using the quantize feature in C. Note that the
example also demonstrates adding zlib (a.k.a. deflate) compression to
the variables. Without turning on the compression, use of quantize
alone will not result in smaller data output.

### Using Quantize with the NetCDF Fortran 90 API

Quantize is available on a branch of the netcdf-fortran libraries. It
will be merged to main after the next netcdf-c release (4.9.0) and
will be released as part of the netCDF Fortran 90 API in the
subsequent release of netcdf-fortran.

In the Fortran 90 API, quantization is turned on by using two new
optional arguments to nf90_def_var(), the quantize_mode and the nsd
parameters.

```fortran
    ! Define some variables.
    call check(nf90_def_var(ncid, VAR1_NAME, NF90_FLOAT, dimids, varid1&
         &, deflate_level = DEFLATE_LEVEL, quantize_mode =&
         & nf90_quantize_bitgroom, nsd = 3))
    call check(nf90_def_var(ncid, VAR2_NAME, NF90_DOUBLE, dimids,&
         & varid2, contiguous = .TRUE., quantize_mode =&
         & nf90_quantize_bitgroom, nsd = 3))
```

Figure 4: In the Fortran 90 netCDF API, two additional optional
parameters are available for the quantize feature, the quantize_mode
and nsd.

### Using Quantize with the NetCDF Fortran 77 API

Quantize is available on a branch of the netcdf-fortran libraries. It
will be merged to main after the next netcdf-c release (4.9.0) and
will be released as part of the netCDF Fortran 77 API in the
subsequent release of netcdf-fortran.

```fortran
C     Create some variables. The loop bound NVARS and the dimids
C     array are assumed names, declared elsewhere in the program.
      do x = 1, NVARS
         retval = nf_def_var(ncid, var_name(x), var_type(x), NDIM1,
     $        dimids, varid(x))
         if (retval .ne. nf_noerr) stop 3

C        Turn on quantize.
         retval = nf_def_var_quantize(ncid, varid(x),
     $        NF_QUANTIZE_BITGROOM, NSD_3)
         if (retval .ne. nf_noerr) stop 3

C        Turn on zlib compression.
         retval = nf_def_var_deflate(ncid, varid(x), 0, 1, 1)
         if (retval .ne. nf_noerr) stop 3
      end do
```

Figure 5: In the Fortran 77 netCDF API, nf_def_var_quantize() and
nf_inq_var_quantize() are provided, which wrap the corresponding
quantize functions of the C API.

Figure 6: Compression ratio of E3SM Atmosphere Model (EAM) v2 default
monthly dataset of raw size 445 MB compressed with default netCDF
lossless compression algorithm (DEFLATE, compression level=1) alone
(leftmost), or after pre-filtering with one of three lossy codecs
(BitGroom, Granular BitRound, or BitRound) with quantization increasing
(and precision decreasing) to the right.

## References

1. HDF5 Dynamically Loaded Filters, The HDF Group, retrieved on
December 2, 2021 from
https://support.hdfgroup.org/HDF5/doc/Advanced/DynamicallyLoadedFilters/HDF5DynamicallyLoadedFilters.pdf.

2. Hartnett, E., and Zender, C. S. (2020), ADDITIONAL NETCDF COMPRESSION
OPTIONS WITH THE COMMUNITY CODEC REPOSITORY (CCR), American
Meteorological Society (AMS) Annual Meeting, retrieved on July 3, 2021
from
https://www.researchgate.net/publication/347726695_ADDITIONAL_NETCDF_COMPRESSION_OPTIONS_WITH_THE_COMMUNITY_CODEC_REPOSITORY_CCR.

3. Zender, C. S. (2016), Bit Grooming: Statistically accurate
precision-preserving quantization with compression, evaluated in the
netCDF Operators (NCO, v4.4.8+), Geosci. Model Dev., 9, 3199-3211,
doi:10.5194/gmd-9-3199-2016, retrieved on Sep 21, 2020 from
https://www.researchgate.net/publication/301575383_Bit_Grooming_Statistically_accurate_precision-preserving_quantization_with_compression_evaluated_in_the_netCDF_Operators_NCO_v448.

4. Delaunay, X., A. Courtois, and F. Gouillon (2019), Evaluation of
lossless and lossy algorithms for the compression of scientific
datasets in netCDF-4 or HDF5 files, Geosci. Model Dev., 12(9),
4099-4113, doi:10.5194/gmd-2018-250, retrieved on Sep 21, 2020 from
https://www.researchgate.net/publication/335987647_Evaluation_of_lossless_and_lossy_algorithms_for_the_compression_of_scientific_datasets_in_netCDF-4_or_HDF5_files.

5. Hartnett, E., et al., “Provide a way to do bit grooming before
compression”, netcdf-c GitHub Issue #1548,
https://github.com/Unidata/netcdf-c/issues/1548.

6. Rew, R., et al., NetCDF Users Guide, Appendix A: Attribute
Conventions, Unidata,
https://docs.unidata.ucar.edu/netcdf-c/current/attribute_conventions.html.