NetCDF  4.9.2
filters.md
Appendix D. NetCDF-4 Filter Support {#filters}
==================================

[TOC]

> See @ref nc_filters_quickstart for tips to get started quickly with NetCDF-4 Filter Support.

## Filters Overview {#filters_overview}

NetCDF-C filters have some features of which the user
should be aware.

* ***Auto Install of filters***<br>
An option is now provided to automatically install
HDF5 filters into a default location, or optionally
into a user-specified location. This is described in
[Appendix H](#filters_appendixh)
(with supporting information in [Appendix G](#filters_appendixg)).

* ***NCZarr Filter Support***<br>
[NCZarr filters](#filters_nczarr) are now supported.
This essentially means that it is possible to specify
Zarr Codecs (Zarr equivalent of filters) in Zarr files
and have them processed using HDF5-style wrapper shared libraries.
Zarr filters can be used even if HDF5 support is disabled
in the netCDF-C library.

## Introduction to Filters {#filters_introduction}

The netCDF library supports a general filter mechanism to apply
various kinds of filters to datasets before reading or writing.
The most common kind of filter is a compression-decompression
filter, and that is the focus of this document.
But non-compression filters &ndash; fletcher32, for example &ndash; also exist.

The netCDF enhanced (aka netCDF-4) library inherits this
capability since it depends on the HDF5 library. The HDF5
library (1.8.11 and later) supports filters, and netCDF is based
closely on that underlying HDF5 mechanism.

Filters assume that a variable has chunking defined and each
chunk is filtered before writing and "unfiltered" after reading
and before passing the data to the user. In the event that
multiple filters are defined on a variable, they are applied in
first-defined order on writing and in reverse order when
reading.

This document describes the support for HDF5 filters and also
the newly added support for NCZarr filters.

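The ordering rule described in this section (first-defined order on writing, reverse order on reading) can be illustrated with a small sketch. The two "filters" below are toy, reversible byte transforms invented for this example; they are not real HDF5 filters, and the names `toy_filter`, `toy_chain`, `write_chunk`, and `read_chunk` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-ins for real filters: each is a reversible,
   byte-wise transform applied to a whole chunk in place. */
typedef void (*chunk_fn)(unsigned char *buf, size_t n);

typedef struct {
    chunk_fn encode; /* applied when writing */
    chunk_fn decode; /* applied when reading */
} toy_filter;

static void add1(unsigned char *b, size_t n) {
    for (size_t i = 0; i < n; i++) b[i] = (unsigned char)(b[i] + 1u);
}
static void sub1(unsigned char *b, size_t n) {
    for (size_t i = 0; i < n; i++) b[i] = (unsigned char)(b[i] - 1u);
}
static void rotl1(unsigned char *b, size_t n) {
    for (size_t i = 0; i < n; i++) b[i] = (unsigned char)((b[i] << 1) | (b[i] >> 7));
}
static void rotr1(unsigned char *b, size_t n) {
    for (size_t i = 0; i < n; i++) b[i] = (unsigned char)((b[i] >> 1) | (b[i] << 7));
}

/* Definition order matters: "add1" was defined first, "rotl1" second. */
const toy_filter toy_chain[2] = { { add1, sub1 }, { rotl1, rotr1 } };

/* Writing: apply the filters in first-defined order. */
void write_chunk(const toy_filter *chain, size_t nfilters,
                 unsigned char *buf, size_t n) {
    for (size_t i = 0; i < nfilters; i++) chain[i].encode(buf, n);
}

/* Reading: undo the filters in the reverse order. */
void read_chunk(const toy_filter *chain, size_t nfilters,
                unsigned char *buf, size_t n) {
    for (size_t i = nfilters; i-- > 0;) chain[i].decode(buf, n);
}
```

Because the two transforms do not commute, undoing them in the wrong order would not recover the original chunk, which is why the read path walks the chain backwards.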
## A Warning on Backward Compatibility {#filters_compatibility}

The API defined in this document should accurately reflect the
current state of filters in the netCDF-c library. Be aware that
there was a short period in which the filter code was undergoing
some revision and extension. Those extensions have largely been
reverted. Unfortunately, some users may experience
compilation problems with previously working code because of
these reversions. In that case, please revise your code to
adhere to this document. Apologies are extended for any
inconvenience.

A user may encounter an incompatibility if any of the following appears in user code.

* The function *nc\_inq\_var\_filter* was returning the error value NC\_ENOFILTER if a variable had no associated filters.
  It has been reverted to the previous behavior, in which it returns NC\_NOERR and sets the returned filter id to zero if the variable has no filters.
* The function *nc\_inq\_var\_filterids* was renamed to *nc\_inq\_var\_filter\_ids*.
* Some auxiliary functions for parsing textual filter specifications have been moved to the file *netcdf\_aux.h*. See [Appendix A](#filters_appendixa).
* All of the "filterx" functions have been removed. This is unlikely to cause problems because they had limited visibility.

For additional information, see [Appendix B](#filters_appendixb).

## Enabling A HDF5 Compression Filter {#filters_enable}

HDF5 supports dynamic loading of compression filters using the
following process for reading of compressed data.

1. Assume that we have a dataset with one or more variables that were compressed using some algorithm.
   How the dataset was compressed will be discussed subsequently.
2. Shared libraries or DLLs exist that implement the compress/decompress algorithm.
   These libraries have a specific API so that the HDF5 library can locate, load, and utilize the compressor.
3. These libraries are expected to be installed in a specific directory.

In order to compress a variable with an HDF5 compliant filter,
the netcdf-c library must be given three pieces of information:

1. some unique identifier for the filter to be used,
2. a vector of parameters for controlling the action of the compression filter, and
3. access to a shared library implementation of the filter.

The meaning of the parameters is, of course, completely filter
dependent and the filter description [3] needs to be consulted.
For bzip2, for example, a single parameter is provided
representing the compression level. It is legal to provide a
zero-length set of parameters. Defaults are not provided, so
this assumes that the filter can operate with zero parameters.

Filter ids are assigned by the HDF group. See [4] for a current
list of assigned filter ids. Note that ids above 32767 can be
used for testing without registration.

The first two pieces of information can be provided in one of
three ways: (1) using *ncgen*, (2) via an API call, or (3) via
command line parameters to *nccopy*. In any case, remember that
filtering also requires setting chunking, so the variable must
also be marked with chunking information. If compression is set
for a non-chunked variable, the variable will forcibly be
converted to chunked using a default chunking algorithm.

## Using The API {#filters_API}
The necessary API methods are included in *netcdf\_filter.h* by default.
These functions implicitly use the HDF5 mechanisms and may produce an error if applied to a file format that is not compatible with the HDF5 mechanism.

### nc\_def\_var\_filter
Add a filter to the set of filters to be used when writing a variable. This must be invoked after the variable has been created and before *nc\_enddef* is invoked.
````
int nc_def_var_filter(int ncid, int varid, unsigned int id,
                      size_t nparams, const unsigned int* params);
````
Arguments:

* ncid &mdash; File and group ID.
* varid &mdash; Variable ID.
* id &mdash; Filter ID.
* nparams &mdash; Number of filter parameters.
* params &mdash; Filter parameters (a vector of unsigned integers)

Return codes:

* NC\_NOERR &mdash; No error.
* NC\_ENOTNC4 &mdash; Not a netCDF-4 file.
* NC\_EBADID &mdash; Bad ncid or bad filter id
* NC\_ENOTVAR &mdash; Invalid variable ID.
* NC\_EINDEFINE &mdash; called when not in define mode
* NC\_ELATEDEF &mdash; called too late, after *nc\_enddef* has been invoked for the variable
* NC\_EINVAL &mdash; Scalar variable; or parallel enabled and parallel filters not supported; or nparams or params invalid.

### nc\_inq\_var\_filter\_ids
Query a variable to obtain a list of the ids of all filters associated with that variable.
````
int nc_inq_var_filter_ids(int ncid, int varid, size_t* nfiltersp, unsigned int* filterids);
````
Arguments:

* ncid &mdash; File and group ID.
* varid &mdash; Variable ID.
* nfiltersp &mdash; Stores number of filters found; may be zero.
* filterids &mdash; Stores set of filter ids.

Return codes:

* NC\_NOERR &mdash; No error.
* NC\_ENOTNC4 &mdash; Not a netCDF-4 file.
* NC\_EBADID &mdash; Bad ncid
* NC\_ENOTVAR &mdash; Invalid variable ID.

The number of filters associated with the variable is stored in *nfiltersp* (it may be zero).
The set of filter ids will be returned in *filterids*.
As is usual with the netcdf API, one is expected to call this function twice:
the first time to obtain the count in *nfiltersp*, and the second time to get the filter ids in client-allocated memory.
Any of these arguments can be NULL, in which case no value is returned.

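The two-call pattern described above can be sketched as follows. So that the sketch compiles without the netCDF library, it uses `demo_inq_filter_ids`, a hypothetical stand-in that only mimics the NULL-tolerant output arguments of *nc\_inq\_var\_filter\_ids* and always reports the two ids 307 and 4; the wrapper name `fetch_filter_ids` is likewise an assumption.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in, NOT the netCDF function: it has the same
   NULL-tolerant output arguments and reports two fixed filter ids. */
static int demo_inq_filter_ids(size_t *nfiltersp, unsigned int *filterids) {
    if (nfiltersp) *nfiltersp = 2;
    if (filterids) { filterids[0] = 307u; filterids[1] = 4u; }
    return 0; /* plays the role of NC_NOERR */
}

/* Two-call pattern: ask for the count first, then allocate and fetch.
   Returns a malloc'ed array that the caller must free, or NULL if there
   are no filters or a call fails; *countp receives the number of ids. */
unsigned int *fetch_filter_ids(size_t *countp) {
    size_t n = 0;
    unsigned int *ids;

    *countp = 0;
    /* First call: only the count is requested. */
    if (demo_inq_filter_ids(&n, NULL) != 0 || n == 0) return NULL;

    ids = malloc(n * sizeof *ids);
    if (ids == NULL) return NULL;

    /* Second call: fill the client-allocated memory. */
    if (demo_inq_filter_ids(NULL, ids) != 0) { free(ids); return NULL; }

    *countp = n;
    return ids;
}
```

With the real library, the two stand-in calls would presumably become `nc_inq_var_filter_ids(ncid, varid, &n, NULL)` and `nc_inq_var_filter_ids(ncid, varid, NULL, ids)`, using the signature shown above.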
### nc\_inq\_var\_filter\_info
Query a variable to obtain information about a specific filter associated with the variable.
````
int nc_inq_var_filter_info(int ncid, int varid, unsigned int id, size_t* nparamsp, unsigned int* params);
````
Arguments:

* ncid &mdash; File and group ID.
* varid &mdash; Variable ID.
* id &mdash; The filter id of interest.
* nparamsp &mdash; Stores number of parameters.
* params &mdash; Stores set of filter parameters.

Return codes:

* NC\_NOERR &mdash; No error.
* NC\_ENOTNC4 &mdash; Not a netCDF-4 file.
* NC\_EBADID &mdash; Bad ncid
* NC\_ENOTVAR &mdash; Invalid variable ID.
* NC\_ENOFILTER &mdash; Filter not defined for the variable.

The *id* indicates the filter of interest.
The actual parameters are stored in *params*.
The number of parameters is returned in *nparamsp*.
As is usual with the netcdf API, one is expected to call this function twice:
the first time to obtain the count in *nparamsp*, and the second time to get the parameters in client-allocated memory.
Any of these arguments can be NULL, in which case no value is returned.
If the specified id is not attached to the variable, then NC\_ENOFILTER is returned.

### nc\_inq\_var\_filter
Query a variable to obtain information about the first filter associated with the variable.
When netcdf-c was modified to support multiple filters per variable, this function became largely redundant since it returns info only about the first defined filter for the variable.
Internally, it is implemented using the functions *nc\_inq\_var\_filter\_ids* and *nc\_inq\_var\_filter\_info*.

````
int nc_inq_var_filter(int ncid, int varid, unsigned int* idp, size_t* nparamsp, unsigned int* params);
````

Arguments:

* ncid &mdash; File and group ID.
* varid &mdash; Variable ID.
* idp &mdash; Stores the id of the first found filter, set to zero if variable has no filters.
* nparamsp &mdash; Stores number of parameters.
* params &mdash; Stores set of filter parameters.

Return codes:

* NC\_NOERR &mdash; No error.
* NC\_ENOTNC4 &mdash; Not a netCDF-4 file.
* NC\_EBADID &mdash; Bad ncid
* NC\_ENOTVAR &mdash; Invalid variable ID.

The filter id will be returned in the *idp* argument.
If there are no filters, then zero is stored in this argument.
Otherwise, the number of parameters is stored in *nparamsp* and the actual parameters in *params*.
As is usual with the netcdf API, one is expected to call this function twice:
the first time to obtain the count in *nparamsp*, and the second time to get the parameters in client-allocated memory.
Any of these arguments can be NULL, in which case no value is returned.

## Using ncgen {#filters_NCGEN}

In a CDL file, compression of a variable can be specified by annotating it with the following attribute:

* *\_Filter* &mdash; a string containing a comma separated list of constants specifying (1) the filter id to apply, and (2) a vector of constants representing the parameters for controlling the operation of the specified filter.
See the section on the <a href="#filters_syntax">parameter encoding syntax</a> for the details on the allowable kinds of constants.

This is a "special" attribute, which means that it will normally be invisible when using *ncdump* unless the -s flag is specified.

For backward compatibility it is probably better to use the *\_Deflate* attribute instead of *\_Filter*. But using *\_Filter* to specify deflation will work.

Multiple filters can be specified for a given variable by using the "|" separator.
Alternatively, this attribute may be repeated to specify multiple filters.

Note that the lexical order of declaration is important when more than one filter is specified for a variable because it determines the order in which the filters are applied.

### Example CDL File (Data elided)

````
netcdf bzip2szip {
dimensions:
  dim0 = 4 ; dim1 = 4 ; dim2 = 4 ; dim3 = 4 ;
variables:
  float var(dim0, dim1, dim2, dim3) ;
    var:_Filter = "307,9|4,32,32" ; // bzip2 then szip
    var:_Storage = "chunked" ;
    var:_ChunkSizes = 4, 4, 4, 4 ;
data:
...
}
````

Note that the assigned filter id for bzip2 is 307 and for szip it is 4.

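To make the shape of a *\_Filter* value such as "307,9|4,32,32" concrete, the following is a minimal sketch of splitting it into per-filter ids and parameter vectors. It is not the parser used by *ncgen* (the library's own parsing helpers live in *netcdf\_aux.h*); the names `filter_spec`, `MAX_PARAMS`, and `parse_filter_pipeline` are hypothetical, and only untagged, non-negative decimal constants are handled.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_PARAMS 16 /* arbitrary limit chosen for this sketch */

typedef struct {
    unsigned int id;                 /* filter id, e.g. 307 for bzip2 */
    size_t nparams;                  /* may legally be zero */
    unsigned int params[MAX_PARAMS];
} filter_spec;

/* Read one unsigned decimal constant at *pp and advance past it.
   Returns 0 on success, -1 if no digit is present. */
static int parse_uint(const char **pp, unsigned int *out) {
    char *end;
    if (**pp < '0' || **pp > '9') return -1; /* no sign, tag, or blank */
    *out = (unsigned int)strtoul(*pp, &end, 10);
    *pp = end;
    return 0;
}

/* Parse "id,p1,p2|id,p1..." into specs[]. Returns the number of
   filters found, or -1 if the text is malformed or does not fit. */
int parse_filter_pipeline(const char *text, filter_spec *specs, size_t maxspecs) {
    size_t nspecs = 0;
    const char *p = text;

    while (*p != '\0') {
        filter_spec *s;
        if (nspecs == maxspecs) return -1;
        s = &specs[nspecs];
        s->nparams = 0;
        if (parse_uint(&p, &s->id) != 0) return -1; /* id is required */
        while (*p == ',') {                         /* zero or more params */
            p++;
            if (s->nparams == MAX_PARAMS) return -1;
            if (parse_uint(&p, &s->params[s->nparams]) != 0) return -1;
            s->nparams++;
        }
        nspecs++;
        if (*p == '|') {
            p++;
            if (*p == '\0') return -1;              /* dangling separator */
        } else if (*p != '\0') {
            return -1;                              /* unexpected character */
        }
    }
    return (int)nspecs;
}
```

The order of the entries in `specs` follows the lexical order in the string, which is the order in which the filters would be applied on writing.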
## Using nccopy {#filters_NCCOPY}

When copying a netcdf file using *nccopy* it is possible to specify filter information for any output variable by using the "-F" option on the command line; for example:

    nccopy -F "var,307,9" unfiltered.nc filtered.nc

Assume that *unfiltered.nc* has a chunked but not bzip2 compressed variable named "var".
This command will copy that variable to the *filtered.nc* output file, applying the filter with id 307 (i.e. bzip2) with the single parameter 9, which indicates the compression level.
See the section on the <a href="#filters_syntax">parameter encoding syntax</a> for the details on the allowable kinds of constants.

The "-F" option can be used repeatedly, as long as a different variable is specified for each occurrence.

It can be convenient to specify that the same compression is to be applied to more than one variable. To support this, two additional *-F* cases are defined.

1. *-F \*,...* means apply the filter to all variables in the dataset.
2. *-F v1&v2&..,...* means apply the filter to multiple variables.

Multiple filters can be specified using the pipeline notation '|'.
For example

1. *-F v1&v2,307,9|4,32,32* means apply filter 307 (bzip2) then filter 4 (szip) to the multiple variables.

Note that the characters '\*', '\&', and '\|' are shell reserved characters, so you will probably need to escape or quote the filter spec in that environment.

As a rule, any input filter on an input variable will be applied to the equivalent output variable &mdash; assuming the output file type is netcdf-4.
It is, however, sometimes convenient to suppress output compression either totally or on a per-variable basis.
Total suppression of output filters can be accomplished by specifying a special case of "-F", namely this.

    nccopy -F none input.nc output.nc

The expression *-F \*,none* is equivalent to *-F none*.

Suppression of output filtering for a specific set of variables can be accomplished using these formats.

    nccopy -F "var,none" input.nc output.nc
    nccopy -F "v1&v2&...,none" input.nc output.nc

where "var" and the "vi" are the fully qualified names of variables.

The rules for all possible cases of the "-F none" flag are defined by this table.
<table>
<tr><th>-F none<th>-Fvar,...<th>Input Filter<th>Applied Output Filter
<tr><td>true<td>undefined<td>NA<td>unfiltered
<tr><td>true<td>none<td>NA<td>unfiltered
<tr><td>true<td>defined<td>NA<td>use output filter(s)
<tr><td>false<td>undefined<td>defined<td>use input filter(s)
<tr><td>false<td>none<td>NA<td>unfiltered
<tr><td>false<td>defined<td>undefined<td>use output filter(s)
<tr><td>false<td>undefined<td>undefined<td>unfiltered
<tr><td>false<td>defined<td>defined<td>use output filter(s)
</table>

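The table above reduces to a short decision function. The sketch below is one reading of the table, not code taken from *nccopy*; the type and function names (`var_spec`, `out_action`, `choose_output_filter`) are hypothetical.

```c
/* State of the per-variable "-F var,..." option. */
typedef enum { SPEC_UNDEFINED, SPEC_NONE, SPEC_DEFINED } var_spec;

/* What ends up applied to the output variable. */
typedef enum { OUT_UNFILTERED, OUT_USE_INPUT, OUT_USE_OUTPUT } out_action;

/* f_none:           nonzero if "-F none" was given
   spec:             per-variable -F setting for this variable
   input_has_filter: nonzero if the input variable has filter(s) */
out_action choose_output_filter(int f_none, var_spec spec, int input_has_filter) {
    /* An explicit per-variable filter wins in every row of the table. */
    if (spec == SPEC_DEFINED) return OUT_USE_OUTPUT;
    /* An explicit per-variable "none" always suppresses filtering. */
    if (spec == SPEC_NONE) return OUT_UNFILTERED;
    /* No per-variable setting: "-F none" suppresses everything ... */
    if (f_none) return OUT_UNFILTERED;
    /* ... otherwise any input filter is carried over to the output. */
    return input_has_filter ? OUT_USE_INPUT : OUT_UNFILTERED;
}
```

Rows marked "NA" in the table correspond to cases where `input_has_filter` does not influence the result.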
## Filter Specification Syntax {#filters_syntax}

The utilities <a href="#filters_NCGEN">ncgen</a> and <a href="#filters_NCCOPY">nccopy</a>, and also the output of *ncdump*, support the specification of filter ids, formats, and parameters in text format.
The BNF specification is defined in [Appendix C](#filters_appendixc).
Basically, these specifications consist of a filter id, a comma, and then a sequence of
comma separated constants representing the parameters.
The constants are converted within the utility to a proper set of unsigned int constants (see the <a href="#filters_paramcoding">parameter encoding section</a>).

To simplify things, various kinds of constants can be specified rather than just simple unsigned integers.
The *ncgen* and *nccopy* programs will encode them properly using the rules specified in the section on <a href="#filters_paramcoding">parameter encode/decode</a>.
Since the original types are lost after encoding, *ncdump* will always show a simple list of unsigned integer constants.

The currently supported constants are as follows.
<table>
<tr halign="center"><th>Example<th>Type<th>Format Tag<th>Notes
<tr><td>-17b<td>signed 8-bit byte<td>b|B<td>Truncated to 8 bits and sign extended to 32 bits
<tr><td>23ub<td>unsigned 8-bit byte<td>u|U b|B<td>Truncated to 8 bits and zero extended to 32 bits
<tr><td>-25S<td>signed 16-bit short<td>s|S<td>Truncated to 16 bits and sign extended to 32 bits
<tr><td>27US<td>unsigned 16-bit short<td>u|U s|S<td>Truncated to 16 bits and zero extended to 32 bits
<tr><td>-77<td>implicit signed 32-bit integer<td>Leading minus sign and no tag<td>
<tr><td>77<td>implicit unsigned 32-bit integer<td>No tag<td>
<tr><td>93U<td>explicit unsigned 32-bit integer<td>u|U<td>
<tr><td>789f<td>32-bit float<td>f|F<td>
<tr><td>12345678.12345678d<td>64-bit double<td>d|D<td>LE encoding
<tr><td>-9223372036854775807L<td>64-bit signed long long<td>l|L<td>LE encoding
<tr><td>18446744073709551615UL<td>64-bit unsigned long long<td>u|U l|L<td>LE encoding
</table>
Some things to note.

1. In all cases, except for an untagged positive integer, the format tag is required and determines how the constant is converted to one or two unsigned int values.
2. For an untagged positive integer, the constant is treated as being of the smallest type into which it fits (i.e. 8, 16, 32, or 64 bit).
3. For signed byte and short, the value is sign extended to 32 bits and then treated as an unsigned int value, but maintaining the bit-pattern.
4. For double, and signed|unsigned long long, they are converted as specified in the section on <a href="#filters_paramcoding">parameter encode/decode</a>.
5. In order to support multiple filters, the argument to *\_Filter* may be a pipeline, separated using '|', that specifies a list of filter specs.

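The truncation and extension rules in the table can be sketched for the 8-bit, 16-bit, and float cases as follows. The function names are hypothetical, and storing a float as its raw 32-bit pattern is an assumption about the encoding (IEEE-754 single precision is also assumed); the 64-bit cases are left to the parameter encode/decode section and are not attempted here.

```c
#include <stdint.h>
#include <string.h>

/* -17b: truncate to 8 bits, then sign extend to 32 bits. */
uint32_t encode_sbyte(long v) {
    uint32_t x = (uint32_t)v & 0xFFu;
    return (x & 0x80u) ? (x | 0xFFFFFF00u) : x;
}

/* 23ub: truncate to 8 bits, then zero extend to 32 bits. */
uint32_t encode_ubyte(unsigned long v) {
    return (uint32_t)v & 0xFFu;
}

/* -25S: truncate to 16 bits, then sign extend to 32 bits. */
uint32_t encode_sshort(long v) {
    uint32_t x = (uint32_t)v & 0xFFFFu;
    return (x & 0x8000u) ? (x | 0xFFFF0000u) : x;
}

/* 27US: truncate to 16 bits, then zero extend to 32 bits. */
uint32_t encode_ushort(unsigned long v) {
    return (uint32_t)v & 0xFFFFu;
}

/* 789f: ASSUMPTION for this sketch: the float travels as its raw
   32-bit pattern, copied without any numeric conversion. */
uint32_t encode_float(float f) {
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    return x;
}
```

For example, `encode_sbyte(-17)` yields the bit pattern 0xFFFFFFEF, which is what "sign extended to 32 bits ... but maintaining the bit-pattern" amounts to when the result is viewed as an unsigned int.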
## Dynamic Loading Process {#filters_Process}

Each filter is assumed to be compiled into a separate dynamically loaded library.
For HDF5 conformant filters, these filter libraries are assumed to be in some specific location.
The details for writing such a filter are defined in the HDF5 documentation[1,2].

### Plugin directory {#filters_plugindir}

The HDF5 loader searches for plugins in a number of directories.
This search is contingent on the presence or absence of the environment
variable named ***HDF5_PLUGIN_PATH***.

As with all other "...PATH" variables, it is a sequence of absolute
directories separated by a separator character. For *nix* operating systems,
this separator is the colon (':') character. For Windows and Mingw, the
separator is the semi-colon (';') character. So for example:

* Linux: export HDF5_PLUGIN_PATH=/usr/lib:/usr/local/lib
* Windows: export HDF5_PLUGIN_PATH=c:\\ProgramData\\hdf5\\plugin;c:\\tools\\lib

If HDF5_PLUGIN_PATH is defined, then the loader will search each directory
in the path from left to right looking for shared libraries with specific
exported symbols representing the entry points into the library.

If HDF5_PLUGIN_PATH is not defined, the loader defaults to using
these default directories:

* Linux: /usr/local/hdf5/lib/plugin
* Windows: %ALLUSERSPROFILE%\\hdf5\\lib\\plugin

It should be noted that the search order differs between HDF5 and NCZarr.
When HDF5_PLUGIN_PATH is defined, the HDF5 loader will search only the directories
specified in HDF5_PLUGIN_PATH. In NCZarr, the loader
searches HDF5_PLUGIN_PATH and, as a last resort,
it also searches the default directory.

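The search-order rules in this section can be sketched as a function that turns a path value into an ordered directory list. This is an illustration of the rules as stated here, not the loader's actual code; `build_search_list`, `MAX_DIRS`, and `MAX_DIRLEN` are hypothetical names, and the caller is assumed to pass the value of HDF5_PLUGIN_PATH (or NULL if it is unset), for example as obtained from `getenv`.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DIRS   16  /* arbitrary limits chosen for this sketch */
#define MAX_DIRLEN 256

/* plugin_path: value of HDF5_PLUGIN_PATH, or NULL if not defined
   sep:         ':' on *nix, ';' on Windows and Mingw
   default_dir: the platform default plugin directory
   nczarr:      nonzero for the NCZarr loader, zero for the HDF5 loader
   Fills dirs[] in search order and returns the number of entries. */
size_t build_search_list(const char *plugin_path, char sep,
                         const char *default_dir, int nczarr,
                         char dirs[MAX_DIRS][MAX_DIRLEN]) {
    size_t n = 0;

    if (plugin_path != NULL) {
        const char *p = plugin_path;
        while (n < MAX_DIRS) {
            const char *q = strchr(p, sep);
            size_t len = q ? (size_t)(q - p) : strlen(p);
            if (len > 0 && len < MAX_DIRLEN) { /* skip empty or oversized */
                memcpy(dirs[n], p, len);
                dirs[n][len] = '\0';
                n++;
            }
            if (q == NULL) break;
            p = q + 1;                         /* left to right */
        }
        if (!nczarr) return n; /* HDF5: only the listed directories */
    }

    /* Path not defined (either loader), or NCZarr's last resort. */
    if (n < MAX_DIRS && strlen(default_dir) < MAX_DIRLEN) {
        strcpy(dirs[n], default_dir);
        n++;
    }
    return n;
}
```

With the Linux example above, the HDF5 case yields two directories, while the NCZarr case yields the same two followed by the default directory.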
### Plugin Library Naming {#filters_Pluginlib}

Given a plugin directory, HDF5 examines every file in that directory
that conforms to a specified name pattern as determined by the
platform on which the library is being executed.

<table>
<tr halign="center"><th>Platform<th>Basename<th>Extension
<tr halign="left"><td>Linux<td>lib*<td>.so*
<tr halign="left"><td>OSX<td>lib*<td>.dylib*
<tr halign="left"><td>Cygwin<td>cyg*<td>.dll*
<tr halign="left"><td>Windows<td>*<td>.dll
</table>

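As a rough illustration of the table, the following sketch tests a file name against the per-platform patterns. It is an approximation written for this document, not HDF5's matching code: the trailing '\*' in an extension such as ".so\*" is read as "anything may follow", the comparison is case sensitive, and the names `platform` and `looks_like_plugin` are hypothetical.

```c
#include <string.h>

typedef enum { PLAT_LINUX, PLAT_OSX, PLAT_CYGWIN, PLAT_WINDOWS } platform;

static int has_prefix(const char *s, const char *prefix) {
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

static int ends_with(const char *s, const char *suffix) {
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* Returns nonzero if filename matches the Basename and Extension
   patterns listed for the given platform. */
int looks_like_plugin(platform plat, const char *filename) {
    switch (plat) {
    case PLAT_LINUX:   /* lib*  .so*    */
        return has_prefix(filename, "lib") && strstr(filename + 3, ".so") != NULL;
    case PLAT_OSX:     /* lib*  .dylib* */
        return has_prefix(filename, "lib") && strstr(filename + 3, ".dylib") != NULL;
    case PLAT_CYGWIN:  /* cyg*  .dll*   */
        return has_prefix(filename, "cyg") && strstr(filename + 3, ".dll") != NULL;
    case PLAT_WINDOWS: /* *     .dll    */
        return ends_with(filename, ".dll");
    }
    return 0;
}
```

A name that passes this test is only a candidate; whether it is actually usable is decided by loading it and checking its exported functions.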
### Plugin Verification {#filters_Pluginverify}

For each dynamic library located using the previous patterns,
HDF5 attempts to load the library and attempts to obtain
information from it. Specifically, it looks for two functions
with the following signatures.

1. *H5PL\_type\_t H5PLget\_plugin\_type(void)* &mdash; This function is expected to return the constant value *H5PL\_TYPE\_FILTER* to indicate that this is a filter library.
2. *const void\* H5PLget\_plugin\_info(void)* &mdash; This function returns a pointer to a table of type *H5Z\_class2\_t*.
   This table contains the necessary information needed to utilize the filter both for reading and for writing.
   In particular, it specifies the filter id implemented by the library, and that id must match the one specified for the variable in *nc\_def\_var\_filter* in order for the filter to be used.

If plugin verification fails, then that plugin is ignored and the search continues for another, matching plugin.

## NCZarr Filter Support {#filters_nczarr}

The inclusion of Zarr support in the netcdf-c library creates the need to provide a new representation consistent with the way that Zarr files store filter information.
For Zarr, filters are represented using the JSON notation.
Each filter is defined by a JSON dictionary, and each such filter dictionary
is guaranteed to have a key named "id" whose value is a unique string defining the filter algorithm: "lz4" or "bzip2", for example.

The parameters of the filter are defined by additional &mdash; algorithm specific &mdash; keys in the filter dictionary.
One commonly used filter is "blosc", which has a JSON dictionary of this form.
````
{
"id": "blosc",
"cname": "lz4",
"clevel": 5,
"shuffle": 1
}
````
So it has three parameters:

1. "cname" &mdash; the sub-algorithm used by the blosc compressor, LZ4 in this case.
2. "clevel" &mdash; the compression level, 5 in this case.
3. "shuffle" &mdash; whether the input is shuffled before compression, yes (1) in this case.

NCZarr has four constraints that must be met.

1. It must store its filter information in its metadata in the above JSON dictionary format.
2. It is required to re-use the HDF5 filter implementations.
This is to avoid having to rewrite the filter implementations.
This means that some mechanism is needed to translate between the HDF5 id+parameter model and the Zarr JSON dictionary model.
3. It must be possible to modify the set of visible parameters in response to environment information such as the type of the associated variable; this is required to mimic the corresponding HDF5 capability.
4. It must be possible to use filters even if HDF5 support is disabled.

Note that the term "visible parameters" is used here to refer to the parameters provided by `nc_def_var_filter` or those stored in the dataset's metadata as provided by the JSON codec. The term "working parameters" refers to the parameters given to the compressor itself and derived from the visible parameters.

The standard authority for defining Zarr filters is the list supported by the NumCodecs project [7].
Comparing the set of standard filters (aka codecs) defined by NumCodecs to the set of standard filters defined by HDF5 [3], it can be seen that the two sets overlap, but each has filters not defined by the other.

Note also that it is undesirable that a specific set of filters/codecs be built into the NCZarr implementation.
Rather, it is preferable for there to be some extensible way to associate the JSON with the code implementing the codec. This mirrors the plugin model used by HDF5.

The mechanism provided to address these issues is similar to that taken by HDF5.
A shared library must exist that has certain well-defined entry points that allow the NCZarr code to determine information about a Codec.
The shared library exports a well-known function name to access Codec information and relate it to a corresponding HDF5 implementation.
Note that the shared library may optionally be the same library containing the HDF5
filter processor.

### Processing Overview

There are several paths by which the NCZarr filter API is invoked.

1. The nc\_def\_var\_filter function is invoked on a variable or
(1a) the metadata for a variable is read when opening an existing variable that has associated Codecs.
2. The visible parameters are converted to a set of working parameters.
3. The filter is invoked with the working parameters.
4. The dataset is closed using the final set of visible parameters.

#### Step 1: Invoking nc\_def\_var\_filter

In this case, the filter plugin is located and the set of visible parameters (from nc\_def\_var\_filter) is provided.

#### Step 1a: Reading metadata

In this case, the codec is read from the metadata and must be converted to a visible set of HDF5 style parameters.
It is possible that this set of visible parameters differs from the set that was provided by nc\_def\_var\_filter.
If this is important, then the filter implementation is responsible for marking this difference using, for example, a different number of parameters or some differing value.

#### Step 2: Convert visible parameters to working parameters

Given environmental information such as the associated variable's base type, the visible parameters
are converted to a potentially larger set of working parameters. This step also provides the opportunity
to modify the visible parameters.

#### Step 3: Invoking the filter

As chunks are read or written, the filter is repeatedly invoked using the working parameters.

#### Step 4: Closing the dataset

The visible parameters from step 2 are stored in the dataset's metadata.
It is desirable to determine if the set of visible parameters changes.
If no change is detected, then re-writing the compressor metadata may be avoided.

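Step 2 can be pictured with a deliberately simple, hypothetical filter whose only visible parameter is a compression level and whose working parameters additionally carry the size in bytes of the variable's base type. Neither this parameter layout nor the function name `make_working_params` comes from the library; the actual conversion hooks are described in Appendix E.

```c
#include <stddef.h>

/* HYPOTHETICAL conversion for an imaginary filter: the working
   parameters are the visible parameters followed by the element size
   of the variable's base type (the "environmental information").
   Returns 0 on success, -1 if the output vector is too small. */
int make_working_params(const unsigned int *visible, size_t nvisible,
                        size_t type_size,
                        unsigned int *working, size_t maxworking,
                        size_t *nworkingp) {
    if (nvisible + 1 > maxworking) return -1;
    for (size_t i = 0; i < nvisible; i++) working[i] = visible[i];
    working[nvisible] = (unsigned int)type_size; /* derived, not user supplied */
    *nworkingp = nvisible + 1;
    return 0;
}
```

In this picture the visible vector (for example `{9}`) is what ends up in the dataset's metadata in step 4, while the larger working vector (for example `{9, 4}` for a 4-byte type) is what the filter sees in step 3.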
### Client API

Currently, there is no way to specify use of a filter via Codec through
the netcdf-c API. Rather, one must know the HDF5 id and parameters of
the filter of interest and use the functions *nc\_def\_var\_filter* and *nc\_inq\_var\_filter*.
Internally, the NCZarr code will use information about known Codecs to convert the HDF5 filter reference to the corresponding Codec.
This restriction also holds for the specification of filters in *ncgen* and *nccopy*.
This limitation may be lifted in the future.

### Special Codecs Attribute

A new special attribute is defined called *\_Codecs* in parallel to the current *\_Filter* special attribute. Its value is a string containing the JSON representation of the Codecs associated with a given variable.
This can be especially useful when a file is unreadable because it uses a filter not available to the netcdf-c library.
That is, no implementation was found in, e.g., the *HDF5\_PLUGIN\_PATH* directory.
In this case *ncdump -hs* will display the raw Codec information so that it may be possible to see what filter is missing.

### Pre-Processing Filter Libraries

The process for using filters for NCZarr is defined to operate in several steps.
First, as with HDF5, all shared libraries in a specified directory
(e.g. *HDF5\_PLUGIN\_PATH*) are scanned.
They are interrogated to see what kind of library they implement, if any.
This interrogation operates by seeing if certain well-known (function) names are defined in this library.

There will be two library types:

1. HDF5 &mdash; exports a specific API: `H5PLget_plugin_type` and `H5PLget_plugin_info`.
2. Codec &mdash; exports a specific API: `NCZ_get_codec_info`

Note that a given library can export either or both of these APIs.
This means that we can have three types of libraries:

1. HDF5 only
2. Codec only
3. HDF5 + Codec

Suppose that our *HDF5\_PLUGIN\_PATH* location has an HDF5-only library.
Then by adding a corresponding, separate, Codec-only library to that same location, it is possible to make an HDF5 library usable by NCZarr.
It is possible to do this without having to modify the HDF5-only library.
Over time, it is possible to merge an HDF5-only library with a Codec-only library to produce a single, combined library.

### Using Plugin Libraries

The netcdf-c library processes all of the shared libraries by interrogating each one for the well-known APIs and recording the result.
Any library that does not export at least one of the well-known APIs is ignored.

Internally, the netcdf-c library pairs up each HDF5 library API with a corresponding Codec API by invoking the relevant well-known functions
(see [Appendix E](#filters_appendixe)).
This results in this table for associated codec and hdf5 libraries.
<table>
<tr><th>HDF5 API<th>Codec API<th>Action
<tr><td>Not defined<td>Not defined<td>Ignore
<tr><td>Defined<td>Not defined<td>Ignore
<tr><td>Defined<td>Defined<td>NCZarr usable
</table>

### Filter Defaults Library

As a special case, a shared library may be created to hold
defaults for a common set of filters.
Basically, there is a specially defined function that returns
a vector of codec APIs. These defaults are used only if
no other library provides codec information for a filter.
Currently, the defaults library provides codec defaults
for Shuffle, Fletcher32, Deflate (zlib), and SZIP.

### Using the Codec API

Given a set of filters for which the HDF5 API and the Codec API
are defined, it is then possible to use the APIs to invoke the
filters and to process the meta-data in Codec JSON format.

#### Writing an NCZarr Container

When writing, the user program will invoke the NetCDF API function *nc\_def\_var\_filter*.
This function is currently defined to operate using HDF5-style id and parameters (unsigned ints).
The netcdf-c library examines its list of known filters to find one matching the HDF5 id provided by *nc\_def\_var\_filter*.
The set of parameters provided is stored internally.
Then during writing of data, the corresponding HDF5 filter is invoked to encode the data.

When it comes time to write out the meta-data, the stored HDF5-style parameters are passed to a specific Codec function to obtain the corresponding JSON representation. Again see [Appendix E](#filters_appendixe).
This resulting JSON is then written in the NCZarr metadata.

#### Reading an NCZarr Container

When reading, the netcdf-c library will read the metadata for a given variable and will see that some set of filters is applied to this variable.
The metadata is encoded as Codec-style JSON.

Given a JSON Codec, it is parsed to provide a JSON dictionary containing the string "id" and the set of parameters as various keys.
The netcdf-c library examines its list of known filters to find one matching the Codec "id" string.
The JSON is passed to a Codec function to obtain the corresponding HDF5-style *unsigned int* parameter vector.
These parameters are stored for later use.

583 ### Supporting Filter Chains
584 
585 HDF5 supports *filter chains*, which is a sequence of filters where the output of one filter is provided as input to the next filter in the sequence.
586 When encoding, the filters are executed in the "forward" direction,
587 while when decoding the filters are executed in the "reverse" direction.
588 
589 In the Zarr meta-data, a filter chain is divided into two parts:
590 the "compressor" and the "filters". The former is a single JSON codec
591 as described above. The latter is an ordered JSON array of codecs.
592 So if compressor is something like
593  "compressor": {"id": "c"...}
594 and the filters array is like this:
595  "filters": [ {"id": "f1"...}, {"id": "f2"...}...{"id": "fn"...}]
596 then the filter chain is (f1,f2,...fn,c) with f1 being applied first and c being applied last when encoding. On decode, the filter chain is executed in the order (c,fn...f2,f1).
597 
598 So, an HDF5 filter chain is divided into two parts, where the last filter in the chain is assigned as the "compressor" and the remaining
599 filters are assigned as the "filters".
600 But independent of this, each codec, whether a compressor or a filter,
601 is stored in the JSON dictionary form described earlier.
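
The split described above can be sketched as follows. This is only a model that works on codec id strings; the function names `split_chain` and `decode_order` are invented for the example and are not part of the netcdf-c API.

```c
#include <assert.h>
#include <stddef.h>

/* Split an encode-order chain of n codec ids into the Zarr "filters"
   array (the first n-1 entries) and the "compressor" (the last entry).
   Returns the number of entries written to filters[]; *compressor is
   set to NULL when the chain is empty. */
size_t split_chain(const char *const *chain, size_t n,
                   const char **filters, const char **compressor)
{
    size_t i;

    assert(compressor != NULL);
    if (n == 0) {
        *compressor = NULL;
        return 0;
    }
    for (i = 0; i + 1 < n; i++)
        filters[i] = chain[i];
    *compressor = chain[n - 1];
    return n - 1;
}

/* Fill order[] with the decode order, which is the encode order reversed. */
void decode_order(const char *const *chain, size_t n, const char **order)
{
    size_t i;

    for (i = 0; i < n; i++)
        order[i] = chain[n - 1 - i];
}
```

For the chain (f1, f2, c), `split_chain` yields the filters f1 and f2 with c as the compressor, and `decode_order` yields (c, f2, f1).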
602 
603 ### Extensions
604 
605 The Codec style, using JSON, has the ability to provide very complex parameters that may be hard to encode as a vector of unsigned integers.
606 It might be desirable to consider exporting a JSON-based API out of the netcdf-c API to support user access to this complexity.
607 This would mean providing some alternate version of `nc_def_var_filter` that takes a string-valued argument instead of a vector of unsigned ints.
608 This extension is unlikely to be implemented until a compelling use-case is encountered.
609 
610 One bad side-effect of this is that we then may have two classes of plugins.
611 One class can be used by both HDF5 and NCZarr, and a second class that is usable only with NCZarr.
612 
613 ### Using The NetCDF-C Plugins
614 
615 As part of its testing, the NetCDF build process creates a number of shared libraries in the *netcdf-c/plugins* (or sometimes *netcdf-c/plugins/.libs*) directory.
616 If you need a filter from that set, you may be able to set *HDF5\_PLUGIN\_PATH*
617 to point to that directory or you may be able to copy the shared libraries out of that directory to your own location.
618 
619 ## Debugging {#filters_debug}
620 
621 Depending on the debugger one uses, debugging plugins can be very difficult.
622 It may be necessary to use the old printf approach for debugging the filter itself.
623 
624 One case worth mentioning is when there is a dataset that is using an unknown filter.
625 For this situation, you need to identify what filter(s) are used in the dataset.
626 This can be accomplished using this command.
627 
628  ncdump -s -h <dataset filename>
629 
630 Since ncdump is not being asked to access the data (the -h flag), it can obtain the filter information without failures.
631 Then it can print out the filter id and the parameters as well as the Codecs (via the -s flag).
632 
633 ### Test Cases {#filters_TestCase}
634 
635 Within the netcdf-c source tree, two directories contain test cases for testing dynamic filter operation.
636 
637 * *netcdf-c/nc\_test4* provides tests for testing HDF5 filters.
638 * *netcdf-c/nczarr\_test* provides tests for testing NCZarr filters.
639 
640 These tests are disabled if *--disable-shared* or if *--disable-filter-tests* is specified
641 or if *--disable-plugins* is specified.
642 
643 ### HDF5 Example {#filters_Example}
644 
645 A slightly simplified version of one of the HDF5 filter test cases is also available as an example within the netcdf-c source tree directory *netcdf-c/examples/C*.
646 The test is called *filter\_example.c* and it is executed as part of the *run\_examples4.sh* shell script.
647 The test case demonstrates dynamic filter writing and reading.
648 
649 The files *example/C/hdf5plugins/Makefile.am* and *example/C/hdf5plugins/CMakeLists.txt* demonstrate how to build the hdf5 plugin for bzip2.
650 
651 ## Notes
652 
653 ### Order of Invocation for Multiple Filters
654 
655 When multiple filters are defined on a variable, the order of application, when writing data to the file, is the same as the order in which *nc\_def\_var\_filter* is called.
656 When reading a file the order of application is of necessity the reverse.
657 
658 There are some special cases.
659 
660 1. The fletcher32 filter is always applied first, if enabled.
661 2. If *nc\_def\_var\_filter* or *nc\_def\_var\_deflate* or *nc\_def\_var\_szip* is called multiple times with the same filter id, but possibly with different sets of parameters, then the position of that filter in the sequence of applications does not change.
662  However the last set of parameters specified is used when actually writing the dataset.
663 3. Deflate and shuffle &mdash; these two are inextricably linked in the current API, but have quite different semantics.
664  If you call *nc\_def\_var\_deflate* multiple times, then the previous rule applies with respect to deflate.
665  However, the shuffle filter, if enabled, is *always* applied before applying any other filters, except fletcher32.
666 4. Once a filter is defined for a variable, it cannot be removed nor can its position in the filter order be changed.
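
Rule 2 above can be pictured with a toy model. The sketch below is not the library's data structure or code; the types `ToyFilter` and `ToyChain` and the function `toy_def_filter` are invented for illustration, and the special handling of fletcher32 and shuffle is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FILTERS 8
#define MAX_PARAMS 4

/* Toy model of one filter definition. */
typedef struct {
    unsigned id;
    size_t nparams;
    unsigned params[MAX_PARAMS];
} ToyFilter;

/* Toy model of the ordered filter list attached to a variable. */
typedef struct {
    size_t count;
    ToyFilter filters[MAX_FILTERS];
} ToyChain;

/* Define a filter on the toy chain. A new id is appended; an id that is
   already present keeps its position and only its parameters change.
   Returns 0 on success, -1 if the chain or parameter list is full. */
int toy_def_filter(ToyChain *chain, unsigned id, size_t nparams, const unsigned *params)
{
    size_t i, pos;

    assert(chain != NULL);
    if (nparams > MAX_PARAMS)
        return -1;
    for (pos = 0; pos < chain->count; pos++)
        if (chain->filters[pos].id == id)
            break;
    if (pos == chain->count) {          /* id not seen before: append */
        if (chain->count == MAX_FILTERS)
            return -1;
        chain->count++;
    }
    chain->filters[pos].id = id;
    chain->filters[pos].nparams = nparams;
    for (i = 0; i < nparams; i++)
        chain->filters[pos].params[i] = params[i];
    return 0;
}
```

Defining id 307, then id 32015, then id 307 again with new parameters leaves 307 in the first position with the most recent parameters.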
667 
668 ### Memory Allocation Issues
669 
670 Starting with HDF5 version 1.10.*, the plugin code MUST be careful when using the standard *malloc()*, *realloc()*, and *free()* functions.
671 
672 In the event that the code is allocating, reallocating, or
673 free'ing memory that either came from or will be exported to the
674 calling HDF5 library, then one MUST use the corresponding HDF5
675 functions *H5allocate\_memory()*, *H5resize\_memory()*,
676 *H5free\_memory()* [5] to avoid memory failures.
677 
678 Additionally, if your filter code leaks memory, then the HDF5 library generates a failure something like this.
679 
680  H5MM.c:232: H5MM_final_sanity_check: Assertion `0 == H5MM_curr_alloc_bytes_s' failed.
681 
682 One can look at the code in plugins/H5Zbzip2.c and H5Zmisc.c as illustrations.
683 
684 ### SZIP Issues
685 
686 The current szip plugin code in the HDF5 library has some behaviors that can catch the unwary.
687 These are handled internally to (mostly) hide them so that they should not affect users.
688 Specifically, this filter may do two things.
689 
690 1. Add extra parameters to the filter parameters: going from the two parameters provided by the user to four parameters for internal use.
691  It turns out that the two parameters provided when calling nc\_def\_var\_filter correspond to the first two parameters of the four parameters returned by nc\_inq\_var\_filter.
692 2. Change the values of some parameters: the value of the *options\_mask* argument is known to add additional flag bits, and the *pixels\_per\_block* parameter may be modified.
693 
694 The reason for these changes is that the szip API provided by the underlying H5Pset\_szip function is actually a subset of the capabilities of the real szip implementation.
695 Presumably this is for historical reasons.
696 
697 In any case, if the caller uses the *nc\_inq\_var\_szip* or the *nc\_inq\_var\_filter* functions, then the parameter values returned may differ from those originally specified.
698 
699 It should also be noted that the HDF5 szip filter wrapper that
700 is invoked depends on the configuration of the netcdf-c library.
701 If the HDF5 installation supports szip, then the NCZarr szip
702 will use the HDF5 wrapper. If HDF5 does not support szip, or HDF5
703 is not enabled, then the plugins directory will contain a local
704 HDF5 szip wrapper to be used by NCZarr. This can be confusing,
705 but is generally transparent to the user since the plugins
706 HDF5 szip wrapper was taken from the HDF5 code base.
707 
708 ### Supported Systems
709 
710 The current matrix of build systems and operating systems known to work is as follows.
711 <table>
712 <tr><th>Build System<th>Supported OS
713 <tr><td>Automake<td>Linux, Cygwin, OSX
714 <tr><td>Cmake<td>Linux, Cygwin, OSX, Visual Studio
715 </table>
716 
717 ### Generic Plugin Build
718 If you do not want to use Automake or Cmake, the following has been known to work.
719 
720  gcc -g -O0 -shared -o libbzip2.so <plugin source files> -L${HDF5LIBDIR} -lhdf5_hl -lhdf5 -L${ZLIBDIR} -lz
721 
722 ## References {#filters_References}
723 
724 1. https://support.hdfgroup.org/HDF5/doc/Advanced/DynamicallyLoadedFilters/HDF5DynamicallyLoadedFilters.pdf
725 2. https://support.hdfgroup.org/HDF5/doc/TechNotes/TechNote-HDF5-CompressionTroubleshooting.pdf
726 3. https://portal.hdfgroup.org/display/support/Registered+Filter+Plugins
727 4. https://support.hdfgroup.org/services/contributions.html#filters
728 5. https://support.hdfgroup.org/HDF5/doc/RM/RM\_H5.html
729 6. https://confluence.hdfgroup.org/display/HDF5/Filters
730 7. https://numcodecs.readthedocs.io/en/stable/
731 8. https://github.com/ccr/ccr
732 9. https://escholarship.org/uc/item/7xd1739k
733 
734 ## Appendix A. HDF5 Parameter Encode/Decode {#filters_appendixa}
735 
736 The filter id for an HDF5 format filter is an unsigned integer.
737 Further, the parameters passed to an HDF5 format filter are encoded internally as a vector of 32-bit unsigned integers.
738 It may be that the parameters required by a filter can naturally be encoded as unsigned integers.
739 The bzip2 compression filter, for example, expects a single integer value from zero through nine.
740 This encodes naturally as a single unsigned integer.
741 
742 Note that signed integers and single-precision (32-bit) float values also can easily be represented as 32 bit unsigned integers by proper casting to an unsigned integer so that the bit pattern is preserved.
743 Simple signed integer values of type short or char can also be mapped to an unsigned integer by truncating to 16 or 8 bits respectively and then sign extending. Similarly, unsigned 8 and 16 bit
744 values can be used with zero extensions.
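
The conversions described above might look like the following in C. This is a minimal sketch that assumes `float` and `unsigned int` are both 32 bits wide; the helper names are made up for the example. It uses `memcpy` rather than a pointer cast so that the bit pattern is copied without relying on aliasing behaviour.

```c
#include <assert.h>
#include <string.h>

/* Copy the bits of a 32-bit float into an unsigned int unchanged. */
unsigned int float_to_param(float value)
{
    unsigned int bits;

    assert(sizeof(bits) == sizeof(value));
    memcpy(&bits, &value, sizeof(bits));
    return bits;
}

/* Inverse of float_to_param: recover the float inside the filter. */
float param_to_float(unsigned int bits)
{
    float value;

    memcpy(&value, &bits, sizeof(value));
    return value;
}

/* A signed short widens to int with sign extension; the conversion of
   that int to unsigned is defined modulo 2^32, which gives the
   sign-extended bit pattern. */
unsigned int short_to_param(short value)
{
    return (unsigned int)(int)value;
}

/* An unsigned char widens with zero extension. */
unsigned int uchar_to_param(unsigned char value)
{
    return (unsigned int)value;
}
```

Inside the filter, the receiving code applies the inverse step, for example `param_to_float(params[0])`, to recover the original value.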
745 
746 Machine byte order (aka endian-ness) is an issue for passing some kinds of parameters.
747 You might define the parameters when compressing on a little endian machine, but later do the decompression on a big endian machine.
748 
749 When using HDF5 format filters, byte order is not an issue for 32-bit values because HDF5 takes care of converting them between the local machine byte order and network byte order.
750 
751 Parameters whose size is larger than 32-bits present a byte order problem.
752 This specifically includes double precision floats and (signed or unsigned) 64-bit integers.
753 For these cases, the machine byte order issue must be handled, in part, by the compression code.
754 This is because HDF5 will treat, for example, an unsigned long long as two 32-bit unsigned integers and will convert each to network order separately.
755 This means that on a machine whose byte order is different than the machine in which the parameters were initially created, the two integers will be separately
756 endian converted.
757 But this will be incorrect for 64-bit values.
758 
759 So, we have this situation (for HDF5 only):
760 
761 1. The 8 bytes start as native machine order for the machine doing the call to *nc\_def\_var\_filter*.
762 2. The caller divides the 8 bytes into 2 four byte pieces and passes them to *nc\_def\_var\_filter*.
763 3. HDF5 takes each four byte piece and ensures that each piece is in network (big) endian order.
764 4. When the filter is called, the two pieces are returned in the same order but with the bytes in each piece consistent with the native machine order for the machine executing the filter.
765 
766 ### Encoding Algorithms for HDF5
767 
768 In order to properly extract the correct 8-byte value, we need to ensure that the values stored in the HDF5 file have a known format independent of the native format of the creating machine.
769 
770 The idea is to do sufficient manipulation so that HDF5 will store the 8-byte value as a little endian value divided into two 4-byte integers.
771 Note that little-endian is used as the standard because it is the most common machine format.
772 When read, the filter code needs to be aware of this convention and do the appropriate conversions.
773 
774 This leads to the following set of rules.
775 
776 #### Encoding
777 
778 1. Encode on little endian (LE) machine: no special action is required.
779  The 8-byte value is passed to HDF5 as two 4-byte integers.
780  HDF5 byte swaps each integer and stores it in the file.
781 2. Encode on a big endian (BE) machine: several steps are required:
782 
783  1. Do an 8-byte byte swap to convert the original value to little-endian format.
784  2. Since the encoding machine is BE, HDF5 will just store the value.
785  So it is necessary to simulate little endian encoding by byte-swapping each 4-byte integer separately.
786  3. This doubly swapped pair of integers is then passed to HDF5 and is stored unchanged.
787 
788 #### Decoding
789 
790 1. Decode on LE machine: no special action is required.
791  HDF5 will get the two 4-byte values from the file and byte-swap each separately.
792  The concatenation of those two integers will be the expected LE value.
793 2. Decode on a big endian (BE) machine: the inverse of the encode case must be implemented.
794 
795  1. HDF5 sends the two 4-byte values to the filter.
796  2. The filter must then byte-swap each 4-byte value independently.
797  3. The filter then must concatenate the two 4-byte values into a single 8-byte value.
798  Because of the encoding rules, this 8-byte value will be in LE format.
799  4. The filter must finally do an 8-byte byte-swap on that 8-byte value to convert it to desired BE format.
800 
801 To support these rules, some utility programs exist and are discussed in [Appendix B](#filters_appendixb).
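
The encoding and decoding rules above can be sketched as a single in-place routine. This is an illustration only, not the library's implementation: the name `example_fix8` and the explicit `big_endian` argument are assumptions added so that both sets of rules can be exercised on any machine.

```c
#include <assert.h>
#include <stddef.h>

/* Reverse n bytes in place. */
static void swap_bytes(unsigned char *mem, size_t n)
{
    size_t i;

    for (i = 0; i < n / 2; i++) {
        unsigned char tmp = mem[i];
        mem[i] = mem[n - 1 - i];
        mem[n - 1 - i] = tmp;
    }
}

/* Return 1 if the running machine is big endian, else 0. */
int host_is_big_endian(void)
{
    const unsigned int one = 1;

    return *(const unsigned char *)&one == 0;
}

/* Apply the 8-byte rules to mem8 in place.
   decode == 0: encode before handing the two halves to HDF5.
   decode != 0: decode inside the filter.
   big_endian selects which machine's rules to apply; pass
   host_is_big_endian() in real use. */
void example_fix8(unsigned char *mem8, int decode, int big_endian)
{
    assert(mem8 != NULL);
    if (!big_endian)
        return;                     /* LE machine: no special action */
    if (decode) {
        swap_bytes(mem8, 4);        /* byte-swap each 4-byte piece */
        swap_bytes(mem8 + 4, 4);
        swap_bytes(mem8, 8);        /* then the 8-byte swap back to BE */
    } else {
        swap_bytes(mem8, 8);        /* 8-byte swap to LE format */
        swap_bytes(mem8, 4);        /* then byte-swap each 4-byte piece */
        swap_bytes(mem8 + 4, 4);
    }
}
```

Under the big-endian rules, encoding the bytes 1 through 8 gives 5,6,7,8,1,2,3,4, and decoding that result restores the original order.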
802 
803 ## Appendix B. Support Utilities {#filters_appendixb}
804 
805 Several functions are exported from the netcdf-c library for use by client programs and by filter implementations.
806 They are defined in the header file *netcdf\_aux.h*.
807 The h5 tag indicates that they assume that the result of the parse is a set of unsigned integers &mdash; the format used by HDF5.
808 
809 1. *int ncaux\_h5filterspec\_parse(const char* txt, unsigned int* idp, size\_t* nparamsp, unsigned int** paramsp);*
810  * txt contains the text of a sequence of comma separated constants
811  * idp will contain the first constant &mdash; the filter id
812  * nparamsp will contain the number of params
813  * paramsp will contain a vector of params &mdash; the caller must free
814 This function can parse single filter spec strings as defined in the section on [Filter Specification Syntax](#filters_syntax).
815 2. *int ncaux\_h5filterspec\_parselist(const char* txt, int* formatp, size\_t* nspecsp, struct NC\_H5\_Filterspec*** vectorp);*
816  * txt contains the text of a sequence of '|' separated filter specs.
817  * formatp currently always returns 0.
818  * nspecsp will return the number of filter specifications.
819  * vectorp will return a pointer to a vector of pointers to filter specification instances &mdash; the caller must free.
820 This function parses a sequence of filter specifications each separated by a '|' character.
821 The text between '|' separators must be parsable by *ncaux\_h5filterspec\_parse*.
822 3. *void ncaux\_h5filterspec\_free(struct NC\_H5\_Filterspec* f);*
823  * f is a pointer to an instance of *struct NC\_H5\_Filterspec*
824  Typically this was returned as an element of the vector returned
825  by *ncaux\_h5filterspec\_parselist*.
826 This reclaims the parameters of the filter spec object as well as the object itself.
827 4. *int ncaux\_h5filterspec\_fix8(unsigned char* mem8, int decode);*
828  * mem8 is a pointer to the 8-byte value to fix.
829  * decode is 1 if the function should apply the 8-byte decoding algorithm
830  else apply the encoding algorithm.
831 This function implements the 8-byte conversion algorithms for HDF5.
832 Before calling *nc\_def\_var\_filter* (unless *NC\_parsefilterspec* was used), the client must call this function with the decode argument set to 0.
833 Inside the filter code, this function should be called with the decode argument set to 1.
834 
835 Examples of the use of these functions can be seen in the test program *nc\_test4/tst\_filterparser.c*.
836 
837 Some of the above functions use a C struct defined in *netcdf\_filter.h*.
838 The definition of that struct is as follows.
839 ````
840 typedef struct NC_H5_Filterspec {
841  unsigned int filterid; /* ID for arbitrary filter. */
842  size_t nparams; /* nparams for arbitrary filter. */
843  unsigned int* params; /* Params for arbitrary filter. */
844 } NC_H5_Filterspec;
845 ````
846 This struct in effect encapsulates all of the information about an HDF5-formatted filter &mdash; the id, the number of parameters, and the parameters themselves.
847 
848 ## Appendix C. Build Flags for Detecting the Filter Mechanism {#filters_appendixc}
849 
850 The include file *netcdf\_meta.h* contains the following definition.
851 ````
852  #define NC_HAS_MULTIFILTERS 1
853 ````
854 This, in conjunction with the error code *NC\_ENOFILTER* in *netcdf.h* can be used to see what filter mechanism is in place as described in the section on [incompatibilities](#filters_compatibility).
855 
856 1. !defined(NC\_ENOFILTER) && !defined(NC\_HAS\_MULTIFILTERS) &mdash; indicates that the old pre-4.7.4 mechanism is in place.
857  It does not support multiple filters.
858 2. defined(NC\_ENOFILTER) && !defined(NC\_HAS\_MULTIFILTERS) &mdash; indicates that the 4.7.4 mechanism is in place.
859  It does support multiple filters, but the error return codes for *nc\_inq\_var\_filter* are different and the filter spec parser functions are in a different location with different names.
860 3. defined(NC\_ENOFILTER) && defined(NC\_HAS\_MULTIFILTERS) &mdash; indicates that the multiple filters are supported, and that *nc\_inq\_var\_filter* returns a filterid of zero to indicate that a variable has no filters.
861  Also, the filter spec parsers have the names and signatures described in this document and are defined in *netcdf\_aux.h*.
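
The three cases above translate directly into preprocessor tests. The sketch below is one way to write them; the function names are invented for the example. It includes the netCDF headers only when the compiler can locate them, using `__has_include` (a C23 feature that GCC and Clang also provide as an extension). If the headers are not found, neither macro is defined and the code reports case 1, so that result is only meaningful when the headers were actually included.

```c
#include <assert.h>

/* Pull in the netCDF headers when they are available so that the
   macros tested below are visible. */
#if defined(__has_include)
#  if __has_include(<netcdf.h>)
#    include <netcdf.h>
#  endif
#  if __has_include(<netcdf_meta.h>)
#    include <netcdf_meta.h>
#  endif
#endif

/* Return 1, 2, or 3 for the three cases in the list above, or 0 for
   the one macro combination that the list does not cover. */
int filter_mechanism_case(void)
{
#if !defined(NC_ENOFILTER) && !defined(NC_HAS_MULTIFILTERS)
    return 1;   /* pre-4.7.4 mechanism */
#elif defined(NC_ENOFILTER) && !defined(NC_HAS_MULTIFILTERS)
    return 2;   /* 4.7.4 mechanism */
#elif defined(NC_ENOFILTER) && defined(NC_HAS_MULTIFILTERS)
    return 3;   /* multiple filters, current parser names */
#else
    return 0;
#endif
}

/* Map the case number to a short label for log messages. */
const char *filter_mechanism_name(void)
{
    static const char *const names[] = {
        "unknown", "pre-4.7.4", "4.7.4", "multi-filter"
    };
    int c = filter_mechanism_case();

    assert(c >= 0 && c <= 3);
    return names[c];
}
```

In practice the same `#if` tests would usually guard alternative code paths directly rather than return a number.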
862 
863 ## Appendix D. BNF for Specifying Filters in Utilities {#filters_appendixd}
864 
865 ````
866 speclist: spec
867  | speclist '|' spec
868  ;
869 spec: filterid
870  | filterid ',' parameterlist
871  ;
872 filterid: unsigned32
873  ;
874 parameterlist: parameter
875  | parameterlist ',' parameter
876  ;
877 parameter: unsigned32
878 
879 where
880 unsigned32: <32 bit unsigned integer>
881 ````
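
A small recursive-descent style parser for this grammar might look like the sketch below. It is a simplified stand-in, not the library's parser: `ExampleSpec` and `example_parse_speclist` are hypothetical names, the parameter count is capped at a fixed size, and only decimal constants with no surrounding whitespace are accepted.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_SPEC_PARAMS 16

typedef struct {
    unsigned int filterid;
    size_t nparams;
    unsigned int params[MAX_SPEC_PARAMS];
} ExampleSpec;

/* Parse one unsigned32 at *pp and advance *pp past it.
   Returns 0 on success, -1 if no digit is present or the value is too large. */
static int parse_unsigned32(const char **pp, unsigned int *out)
{
    char *end;
    unsigned long v;

    if (**pp < '0' || **pp > '9')
        return -1;
    v = strtoul(*pp, &end, 10);
    if (v > 0xFFFFFFFFul)
        return -1;
    *out = (unsigned int)v;
    *pp = end;
    return 0;
}

/* Parse a speclist such as "307,9|32015,3" into specs[].
   Returns the number of specs parsed, or -1 on a syntax error or when
   a fixed-size limit is exceeded. */
int example_parse_speclist(const char *txt, ExampleSpec *specs, size_t maxspecs)
{
    size_t n = 0;
    const char *p = txt;

    assert(txt != NULL && specs != NULL);
    for (;;) {
        ExampleSpec *s;

        if (n == maxspecs)
            return -1;
        s = &specs[n];
        s->nparams = 0;
        if (parse_unsigned32(&p, &s->filterid) != 0)   /* spec: filterid */
            return -1;
        while (*p == ',') {                            /* ',' parameterlist */
            p++;
            if (s->nparams == MAX_SPEC_PARAMS)
                return -1;
            if (parse_unsigned32(&p, &s->params[s->nparams]) != 0)
                return -1;
            s->nparams++;
        }
        n++;
        if (*p == '\0')
            return (int)n;
        if (*p != '|')                                 /* speclist '|' spec */
            return -1;
        p++;
    }
}
```

Parsing the text `307,9|32015,3` with this sketch yields two specs: id 307 with the parameter 9, and id 32015 with the parameter 3.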
882 
883 ## Appendix E. Codec API {#filters_appendixe}
884 
885 The Codec API mirrors the HDF5 API closely. It has one well-known function that can be invoked to obtain information about the Codec as well as pointers to special functions to perform conversions.
886 
887 ### The Codec Plugin API
888 
889 #### NCZ\_get\_codec\_info
890 
891 This function returns a pointer to a C struct that provides detailed information about the codec plugin.
892 
893 ##### Signature
894 ````
895  void* NCZ_get_codec_info(void);
896 ````
897 The value returned is actually of type *struct NCZ\_codec\_t*,
898 but is of type *void\** to allow for extensions.
899 
900 #### NCZ\_codec\_t
901 ````
902 typedef struct NCZ_codec_t {
903  int version; /* Version number of the struct */
904  int sort; /* Format of remainder of the struct;
905  Currently always NCZ_CODEC_HDF5 */
906  const char* codecid; /* The name/id of the codec */
907  unsigned int hdf5id; /* corresponding hdf5 id */
908  void (*NCZ_codec_initialize)(void);
909  void (*NCZ_codec_finalize)(void);
910  int (*NCZ_codec_to_hdf5)(const char* codec, int* nparamsp, unsigned** paramsp);
911  int (*NCZ_hdf5_to_codec)(size_t nparams, const unsigned* params, char** codecp);
912  int (*NCZ_modify_parameters)(int ncid, int varid, size_t* vnparamsp, unsigned** vparamsp, size_t* wnparamsp, unsigned** wparamsp);
913 } NCZ_codec_t;
914 ````
915 
916 The semantics of the non-function fields is as follows:
917 
918 1. *version* &mdash; Version number of the struct.
919 2. *sort* &mdash; Format of remainder of the struct; currently always NCZ\_CODEC\_HDF5.
920 3. *codecid* &mdash; The name/id of the codec.
921 4. *hdf5id* &mdash; The corresponding hdf5 id.
922 
923 #### NCZ\_codec\_to\_hdf5
924 
925 Given a JSON Codec representation, it will return a corresponding vector of unsigned integers representing the
926 visible parameters.
927 
928 ##### Signature
929 ````
930  int NCZ_codec_to_hdf5(const char* codec, int* nparamsp, unsigned** paramsp);
931 ````
932 ##### Arguments
933 1. codec &mdash; (in) ptr to JSON string representing the codec.
934 2. nparamsp &mdash; (out) store the length of the converted HDF5 unsigned vector
935 3. paramsp &mdash; (out) store a pointer to the converted HDF5 unsigned vector; caller must free the returned vector. Note the double indirection.
936 
937 Return Value: a netcdf-c error code.
938 
939 #### NCZ\_hdf5\_to\_codec
940 
941 Given an HDF5 visible parameters vector of unsigned integers and its length,
942 return a corresponding JSON codec representation of those visible parameters.
943 
944 ##### Signature
945 ````
946  int NCZ_hdf5_to_codec(int ncid, int varid, size_t nparams, const unsigned* params, char** codecp);
947 ````
948 ##### Arguments
949 
950 1. ncid &mdash; (in) the id of the group containing the variable.
951 2. varid &mdash; (in) the id of the variable to which the filter is attached.
952 3. nparams &mdash; (in) the length of the HDF5 visible parameters vector
953 4. params &mdash; (in) pointer to the HDF5 visible parameters vector.
954 5. codecp &mdash; (out) store the string representation of the codec; caller must free.
955 
956 Return Value: a netcdf-c error code.
957 
958 #### NCZ\_modify\_parameters
959 
960 Extract environment information from the (ncid,varid) and use it to convert a set of visible parameters
961 to a set of working parameters; also provide option to modify visible parameters.
962 
963 ##### Signature
964 ````
965  int NCZ_modify_parameters(int ncid, int varid, size_t* vnparamsp, unsigned** vparamsp, size_t* wnparamsp, unsigned** wparamsp);
966 ````
967 ##### Arguments
968 
969 1. ncid &mdash; (in) group id containing the variable.
970 2. varid &mdash; (in) the id of the variable to which this filter is being attached.
971 3. vnparamsp &mdash; (in/out) the count of visible parameters
972 4. vparamsp &mdash; (in/out) the set of visible parameters
973 5. wnparamsp &mdash; (out) the count of working parameters
974 6. wparamsp &mdash; (out) the set of working parameters
975 
976 Return Value: a netcdf-c error code.
977 
978 #### NCZ\_codec\_initialize
979 
980 Some compressors may require library initialization.
981 This function is called as soon as a shared library is loaded and matched with an HDF5 filter.
982 
983 ##### Signature
984 ````
985  int NCZ_codec_initialize(void);
986 ````
987 Return Value: a netcdf-c error code.
988 
989 #### NCZ\_codec\_finalize
990 
991 Some compressors (like blosc) require invoking a finalize function in order to avoid memory loss.
992 This function is called during a call to *nc\_finalize* to do any finalization.
993 If the client code does not invoke *nc\_finalize* then memory checkers may complain about lost memory.
994 
995 ##### Signature
996 ````
997  int NCZ_codec_finalize(void);
998 ````
999 Return Value: a netcdf-c error code.
1000 
1001 ### Multi-Codec API
1002 
1003 As an aid to clients, it is convenient if a single shared library can provide multiple *NCZ\_codec\_t* instances at one time.
1004 This API is not intended to be used by plugin developers.
1005 A shared library must only export this function.
1006 
1007 #### NCZ\_codec\_info\_defaults
1008 
1009 Return a NULL terminated vector of pointers to instances of *NCZ\_codec\_t*.
1010 
1011 ##### Signature
1012 ````
1013  void* NCZ_codec_info_defaults(void);
1014 ````
1015 The value returned is actually of type *NCZ\_codec\_t\*\**,
1016 but is of type *void\** to allow for extensions.
1017 The list of returned items are used to try to provide defaults
1018 for any HDF5 filters that have no corresponding Codec.
1019 This is for internal use only.
1020 
1021 ## Appendix F. Standard Filters {#filters_appendixf}
1022 
1023 Support for a select set of standard filters is built into the NetCDF API.
1024 Generally, they are accessed using the following generic API, where XXXX is
1025 the filter name. As a rule, the names are those used in the HDF5 filter ID naming authority [4] or the NumCodecs naming authority [7].
1026 ````
1027 int nc_def_var_XXXX(int ncid, int varid, unsigned filterid, size_t nparams, unsigned* params);
1028 int nc_inq_var_XXXX(int ncid, int varid, int* hasfilter, size_t* nparamsp, unsigned* params);
1029 ````
1030 The first function inserts the specified filter into the filter chain for a given variable.
1031 The second function queries the given variable to see if the specified filter
1032 is in the filter chain for that variable. The *hasfilter* argument is set
1033 to one if the filter is in the chain and zero otherwise.
1034 As is usual with the netcdf API, one is expected to call this function twice:
1035 the first time to set *nparamsp* and the second time to get the parameters in the client-allocated memory argument *params*.
1036 Any of these arguments can be NULL, in which case no value is returned.
1037 
1038 Note that NetCDF inherits four filters from HDF5, namely shuffle, fletcher32, deflate (zlib), and szip. The APIs for these do not conform to the above API.
1039 So aside from those four, the current set of standard filters is as follows.
1040 <table>
1041 <tr><th>Filter Name<th>Filter ID<th>Reference
1042 <tr><td>zstandard<td>32015<td>https://facebook.github.io/zstd/
1043 <tr><td>bzip2<td>307<td>https://sourceware.org/bzip2/
1044 </table>
1045 
1046 It is important to note that in order to use each standard filter, several additional libraries must be installed.
1047 Consider the zstandard compressor, which is one of the supported standard filters.
1048 When installing the netcdf library, the following other libraries must be installed.
1049 
1050 1. *libzstd.so* | *zstd.dll* | *libzstd.dylib* -- The actual zstandard compressor library; typically installed by using your platform specific package manager.
1051 2. The HDF5 wrapper for *libzstd.so* -- There are several options for obtaining this (see [Appendix G](#filters_appendixg).)
1052 3. (Optional) The Zarr wrapper for *libzstd.so* -- you need this if you intend to read/write Zarr datasets that were compressed using zstandard; again see [Appendix G](#filters_appendixg).
1053 
1054 ## Appendix G. Finding Filters {#filters_appendixg}
1055 
1056 A major problem for filter users is finding an implementation of an HDF5 filter wrapper and (optionally)
1057 its corresponding NCZarr wrapper. There are several ways to do this.
1058 
1059 * **--with-plugin-dir** &mdash; An option to *./configure* that will install the necessary wrappers.
1060  See [Appendix H](#filters_appendixh).
1061 
1062 * **HDF5 Assigned Filter Identifiers Repository [3]** &mdash;
1063 HDF5 maintains a page of standard filter identifiers along with
1064 additional contact information. This often includes a pointer
1065 to source code. This will provide only HDF5 wrappers and not NCZarr wrappers.
1066 
1067 * **Community Codec Repository** &mdash;
1068 The Community Codec Repository (CCR) project [8] provides
1069 filters, including HDF5 wrappers, for a number of filters.
1070 It does not as yet provide Zarr wrappers.
1071 You can install this library to get access to these supported filters.
1072 It does not currently include the required NCZarr Codec API,
1073 so they are only usable with netcdf-4. This will change in the future.
1074 
1075 ## Appendix H. Auto-Install of Filter Wrappers {#filters_appendixh}
1076 
1077 As part of the overall build process, a number of filter wrappers are built as shared libraries in the "plugins" directory.
1078 These wrappers can be installed as part of the overall netcdf-c installation process.
1079 WARNING: the installer still needs to make sure that the actual filter/compression libraries are installed: e.g. libzstd and/or libblosc.
1080 
1081 The target location into which libraries in the "plugins" directory are installed is specified
1082 using a special *./configure* option
1083 ````
1084 --with-plugin-dir=<directorypath>
1085 or
1086 --with-plugin-dir
1087 ````
1088 or its corresponding *cmake* option.
1089 ````
1090 -DPLUGIN_INSTALL_DIR=<directorypath>
1091 or
1092 -DPLUGIN_INSTALL_DIR=YES
1093 ````
1094 This option defaults to the value "yes", which means that filters are
1095 installed by default. This can be disabled by one of the following options.
1096 ````
1097 --without-plugin-dir (automake)
1098 or
1099 --with-plugin-dir=no (automake)
1100 or
1101 -DPLUGIN_INSTALL_DIR=NO (CMake)
1102 ````
1103 
1104 If the option is specified with no argument (automake) or with the value "YES" (CMake),
1105 then it defaults (in order) to the following directories:
1106 1. If the HDF5_PLUGIN_PATH environment variable is defined, then the last directory in the list of directories in the path is used.
1107 2. (a) "/usr/local/hdf5/lib/plugin" for linux/unix operating systems (including Cygwin)<br>
1108  (b) "%ALLUSERSPROFILE%\\hdf5\\lib\\plugin" for Windows and MinGW
1109 
1110 If NCZarr is enabled, then in addition to wrappers for the standard filters,
1111 additional libraries will be installed to support NCZarr access to filters.
1112 Currently, this list includes the following:
1113 * shuffle &mdash; shuffle filter
1114 * fletcher32 &mdash; fletcher32 checksum
1115 * deflate &mdash; deflate compression
1116 * (optional) szip &mdash; szip compression, if libsz is available
1117 * bzip2 &mdash; an HDF5 filter for bzip2 compression
1118 * lib__nczh5filters.so &mdash; provide NCZarr support for shuffle, fletcher32, deflate, and (optionally) szip.
1119 * lib__nczstdfilters.so &mdash; provide NCZarr support for bzip2, (optionally) zstandard, and (optionally) blosc.
1120 
1121 The shuffle, fletcher32, and deflate filters in this case will
1122 be ignored by HDF5 and only used by the NCZarr code. But in
1123 order to use them, it needs additional Codec capabilities
1124 provided by the *lib__nczh5filters.so* shared library. Note also that
1125 if you disable HDF5 support, but leave NCZarr support enabled,
1126 then all of the above filters should continue to work.
1127 
1128 ### HDF5_PLUGIN_PATH
1129 
1130 At the moment, NetCDF uses the existing HDF5 environment variable
1131 *HDF5\_PLUGIN\_PATH* to locate the directories in which filter wrapper
1132 shared libraries are located. This is used both for the HDF5 filter
1133 wrappers and for the NCZarr codec wrappers.
1134 
1135 *HDF5\_PLUGIN\_PATH* is a typical Windows or Unix style
1136 path-list. That is, it is a sequence of absolute directory paths
1137 separated by a specific separator character. For Windows, the
1138 separator character is a semicolon (';') and for Unix, it is a
1139 colon (':').
1140 
1141 So, if HDF5_PLUGIN_PATH is defined at build time, and
1142 *--with-plugin-dir* is specified with no argument then the last
1143 directory in the path will be the one into which filter wrappers are
1144 installed. Otherwise the default directories are used.
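
The "last directory in the path" rule can be pictured with a few lines of C. The actual selection happens in the build system, so this is only an illustration of the rule; the function name `last_plugin_dir` is made up for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of the last entry in a path list, or
   NULL if that entry is empty or allocation fails. sep is ':' on Unix
   and ';' on Windows. The caller must free the result. */
char *last_plugin_dir(const char *pathlist, char sep)
{
    const char *last;
    size_t len;
    char *copy;

    assert(pathlist != NULL);
    last = strrchr(pathlist, sep);              /* find the final separator */
    last = (last == NULL) ? pathlist : last + 1;
    len = strlen(last);
    if (len == 0)
        return NULL;
    copy = malloc(len + 1);
    if (copy != NULL)
        memcpy(copy, last, len + 1);
    return copy;
}
```

A program could apply this to the value of `getenv("HDF5_PLUGIN_PATH")` after checking that the variable is set.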
1145 
1146 The important thing to note is that at run-time, there are several cases to consider:
1147 
1148 1. HDF5_PLUGIN_PATH is defined and is the same value as it was at build time -- no action needed
1149 2. HDF5_PLUGIN_PATH is defined and has a different value from build time -- the user is responsible for ensuring that the run-time path includes the same directory used at build time, otherwise this case will fail.
1150 3. HDF5_PLUGIN_PATH is not defined at either run-time or build-time -- no action needed
1151 4. HDF5_PLUGIN_PATH is not defined at run-time but was defined at build-time -- this will probably fail
1152 
1153 ## Point of Contact {#filters_poc}
1154 
1155 *Author*: Dennis Heimbigner<br>
1156 *Email*: dmh at ucar dot edu<br>
1157 *Initial Version*: 1/10/2018<br>
1158 *Last Revised*: 5/18/2022