1 Appendix D. NetCDF-4 Filter Support {#filters}
2 ==================================
6 > See @ref nc_filters_quickstart for tips to get started quickly with NetCDF-4 Filter Support.
8 ## Filters Overview {#filters_overview}
NetCDF-C filters have some features of which the user should be aware:
13 * ***Auto Install of filters***<br>
14 An option is now provided to automatically install
15 HDF5 filters into a default location, or optionally
16 into a user-specified location. This is described in
17 [Appendix H](#filters_appendixh)
18 (with supporting information in [Appendix G](#filters_appendixg)).
20 * ***NCZarr Filter Support***<br>
21 [NCZarr filters](#filters_nczarr) are now supported.
22 This essentially means that it is possible to specify
23 Zarr Codecs (Zarr equivalent of filters) in Zarr files
24 and have them processed using HDF5-style wrapper shared libraries.
25 Zarr filters can be used even if HDF5 support is disabled
26 in the netCDF-C library.
28 ## Introduction to Filters {#filters_introduction}
30 The netCDF library supports a general filter mechanism to apply
31 various kinds of filters to datasets before reading or writing.
32 The most common kind of filter is a compression-decompression
33 filter, and that is the focus of this document.
34 But non-compression filters – fletcher32, for example – also exist.
36 The netCDF enhanced (aka netCDF-4) library inherits this
37 capability since it depends on the HDF5 library. The HDF5
38 library (1.8.11 and later) supports filters, and netCDF is based
39 closely on that underlying HDF5 mechanism.
41 Filters assume that a variable has chunking defined and each
42 chunk is filtered before writing and "unfiltered" after reading
43 and before passing the data to the user. In the event that
44 multiple filters are defined on a variable, they are applied in
first-defined order on writing and in the reverse order on reading.
48 This document describes the support for HDF5 filters and also
49 the newly added support for NCZarr filters.
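
The ordering rule can be illustrated with a small sketch that calls neither netCDF nor HDF5. The `toy_filter` type and the two toy transforms below are hypothetical stand-ins for real filters; they were chosen only because they do not commute, so decoding in the wrong order would not restore the original chunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a real filter: one function per direction. */
typedef struct {
    unsigned char (*encode)(unsigned char);
    unsigned char (*decode)(unsigned char);
} toy_filter;

static unsigned char add3(unsigned char b)  { return (unsigned char)(b + 3); }
static unsigned char sub3(unsigned char b)  { return (unsigned char)(b - 3); }
static unsigned char rotl1(unsigned char b) { return (unsigned char)((b << 1) | (b >> 7)); }
static unsigned char rotr1(unsigned char b) { return (unsigned char)((b >> 1) | (b << 7)); }

/* Writing: apply the filters in the order in which they were defined. */
void chain_write(const toy_filter* chain, size_t nfilters,
                 unsigned char* chunk, size_t len)
{
    assert(chain != NULL && chunk != NULL);
    for (size_t f = 0; f < nfilters; f++)
        for (size_t i = 0; i < len; i++)
            chunk[i] = chain[f].encode(chunk[i]);
}

/* Reading: undo the filters in the reverse of the defined order. */
void chain_read(const toy_filter* chain, size_t nfilters,
                unsigned char* chunk, size_t len)
{
    assert(chain != NULL && chunk != NULL);
    for (size_t f = nfilters; f-- > 0; )
        for (size_t i = 0; i < len; i++)
            chunk[i] = chain[f].decode(chunk[i]);
}

/* A two-filter chain: add3 was defined first, rotl1 second. */
const toy_filter toy_chain[2] = { { add3, sub3 }, { rotl1, rotr1 } };
```

Calling `chain_write` followed by `chain_read` on the same buffer with `toy_chain` returns the buffer to its original contents.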
51 ## A Warning on Backward Compatibility {#filters_compatibility}
53 The API defined in this document should accurately reflect the
54 current state of filters in the netCDF-c library. Be aware that
55 there was a short period in which the filter code was undergoing
56 some revision and extension. Those extensions have largely been
57 reverted. Unfortunately, some users may experience some
58 compilation problems for previously working code because of
59 these reversions. In that case, please revise your code to
adhere to this document. Apologies are extended for any inconvenience.
63 A user may encounter an incompatibility if any of the following appears in user code.
* The function *nc\_inq\_var\_filter* was returning the error value NC\_ENOFILTER if a variable had no associated filters.
  It has been reverted to the previous behavior, in which it returns NC\_NOERR and sets the returned filter id to zero if the variable has no filters.
* The function *nc\_inq\_var\_filterids* was renamed to *nc\_inq\_var\_filter\_ids*.
* Some auxiliary functions for parsing textual filter specifications have been moved to the file *netcdf\_aux.h*. See [Appendix A](#filters_appendixa).
69 * All of the "filterx" functions have been removed. This is unlikely to cause problems because they had limited visibility.
71 For additional information, see [Appendix B](#filters_appendixb).
## Enabling An HDF5 Compression Filter {#filters_enable}
75 HDF5 supports dynamic loading of compression filters using the
76 following process for reading of compressed data.
78 1. Assume that we have a dataset with one or more variables that were compressed using some algorithm.
79 How the dataset was compressed will be discussed subsequently.
80 2. Shared libraries or DLLs exist that implement the compress/decompress algorithm.
81 These libraries have a specific API so that the HDF5 library can locate, load, and utilize the compressor.
3. These libraries are expected to be installed in a specific directory.
84 In order to compress a variable with an HDF5 compliant filter,
85 the netcdf-c library must be given three pieces of information:
87 1. some unique identifier for the filter to be used,
88 2. a vector of parameters for controlling the action of the compression filter, and
89 3. access to a shared library implementation of the filter.
91 The meaning of the parameters is, of course, completely filter
92 dependent and the filter description [3] needs to be consulted.
93 For bzip2, for example, a single parameter is provided
94 representing the compression level. It is legal to provide a
95 zero-length set of parameters. Defaults are not provided, so
96 this assumes that the filter can operate with zero parameters.
98 Filter ids are assigned by the HDF group. See [4] for a current
99 list of assigned filter ids. Note that ids above 32767 can be
100 used for testing without registration.
102 The first two pieces of information can be provided in one of
103 three ways: (1) using *ncgen*, (2) via an API call, or (3) via
104 command line parameters to *nccopy*. In any case, remember that
105 filtering also requires setting chunking, so the variable must
106 also be marked with chunking information. If compression is set
107 for a non-chunked variable, the variable will forcibly be
108 converted to chunked using a default chunking algorithm.
110 ## Using The API {#filters_API}
111 The necessary API methods are included in *netcdf\_filter.h* by default.
112 These functions implicitly use the HDF5 mechanisms and may produce an error if applied to a file format that is not compatible with the HDF5 mechanism.
114 ### nc\_def\_var\_filter
Add a filter to the set of filters to be used when writing a variable. This must be invoked after the variable has been created and before *nc\_enddef* is invoked.

    int nc_def_var_filter(int ncid, int varid, unsigned int id,
                          size_t nparams, const unsigned int* params);

Arguments:

* ncid — File and group ID.
* varid — Variable ID.
* id — Filter ID.
* nparams — Number of filter parameters.
* params — Filter parameters (a vector of unsigned integers)

Return codes:

* NC\_NOERR — No error.
* NC\_ENOTNC4 — Not a netCDF-4 file.
* NC\_EBADID — Bad ncid or bad filter id
* NC\_ENOTVAR — Invalid variable ID.
* NC\_EINDEFINE — called when not in define mode
* NC\_ELATEDEF — called after variable was created
* NC\_EINVAL — Scalar variable, or parallel enabled and parallel filters not supported, or nparams or params invalid.
138 ### nc\_inq\_var\_filter\_ids
Query a variable to obtain a list of the ids of all filters associated with that variable.

    int nc_inq_var_filter_ids(int ncid, int varid, size_t* nfiltersp, unsigned int* filterids);

Arguments:

* ncid — File and group ID.
* varid — Variable ID.
* nfiltersp — Stores number of filters found; may be zero.
* filterids — Stores set of filter ids.

Return codes:

* NC\_NOERR — No error.
* NC\_ENOTNC4 — Not a netCDF-4 file.
* NC\_EBADID — Bad ncid
* NC\_ENOTVAR — Invalid variable ID.

The number of filters associated with the variable is stored in *nfiltersp* (it may be zero).
The set of filter ids will be returned in *filterids*.
As is usual with the netcdf API, one is expected to call this function twice:
the first time to set *nfiltersp* and the second to get the filter ids in client-allocated memory.
Any of these arguments can be NULL, in which case no value is returned.
163 ### nc\_inq\_var\_filter\_info
Query a variable to obtain information about a specific filter associated with the variable.

    int nc_inq_var_filter_info(int ncid, int varid, unsigned int id, size_t* nparamsp, unsigned int* params);

Arguments:

* ncid — File and group ID.
* varid — Variable ID.
* id — The filter id of interest.
* nparamsp — Stores number of parameters.
* params — Stores set of filter parameters.

Return codes:

* NC\_NOERR — No error.
* NC\_ENOTNC4 — Not a netCDF-4 file.
* NC\_EBADID — Bad ncid
* NC\_ENOTVAR — Invalid variable ID.
* NC\_ENOFILTER — Filter not defined for the variable.

The *id* indicates the filter of interest.
The actual parameters are stored in *params*.
The number of parameters is returned in *nparamsp*.
As is usual with the netcdf API, one is expected to call this function twice:
the first time to set *nparamsp* and the second to get the parameters in client-allocated memory.
Any of these arguments can be NULL, in which case no value is returned.
If the specified id is not attached to the variable, then NC\_ENOFILTER is returned.
192 ### nc\_inq\_var\_filter
Query a variable to obtain information about the first filter associated with the variable.
When netcdf-c was modified to support multiple filters per variable, this function became redundant since it returns info only about the first defined filter for the variable.
Internally, it is implemented using the functions *nc\_inq\_var\_filter\_ids* and *nc\_inq\_var\_filter\_info*.

    int nc_inq_var_filter(int ncid, int varid, unsigned int* idp, size_t* nparamsp, unsigned int* params);

Arguments:

* ncid — File and group ID.
* varid — Variable ID.
* idp — Stores the id of the first found filter, set to zero if variable has no filters.
* nparamsp — Stores number of parameters.
* params — Stores set of filter parameters.

Return codes:

* NC\_NOERR — No error.
* NC\_ENOTNC4 — Not a netCDF-4 file.
* NC\_EBADID — Bad ncid
* NC\_ENOTVAR — Invalid variable ID.

The filter id will be returned in the *idp* argument.
If there are no filters, then zero is stored in this argument.
Otherwise, the number of parameters is stored in *nparamsp* and the actual parameters in *params*.
As is usual with the netcdf API, one is expected to call this function twice:
the first time to get *nparamsp* and the second to get the parameters in client-allocated memory.
Any of these arguments can be NULL, in which case no value is returned.
223 ## Using ncgen {#filters_NCGEN}
225 In a CDL file, compression of a variable can be specified by annotating it with the following attribute:
227 * *\_Filter* — a string containing a comma separated list of constants specifying (1) the filter id to apply, and (2) a vector of constants representing the parameters for controlling the operation of the specified filter.
228 See the section on the <a href="#filters_syntax">parameter encoding syntax</a> for the details on the allowable kinds of constants.
230 This is a "special" attribute, which means that it will normally be invisible when using *ncdump* unless the -s flag is specified.
232 For backward compatibility it is probably better to use the *\_Deflate* attribute instead of *\_Filter*. But using *\_Filter* to specify deflation will work.
234 Multiple filters can be specified for a given variable by using the "|" separator.
235 Alternatively, this attribute may be repeated to specify multiple filters.
237 Note that the lexical order of declaration is important when more than one filter is specified for a variable because it determines the order in which the filters are applied.
239 ### Example CDL File (Data elided)

    dimensions:
      dim0 = 4 ; dim1 = 4 ; dim2 = 4 ; dim3 = 4 ;
    variables:
      float var(dim0, dim1, dim2, dim3) ;
        var:_Filter = "307,9|4,32,32" ; // bzip2 then szip
        var:_Storage = "chunked" ;
        var:_ChunkSizes = 4, 4, 4, 4 ;

255 Note that the assigned filter id for bzip2 is 307 and for szip it is 4.
257 ## Using nccopy {#filters_NCCOPY}
When copying a netcdf file using *nccopy* it is possible to specify filter information for any output variable by using the "-F" option on the command line; for example:

    nccopy -F "var,307,9" unfiltered.nc filtered.nc

263 Assume that *unfiltered.nc* has a chunked but not bzip2 compressed variable named "var".
264 This command will copy that variable to the *filtered.nc* output file but using filter with id 307 (i.e. bzip2) and with parameter(s) 9 indicating the compression level.
265 See the section on the <a href="#filters_syntax">parameter encoding syntax</a> for the details on the allowable kinds of constants.
267 The "-F" option can be used repeatedly, as long as a different variable is specified for each occurrence.
269 It can be convenient to specify that the same compression is to be applied to more than one variable. To support this, two additional *-F* cases are defined.
271 1. *-F \*,...* means apply the filter to all variables in the dataset.
272 2. *-F v1&v2&..,...* means apply the filter to multiple variables.
Multiple filters can be specified using the pipeline notation '|'. For example:

1. *-F v1&v2,307,9|4,32,32* means apply filter 307 (bzip2) then filter 4 (szip) to the multiple variables.
279 Note that the characters '\*', '\&', and '\|' are shell reserved characters, so you will probably need to escape or quote the filter spec in that environment.
281 As a rule, any input filter on an input variable will be applied to the equivalent output variable — assuming the output file type is netcdf-4.
282 It is, however, sometimes convenient to suppress output compression either totally or on a per-variable basis.
Total suppression of output filters can be accomplished by specifying a special case of "-F", namely this.

    nccopy -F none input.nc output.nc

The expression *-F \*,none* is equivalent to *-F none*.

Suppression of output filtering for a specific set of variables can be accomplished using these formats.

    nccopy -F "var,none" input.nc output.nc
    nccopy -F "v1&v2&...,none" input.nc output.nc

where "var" and the "vi" are the fully qualified names of variables.
296 The rules for all possible cases of the "-F none" flag are defined by this table.
<table>
<tr><th>-F none<th>-F var,...<th>Input Filter<th>Applied Output Filter
<tr><td>true<td>undefined<td>NA<td>unfiltered
<tr><td>true<td>none<td>NA<td>unfiltered
<tr><td>true<td>defined<td>NA<td>use output filter(s)
<tr><td>false<td>undefined<td>defined<td>use input filter(s)
<tr><td>false<td>none<td>NA<td>unfiltered
<tr><td>false<td>defined<td>undefined<td>use output filter(s)
<tr><td>false<td>undefined<td>undefined<td>unfiltered
<tr><td>false<td>defined<td>defined<td>use output filter(s)
</table>
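
The table can be condensed into a small decision function. This is a sketch of the rules as tabulated above, not the actual *nccopy* source; the type and function names are hypothetical.

```c
/* State of the per-variable "-F var,..." option for one variable. */
typedef enum { VAR_UNDEFINED, VAR_NONE, VAR_DEFINED } var_spec;

/* What ends up applied to the output variable. */
typedef enum { OUT_UNFILTERED, OUT_INPUT_FILTERS, OUT_OUTPUT_FILTERS } out_action;

/* global_none:       nonzero if "-F none" was given
   spec:              per-variable -F state
   input_has_filters: nonzero if the input variable has filters */
out_action choose_output_filter(int global_none, var_spec spec,
                                int input_has_filters)
{
    switch (spec) {
    case VAR_DEFINED: return OUT_OUTPUT_FILTERS; /* explicit spec always wins */
    case VAR_NONE:    return OUT_UNFILTERED;     /* explicit per-variable suppression */
    case VAR_UNDEFINED:
    default:          break;
    }
    if (global_none) return OUT_UNFILTERED;      /* global suppression */
    return input_has_filters ? OUT_INPUT_FILTERS : OUT_UNFILTERED;
}
```

Read this way, a per-variable specification takes precedence over "-F none", and input filters are inherited only when nothing was said about the variable.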
309 ## Filter Specification Syntax {#filters_syntax}
The utilities <a href="#filters_NCGEN">ncgen</a> and <a href="#filters_NCCOPY">nccopy</a>, and also the output of *ncdump*, support the specification of filter ids, formats, and parameters in text format.
The BNF specification is defined in [Appendix C](#filters_appendixc).
Basically, these specifications consist of a filter id, a comma, and then a sequence of
comma separated constants representing the parameters.
The constants are converted within the utility to a proper set of unsigned int constants (see the <a href="#filters_paramcoding">parameter encoding section</a>).
317 To simplify things, various kinds of constants can be specified rather than just simple unsigned integers.
318 The *ncgen* and *nccopy* programs will encode them properly using the rules specified in the section on <a href="#filters_paramcoding">parameter encode/decode</a>.
319 Since the original types are lost after encoding, *ncdump* will always show a simple list of unsigned integer constants.
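
As a rough illustration of the textual form, the sketch below parses a pipeline such as `307,9|4,32,32` into ids and parameter vectors. It is a simplified stand-in, not the parser used by the utilities: it accepts only untagged decimal unsigned integers, and the names `filter_spec`, `MAX_PARAMS` and `parse_filter_pipeline` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_PARAMS 16   /* arbitrary limit for this sketch */

typedef struct {
    unsigned int id;
    size_t nparams;
    unsigned int params[MAX_PARAMS];
} filter_spec;

/* Parse "id,p1,p2|id,p1..." into at most `maxspecs` entries.
   Returns the number of filters parsed, or -1 on a syntax error. */
int parse_filter_pipeline(const char* text, filter_spec* specs, int maxspecs)
{
    assert(text != NULL && specs != NULL);
    int n = 0;
    const char* p = text;
    for (;;) {
        if (n >= maxspecs) return -1;
        filter_spec* s = &specs[n];
        s->nparams = 0;
        int first = 1;
        for (;;) {
            char* end;
            if (!isdigit((unsigned char)*p)) return -1;  /* no sign, no tag */
            errno = 0;
            unsigned long v = strtoul(p, &end, 10);
            if (errno != 0 || v > 0xFFFFFFFFul) return -1;
            if (first) {
                s->id = (unsigned int)v;       /* first constant is the id */
                first = 0;
            } else {
                if (s->nparams >= MAX_PARAMS) return -1;
                s->params[s->nparams++] = (unsigned int)v;
            }
            p = end;
            if (*p != ',') break;
            p++;                                /* skip ',' */
        }
        n++;
        if (*p == '\0') return n;
        if (*p != '|') return -1;
        p++;                                    /* skip '|' and parse next filter */
    }
}
```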
321 The currently supported constants are as follows.
<table>
<tr halign="center"><th>Example<th>Type<th>Format Tag<th>Notes
<tr><td>-17b<td>signed 8-bit byte<td>b|B<td>Truncated to 8 bits and sign extended to 32 bits
<tr><td>23ub<td>unsigned 8-bit byte<td>u|U b|B<td>Truncated to 8 bits and zero extended to 32 bits
<tr><td>-25S<td>signed 16-bit short<td>s|S<td>Truncated to 16 bits and sign extended to 32 bits
<tr><td>27US<td>unsigned 16-bit short<td>u|U s|S<td>Truncated to 16 bits and zero extended to 32 bits
<tr><td>-77<td>implicit signed 32-bit integer<td>Leading minus sign and no tag<td>
<tr><td>77<td>implicit unsigned 32-bit integer<td>No tag<td>
<tr><td>93U<td>explicit unsigned 32-bit integer<td>u|U<td>
<tr><td>789f<td>32-bit float<td>f|F<td>
<tr><td>12345678.12345678d<td>64-bit double<td>d|D<td>LE encoding
<tr><td>-9223372036854775807L<td>64-bit signed long long<td>l|L<td>LE encoding
<tr><td>18446744073709551615UL<td>64-bit unsigned long long<td>u|U l|L<td>LE encoding
</table>
Notes:

1. In all cases, except for an untagged positive integer, the format tag is required and determines how the constant is converted to one or two unsigned int values.
2. An untagged positive integer is treated as being of the smallest type into which it fits (i.e. 8, 16, 32, or 64 bit).
3. For signed byte and short, the value is sign extended to 32 bits and then treated as an unsigned int value, but maintaining the bit-pattern.
4. For double, and signed|unsigned long long, they are converted as specified in the section on <a href="#filters_paramcoding">parameter encode/decode</a>.
5. In order to support multiple filters, the argument to *\_Filter* may be a pipeline, separated using '|', that specifies a list of filter specs.
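
The byte and short rows of the table can be expressed directly as bit operations. The following sketch covers only those four cases; the function names are hypothetical and the 64-bit and floating-point encodings are left to the parameter encode/decode section.

```c
#include <stdint.h>

/* Signed byte (tag b|B): truncate to 8 bits, sign extend to 32 bits,
   then treat the resulting bit pattern as an unsigned int. */
unsigned int encode_sbyte(long value)
{
    uint32_t u = (uint32_t)value & 0xFFu;
    if (u & 0x80u) u |= 0xFFFFFF00u;
    return u;
}

/* Unsigned byte (tags u|U b|B): truncate to 8 bits, zero extend. */
unsigned int encode_ubyte(long value)
{
    return (uint32_t)value & 0xFFu;
}

/* Signed short (tag s|S): truncate to 16 bits, sign extend to 32 bits. */
unsigned int encode_sshort(long value)
{
    uint32_t u = (uint32_t)value & 0xFFFFu;
    if (u & 0x8000u) u |= 0xFFFF0000u;
    return u;
}

/* Unsigned short (tags u|U s|S): truncate to 16 bits, zero extend. */
unsigned int encode_ushort(long value)
{
    return (uint32_t)value & 0xFFFFu;
}
```

With these definitions the constant `-17b` becomes `0xFFFFFFEF`, which is why *ncdump* later shows it as a plain unsigned integer.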
344 ## Dynamic Loading Process {#filters_Process}
346 Each filter is assumed to be compiled into a separate dynamically loaded library.
347 For HDF5 conformant filters, these filter libraries are assumed to be in some specific location.
348 The details for writing such a filter are defined in the HDF5 documentation[1,2].
350 ### Plugin directory {#filters_plugindir}
352 The HDF5 loader searches for plugins in a number of directories.
353 This search is contingent on the presence or absence of the environment
354 variable named ***HDF5_PLUGIN_PATH***.
356 As with all other "...PATH" variables, it is a sequence of absolute
directories separated by a separator character. For \*nix operating systems,
358 this separator is the colon (':') character. For Windows and Mingw, the
359 separator is the semi-colon (';') character. So for example:
361 * Linux: export HDF5_PLUGIN_PATH=/usr/lib:/usr/local/lib
362 * Windows: export HDF5_PLUGIN_PATH=c:\\ProgramData\\hdf5\\plugin;c:\\tools\\lib
364 If HDF5_PLUGIN_PATH is defined, then the loader will search each directory
365 in the path from left to right looking for shared libraries with specific
366 exported symbols representing the entry points into the library.
368 If HDF5_PLUGIN_PATH is not defined, the loader defaults to using
369 these default directories:
371 * Linux: /usr/local/hdf5/lib/plugin
372 * Windows: %ALLUSERSPROFILE%\\hdf5\\lib\\plugin
374 It should be noted that there is a difference between the search order
375 for HDF5 versus NCZarr. The HDF5 loader will search only the directories
specified in HDF5_PLUGIN_PATH. In NCZarr, the loader
377 searches HDF5_PLUGIN_PATH and as a last resort,
378 it also searches the default directory.
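
Splitting a path value at the platform separator might be done as in the sketch below. It does not reproduce the loader's actual code; `PATH_SEP` and `split_plugin_path` are hypothetical names, and the caller is assumed to free the returned strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifdef _WIN32
#define PATH_SEP ';'
#else
#define PATH_SEP ':'
#endif

/* Split `path` at `sep`, storing malloc'd copies of up to `maxdirs`
   non-empty entries in `dirs`, left to right. Returns the count. */
size_t split_plugin_path(const char* path, char sep,
                         char** dirs, size_t maxdirs)
{
    assert(path != NULL && dirs != NULL && sep != '\0');
    size_t n = 0;
    const char* start = path;
    while (n < maxdirs) {
        const char* end = strchr(start, sep);
        size_t len = end ? (size_t)(end - start) : strlen(start);
        if (len > 0) {                 /* skip empty entries such as "a::b" */
            char* dir = malloc(len + 1);
            if (dir == NULL) break;
            memcpy(dir, start, len);
            dir[len] = '\0';
            dirs[n++] = dir;
        }
        if (end == NULL) break;        /* last entry consumed */
        start = end + 1;
    }
    return n;
}
```

A caller would typically pass the result of `getenv("HDF5_PLUGIN_PATH")` together with `PATH_SEP`, after checking that the variable is set.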
380 ### Plugin Library Naming {#filters_Pluginlib}
382 Given a plugin directory, HDF5 examines every file in that directory
383 that conforms to a specified name pattern as determined by the
384 platform on which the library is being executed.
<table>
<tr halign="center"><th>Platform<th>Basename<th>Extension
<tr halign="left"><td>Linux<td>lib*<td>.so*
<tr halign="left"><td>OSX<td>lib*<td>.dylib*
<tr halign="left"><td>Cygwin<td>cyg*<td>.dll*
<tr halign="left"><td>Windows<td>*<td>.dll
</table>
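
For the three platforms whose patterns have the form *prefix\*ext\** (Linux, OSX and Cygwin), the name test amounts to a prefix check followed by a substring search. The sketch below shows that reading of the table; the function name and the plugin file names in the usage note are made up for illustration, and the Windows row, which requires the name to end in *.dll*, is not covered.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if `name` matches the glob "<prefix>*<ext>*", i.e. it starts
   with `prefix` and contains `ext` somewhere after the prefix. */
int matches_plugin_pattern(const char* name, const char* prefix,
                           const char* ext)
{
    assert(name != NULL && prefix != NULL && ext != NULL);
    size_t plen = strlen(prefix);
    if (strncmp(name, prefix, plen) != 0) return 0;
    return strstr(name + plen, ext) != NULL;
}
```

For example, with prefix `lib` and extension `.so`, both `libh5bzip2.so` and `libh5bzip2.so.1` match, while `h5bzip2.so` does not.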
394 ### Plugin Verification {#filters_Pluginverify}
396 For each dynamic library located using the previous patterns,
397 HDF5 attempts to load the library and attempts to obtain
information from it. Specifically, it looks for two functions
with the following signatures.

1. *H5PL\_type\_t H5PLget\_plugin\_type(void)* — This function is expected to return the constant value *H5PL\_TYPE\_FILTER* to indicate that this is a filter library.
2. *const void\* H5PLget\_plugin\_info(void)* — This function returns a pointer to a table of type *H5Z\_class2\_t*.
403 This table contains the necessary information needed to utilize the filter both for reading and for writing.
404 In particular, it specifies the filter id implemented by the library and it must match that id specified for the variable in *nc\_def\_var\_filter* in order to be used.
406 If plugin verification fails, then that plugin is ignored and the search continues for another, matching plugin.
408 ## NCZarr Filter Support {#filters_nczarr}
410 The inclusion of Zarr support in the netcdf-c library creates the need to provide a new representation consistent with the way that Zarr files store filter information.
411 For Zarr, filters are represented using the JSON notation.
412 Each filter is defined by a JSON dictionary, and each such filter dictionary
413 is guaranteed to have a key named "id" whose value is a unique string defining the filter algorithm: "lz4" or "bzip2", for example.
415 The parameters of the filter are defined by additional — algorithm specific — keys in the filter dictionary.
One commonly used filter is "blosc", which has a JSON dictionary of this form.

    {
    "id": "blosc",
    "cname": "lz4",
    "clevel": 5,
    "shuffle": 1
    }

425 So it has three parameters:
427 1. "cname" — the sub-algorithm used by the blosc compressor, LZ4 in this case.
428 2. "clevel" — the compression level, 5 in this case.
429 3. "shuffle" — is the input shuffled before compression, yes (1) in this case.
431 NCZarr has four constraints that must be met.
433 1. It must store its filter information in its metadata in the above JSON dictionary format.
434 2. It is required to re-use the HDF5 filter implementations.
   This is to avoid having to rewrite the filter implementations.
436 This means that some mechanism is needed to translate between the HDF5 id+parameter model and the Zarr JSON dictionary model.
437 3. It must be possible to modify the set of visible parameters in response to environment information such as the type of the associated variable; this is required to mimic the corresponding HDF5 capability.
438 4. It must be possible to use filters even if HDF5 support is disabled.
440 Note that the term "visible parameters" is used here to refer to the parameters provided by `nc_def_var_filter` or those stored in the dataset's metadata as provided by the JSON codec. The term "working parameters" refers to the parameters given to the compressor itself and derived from the visible parameters.
442 The standard authority for defining Zarr filters is the list supported by the NumCodecs project [7].
443 Comparing the set of standard filters (aka codecs) defined by NumCodecs to the set of standard filters defined by HDF5 [3], it can be seen that the two sets overlap, but each has filters not defined by the other.
445 Note also that it is undesirable that a specific set of filters/codecs be built into the NCZarr implementation.
Rather, it is preferable for there to be some extensible way to associate the JSON with the code implementing the codec. This mirrors the plugin model used by HDF5.
448 The mechanism provided to address these issues is similar to that taken by HDF5.
449 A shared library must exist that has certain well-defined entry points that allow the NCZarr code to determine information about a Codec.
The shared library exports a well-known function name to access Codec information and relate it to a corresponding HDF5 implementation.
Note that the shared library may optionally be the same library containing the HDF5 filter implementation.
454 ### Processing Overview
456 There are several paths by which the NCZarr filter API is invoked.
458 1. The nc\_def\_var\_filter function is invoked on a variable or
459 (1a) the metadata for a variable is read when opening an existing variable that has associated Codecs.
460 2. The visible parameters are converted to a set of working parameters.
461 3. The filter is invoked with the working parameters.
462 4. The dataset is closed using the final set of visible parameters.
464 #### Step 1: Invoking nc\_def\_var\_filter
466 In this case, the filter plugin is located and the set of visible parameters (from nc\_def\_var\_filter) are provided.
468 #### Step 1a: Reading metadata
470 In this case, the codec is read from the metadata and must be converted to a visible set of HDF5 style parameters.
471 It is possible that this set of visible parameters differs from the set that was provided by nc\_def\_var\_filter.
472 If this is important, then the filter implementation is responsible for marking this difference using, for example, different number of parameters or some differing value.
474 #### Step 2: Convert visible parameters to working parameters
Given environmental information such as the associated variable's base type, the visible parameters
are converted to a potentially larger set of working parameters. This step also provides the opportunity
to modify the visible parameters.
480 #### Step 3: Invoking the filter
482 As chunks are read or written, the filter is repeatedly invoked using the working parameters.
484 #### Step 4: Closing the dataset
486 The visible parameters from step 2 are stored in the dataset's metadata.
487 It is desirable to determine if the set of visible parameters changes.
488 If no change is detected, then re-writing the compressor metadata may be avoided.
492 Currently, there is no way to specify use of a filter via Codec through
493 the netcdf-c API. Rather, one must know the HDF5 id and parameters of
494 the filter of interest and use the functions *nc\_def\_var\_filter* and *nc\_inq\_var\_filter*.
495 Internally, the NCZarr code will use information about known Codecs to convert the HDF5 filter reference to the corresponding Codec.
496 This restriction also holds for the specification of filters in *ncgen* and *nccopy*.
497 This limitation may be lifted in the future.
499 ### Special Codecs Attribute
A new special attribute is defined called *\_Codecs* in parallel to the current *\_Filter* special attribute. Its value is a string containing the JSON representation of the Codecs associated with a given variable.
This can be especially useful when a file is unreadable because it uses a filter not available to the netcdf-c library.
That is, no implementation was found in, for example, the *HDF5\_PLUGIN\_PATH* directories.
504 In this case *ncdump -hs* will display the raw Codec information so that it may be possible to see what filter is missing.
506 ### Pre-Processing Filter Libraries
508 The process for using filters for NCZarr is defined to operate in several steps.
509 First, as with HDF5, all shared libraries in a specified directory
510 (e.g. *HDF5\_PLUGIN\_PATH*) are scanned.
511 They are interrogated to see what kind of library they implement, if any.
512 This interrogation operates by seeing if certain well-known (function) names are defined in this library.
514 There will be two library types:
1. HDF5 — exports a specific API: `H5PLget_plugin_type` and `H5PLget_plugin_info`.
517 2. Codec — exports a specific API: `NCZ_get_codec_info`
519 Note that a given library can export either or both of these APIs.
This means that we can have three types of libraries:

1. HDF5 only
2. Codec only
3. HDF5 + Codec

Suppose that our *HDF5\_PLUGIN\_PATH* location has an HDF5-only library.
527 Then by adding a corresponding, separate, Codec-only library to that same location, it is possible to make an HDF5 library usable by NCZarr.
528 It is possible to do this without having to modify the HDF5-only library.
529 Over time, it is possible to merge an HDF5-only library with a Codec-only library to produce a single, combined library.
531 ### Using Plugin Libraries
533 The netcdf-c library processes all of the shared libraries by interrogating each one for the well-known APIs and recording the result.
Any library that exports neither of the well-known APIs is ignored.

Internally, the netcdf-c library pairs up each HDF5 library API with a corresponding Codec API by invoking the relevant well-known functions
(see [Appendix E](#filters_appendixe)).
538 This results in this table for associated codec and hdf5 libraries.
<table>
<tr><th>HDF5 API<th>Codec API<th>Action
<tr><td>Not defined<td>Not defined<td>Ignore
<tr><td>Defined<td>Not defined<td>Ignore
<tr><td>Defined<td>Defined<td>NCZarr usable
</table>
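
The pairing can be pictured with the sketch below. It assumes, for illustration only, that each scanned library is summarized by the HDF5 filter id it implements and by the HDF5 id that its Codec API refers to, with 0 meaning that the API is not exported. The record layout, the file names in the usage note and the function name are hypothetical and do not reflect the internal netcdf-c data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of what one shared library exports.
   An id of 0 means "this API is not exported by the library". */
typedef struct {
    const char*  file;           /* library file name (illustrative) */
    unsigned int hdf5_id;        /* filter id from the HDF5 API, or 0 */
    unsigned int codec_hdf5_id;  /* HDF5 id named by the Codec API, or 0 */
} plugin_lib;

/* Return 1 if filter `id` has both an HDF5 implementation and a Codec
   description somewhere in `libs`, possibly in different libraries. */
int nczarr_usable(const plugin_lib* libs, size_t nlibs, unsigned int id)
{
    assert((libs != NULL || nlibs == 0) && id != 0);
    int have_hdf5 = 0, have_codec = 0;
    for (size_t i = 0; i < nlibs; i++) {
        if (libs[i].hdf5_id == id) have_hdf5 = 1;
        if (libs[i].codec_hdf5_id == id) have_codec = 1;
    }
    return have_hdf5 && have_codec;
}
```

An HDF5-only library for id 307 plus a separate Codec-only library that names id 307 makes that filter usable, which corresponds to the last row of the table.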
546 ### Filter Defaults Library
548 As a special case, a shared library may be created to hold
549 defaults for a common set of filters.
550 Basically, there is a specially defined function that returns
551 a vector of codec APIs. These defaults are used only if
552 no other library provides codec information for a filter.
553 Currently, the defaults library provides codec defaults
554 for Shuffle, Fletcher32, Deflate (zlib), and SZIP.
556 ### Using the Codec API
558 Given a set of filters for which the HDF5 API and the Codec API
559 are defined, it is then possible to use the APIs to invoke the
560 filters and to process the meta-data in Codec JSON format.
562 #### Writing an NCZarr Container
564 When writing, the user program will invoke the NetCDF API function *nc\_def\_var\_filter*.
565 This function is currently defined to operate using HDF5-style id and parameters (unsigned ints).
566 The netcdf-c library examines its list of known filters to find one matching the HDF5 id provided by *nc\_def\_var\_filter*.
567 The set of parameters provided is stored internally.
568 Then during writing of data, the corresponding HDF5 filter is invoked to encode the data.
570 When it comes time to write out the meta-data, the stored HDF5-style parameters are passed to a specific Codec function to obtain the corresponding JSON representation. Again see [Appendix E](#filters_appendixe).
571 This resulting JSON is then written in the NCZarr metadata.
573 #### Reading an NCZarr Container
575 When reading, the netcdf-c library will read the metadata for a given variable and will see that some set of filters are applied to this variable.
576 The metadata is encoded as Codec-style JSON.
578 Given a JSON Codec, it is parsed to provide a JSON dictionary containing the string "id" and the set of parameters as various keys.
579 The netcdf-c library examines its list of known filters to find one matching the Codec "id" string.
580 The JSON is passed to a Codec function to obtain the corresponding HDF5-style *unsigned int* parameter vector.
581 These parameters are stored for later use.
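
The two conversion directions described for writing and reading can be sketched for a filter with a single parameter. The example assumes a bzip2 codec whose JSON uses the id "bz2" and a "level" key, following the NumCodecs naming; that mapping, the function names, and the use of `sscanf` in place of a real JSON parser are all assumptions of this sketch and not the Codec API defined in Appendix E.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writing: HDF5-style parameters -> Codec JSON text.
   Returns 0 on success, -1 on bad input or if `buf` is too small. */
int bz2_params_to_json(size_t nparams, const unsigned int* params,
                       char* buf, size_t buflen)
{
    assert(buf != NULL);
    if (nparams != 1 || params == NULL) return -1;
    int n = snprintf(buf, buflen, "{\"id\": \"bz2\", \"level\": %u}", params[0]);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Reading: Codec JSON text -> HDF5-style parameter vector.
   `params` must have room for one value. Returns 0 on success. */
int bz2_json_to_params(const char* json, size_t* nparamsp,
                       unsigned int* params)
{
    assert(json != NULL && nparamsp != NULL && params != NULL);
    const char* key = strstr(json, "\"level\"");
    unsigned int level;
    /* Toy extraction; a real codec would parse the JSON properly. */
    if (key == NULL || sscanf(key, "\"level\" : %u", &level) != 1) return -1;
    params[0] = level;
    *nparamsp = 1;
    return 0;
}
```

Converting the parameter vector `{9}` to JSON and back yields the same single parameter, which is the round trip that the metadata write and read paths rely on.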
583 ### Supporting Filter Chains
HDF5 supports *filter chains*. A filter chain is a sequence of filters where the output of one filter is provided as input to the next filter in the sequence.
586 When encoding, the filters are executed in the "forward" direction,
587 while when decoding the filters are executed in the "reverse" direction.
589 In the Zarr meta-data, a filter chain is divided into two parts:
590 the "compressor" and the "filters". The former is a single JSON codec
591 as described above. The latter is an ordered JSON array of codecs.
So if compressor is something like

    "compressor": {"id": "c"...}

and the filters array is like this:

    "filters": [ {"id": "f1"...}, {"id": "f2"...}...{"id": "fn"...}]

then the filter chain is (f1,f2,...fn,c) with f1 being applied first and c being applied last when encoding. On decode, the filter chain is executed in the order (c,fn...f2,f1).
598 So, an HDF5 filter chain is divided into two parts, where the last filter in the chain is assigned as the "compressor" and the remaining
599 filters are assigned as the "filters".
600 But independent of this, each codec, whether a compressor or a filter,
601 is stored in the JSON dictionary form described earlier.
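
The division of an HDF5 chain into the two Zarr parts is simple enough to show as code. In this sketch the chain is represented as an array of codec id strings in encode order; the `zarr_chain` type and `split_chain` function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Zarr-side view of a chain; both members point into the caller's array. */
typedef struct {
    const char* const* filters;  /* codecs applied before the compressor */
    size_t nfilters;
    const char* compressor;      /* last codec in the chain, or NULL */
} zarr_chain;

/* Split a chain given in encode order: the last element becomes the
   "compressor" and everything before it becomes the "filters" array. */
zarr_chain split_chain(const char* const* chain, size_t n)
{
    assert(chain != NULL || n == 0);
    zarr_chain z = { chain, 0, NULL };
    if (n > 0) {
        z.nfilters = n - 1;
        z.compressor = chain[n - 1];
    }
    return z;
}
```

A chain with a single filter therefore has an empty "filters" part and only a "compressor".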
605 The Codec style, using JSON, has the ability to provide very complex parameters that may be hard to encode as a vector of unsigned integers.
It might be desirable to consider exporting a JSON-based API out of the netcdf-c API to support user access to this complexity.
607 This would mean providing some alternate version of `nc_def_var_filter` that takes a string-valued argument instead of a vector of unsigned ints.
608 This extension is unlikely to be implemented until a compelling use-case is encountered.
One bad side-effect of this is that there may then be two classes of plugins:
one class that can be used by both HDF5 and NCZarr, and a second class that is usable only with NCZarr.
613 ### Using The NetCDF-C Plugins
615 As part of its testing, the NetCDF build process creates a number of shared libraries in the *netcdf-c/plugins* (or sometimes *netcdf-c/plugins/.libs*) directory.
616 If you need a filter from that set, you may be able to set *HDF5\_PLUGIN\_PATH*
617 to point to that directory or you may be able to copy the shared libraries out of that directory to your own location.
619 ## Debugging {#filters_debug}
621 Depending on the debugger one uses, debugging plugins can be very difficult.
622 It may be necessary to use the old printf approach for debugging the filter itself.
624 One case worth mentioning is when there is a dataset that is using an unknown filter.
625 For this situation, you need to identify what filter(s) are used in the dataset.
626 This can be accomplished using this command.
628 ncdump -s -h <dataset filename>
630 Since ncdump is not being asked to access the data (the -h flag), it can obtain the filter information without failures.
631 Then it can print out the filter id and the parameters as well as the Codecs (via the -s flag).
633 ### Test Cases {#filters_TestCase}
Within the netcdf-c source tree, two directories contain test cases for testing dynamic filter operation.
637 * *netcdf-c/nc\_test4* provides tests for testing HDF5 filters.
638 * *netcdf-c/nczarr\_test* provides tests for testing NCZarr filters.
640 These tests are disabled if *--disable-shared* or if *--disable-filter-tests* is specified
641 or if *--disable-plugins* is specified.
643 ### HDF5 Example {#filters_Example}
645 A slightly simplified version of one of the HDF5 filter test cases is also available as an example within the netcdf-c source tree directory *netcdf-c/examples/C*.
646 The test is called *filter\_example.c* and it is executed as part of the *run\_examples4.sh* shell script.
647 The test case demonstrates dynamic filter writing and reading.
The files *examples/C/hdf5plugins/Makefile.am* and *examples/C/hdf5plugins/CMakeLists.txt* demonstrate how to build the hdf5 plugin for bzip2.
653 ### Order of Invocation for Multiple Filters
When multiple filters are defined on a variable, the order of application, when writing data to the file, is the same as the order in which *nc\_def\_var\_filter* is called.
656 When reading a file the order of application is of necessity the reverse.
658 There are some special cases.
660 1. The fletcher32 filter is always applied first, if enabled.
2. If *nc\_def\_var\_filter* or *nc\_def\_var\_deflate* or *nc\_def\_var\_szip* is called multiple times with the same filter id, but possibly with different sets of parameters, then the position of that filter in the sequence of applications does not change.
662 However the last set of parameters specified is used when actually writing the dataset.
663 3. Deflate and shuffle — these two are inextricably linked in the current API, but have quite different semantics.
If you call *nc\_def\_var\_deflate* multiple times, then the previous rule applies with respect to deflate.
665 However, the shuffle filter, if enabled, is *always* applied before applying any other filters, except fletcher32.
666 4. Once a filter is defined for a variable, it cannot be removed nor can its position in the filter order be changed.
668 ### Memory Allocation Issues
Starting with HDF5 version 1.10.x, plugin code MUST be careful when using the standard *malloc()*, *realloc()*, and *free()* functions.
If the code is allocating, reallocating, or freeing memory that either came from or will be exported to the
calling HDF5 library, then it MUST use the corresponding HDF5
functions *H5allocate\_memory()*, *H5resize\_memory()*, and
*H5free\_memory()* [5] to avoid memory failures.
Additionally, if your filter code leaks memory, then the HDF5 library reports a failure similar to the following.
680 H5MM.c:232: H5MM_final_sanity_check: Assertion `0 == H5MM_curr_alloc_bytes_s' failed.
One can look at the code in plugins/H5Zbzip2.c and H5Zmisc.c as illustrations.
686 The current szip plugin code in the HDF5 library has some behaviors that can catch the unwary.
687 These are handled internally to (mostly) hide them so that they should not affect users.
688 Specifically, this filter may do two things.
690 1. Add extra parameters to the filter parameters: going from the two parameters provided by the user to four parameters for internal use.
691 It turns out that the two parameters provided when calling nc\_def\_var\_filter correspond to the first two parameters of the four parameters returned by nc\_inq\_var\_filter.
692 2. Change the values of some parameters: the value of the *options\_mask* argument is known to add additional flag bits, and the *pixels\_per\_block* parameter may be modified.
The reason for these changes is that the szip API provided by the underlying H5Pset\_szip function is actually a subset of the capabilities of the real szip implementation.
695 Presumably this is for historical reasons.
697 In any case, if the caller uses the *nc\_inq\_var\_szip* or the *nc\_inq\_var\_filter* functions, then the parameter values returned may differ from those originally specified.
699 It should also be noted that the HDF5 szip filter wrapper that
700 is invoked depends on the configuration of the netcdf-c library.
701 If the HDF5 installation supports szip, then the NCZarr szip
702 will use the HDF5 wrapper. If HDF5 does not support szip, or HDF5
703 is not enabled, then the plugins directory will contain a local
704 HDF5 szip wrapper to be used by NCZarr. This can be confusing,
but is generally transparent to the user since the plugins
706 HDF5 szip wrapper was taken from the HDF5 code base.
708 ### Supported Systems
The current matrix of operating systems and build systems known to work is as follows.
<table>
<tr><th>Build System<th>Supported OS
<tr><td>Automake<td>Linux, Cygwin, OSX
<tr><td>CMake<td>Linux, Cygwin, OSX, Visual Studio
</table>
717 ### Generic Plugin Build
If you do not want to use Automake or CMake, the following has been known to work.
gcc -g -O0 -shared -o libbzip2.so <plugin source files> -L${HDF5LIBDIR} -lhdf5_hl -lhdf5 -L${ZLIBDIR} -lz
722 ## References {#filters_References}
724 1. https://support.hdfgroup.org/HDF5/doc/Advanced/DynamicallyLoadedFilters/HDF5DynamicallyLoadedFilters.pdf
725 2. https://support.hdfgroup.org/HDF5/doc/TechNotes/TechNote-HDF5-CompressionTroubleshooting.pdf
726 3. https://portal.hdfgroup.org/display/support/Registered+Filter+Plugins
727 4. https://support.hdfgroup.org/services/contributions.html#filters
728 5. https://support.hdfgroup.org/HDF5/doc/RM/RM\_H5.html
729 6. https://confluence.hdfgroup.org/display/HDF5/Filters
730 7. https://numcodecs.readthedocs.io/en/stable/
731 8. https://github.com/ccr/ccr
732 9. https://escholarship.org/uc/item/7xd1739k
734 ## Appendix A. HDF5 Parameter Encode/Decode {#filters_appendixa}
736 The filter id for an HDF5 format filter is an unsigned integer.
737 Further, the parameters passed to an HDF5 format filter are encoded internally as a vector of 32-bit unsigned integers.
738 It may be that the parameters required by a filter can naturally be encoded as unsigned integers.
The bzip2 compression filter, for example, expects a single integer value from zero through nine.
740 This encodes naturally as a single unsigned integer.
Note that signed integers and single-precision (32-bit) float values can also easily be represented as 32-bit unsigned integers by converting them to an unsigned integer in a way that preserves the bit pattern.
Simple signed integer values of type short or char can also be mapped to an unsigned integer by truncating to 16 or 8 bits respectively and then sign extending. Similarly, unsigned 8-bit and 16-bit
values can be used with zero extension.
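The following C sketch shows one way to perform these conversions. The function names are hypothetical and not part of netcdf-c, and the code assumes that `float` and `unsigned int` are both 32 bits wide, as they are on common platforms. `memcpy` is used for the float case because a plain value cast would convert the number rather than preserve its bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit float as an unsigned parameter, preserving the bits. */
unsigned int float_to_param(float f)
{
    uint32_t u;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&u, &f, sizeof u);
    return (unsigned int)u;
}

/* Inverse of float_to_param. */
float param_to_float(unsigned int p)
{
    uint32_t u = (uint32_t)p;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Sign extend a short to 32 bits and store it as an unsigned parameter. */
unsigned int short_to_param(short s)
{
    return (unsigned int)(int)s;
}

/* Recover the short by truncating the parameter to its low 16 bits. */
short param_to_short(unsigned int p)
{
    uint16_t low = (uint16_t)(p & 0xFFFFu);
    int16_t s;
    memcpy(&s, &low, sizeof s);
    return (short)s;
}
```

Inside the filter, the inverse function is applied to the corresponding element of the parameter vector to recover the original value.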
746 Machine byte order (aka endian-ness) is an issue for passing some kinds of parameters.
747 You might define the parameters when compressing on a little endian machine, but later do the decompression on a big endian machine.
749 When using HDF5 format filters, byte order is not an issue for 32-bit values because HDF5 takes care of converting them between the local machine byte order and network byte order.
751 Parameters whose size is larger than 32-bits present a byte order problem.
752 This specifically includes double precision floats and (signed or unsigned) 64-bit integers.
753 For these cases, the machine byte order issue must be handled, in part, by the compression code.
754 This is because HDF5 will treat, for example, an unsigned long long as two 32-bit unsigned integers and will convert each to network order separately.
This means that on a machine whose byte order is different from that of the machine on which the parameters were initially created, the two integers will be byte-swapped separately.
But this will be incorrect for 64-bit values.
759 So, we have this situation (for HDF5 only):
1. The 8 bytes start in native machine order for the machine doing the call to *nc\_def\_var\_filter*.
762 2. The caller divides the 8 bytes into 2 four byte pieces and passes them to *nc\_def\_var\_filter*.
763 3. HDF5 takes each four byte piece and ensures that each piece is in network (big) endian order.
764 4. When the filter is called, the two pieces are returned in the same order but with the bytes in each piece consistent with the native machine order for the machine executing the filter.
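Step 2 of this sequence, dividing the 8 bytes into two 4-byte pieces, might look like the following C sketch. The names `split8` and `join8` are hypothetical. The sketch uses `uint32_t` for the pieces, which is assumed to have the same size as the `unsigned int` used in the parameter vector.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Split an 8-byte value into two 4-byte pieces, in memory order. */
void split8(uint64_t value, uint32_t parts[2])
{
    assert(parts != NULL);
    memcpy(parts, &value, sizeof value);
}

/* Concatenate two 4-byte pieces, in memory order, into an 8-byte value. */
uint64_t join8(const uint32_t parts[2])
{
    uint64_t value;
    assert(parts != NULL);
    memcpy(&value, parts, sizeof value);
    return value;
}
```

Which piece holds the high-order half depends on the byte order of the machine, which is exactly why the extra handling described in this appendix is needed.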
766 ### Encoding Algorithms for HDF5
768 In order to properly extract the correct 8-byte value, we need to ensure that the values stored in the HDF5 file have a known format independent of the native format of the creating machine.
770 The idea is to do sufficient manipulation so that HDF5 will store the 8-byte value as a little endian value divided into two 4-byte integers.
771 Note that little-endian is used as the standard because it is the most common machine format.
772 When read, the filter code needs to be aware of this convention and do the appropriate conversions.
774 This leads to the following set of rules.
778 1. Encode on little endian (LE) machine: no special action is required.
779 The 8-byte value is passed to HDF5 as two 4-byte integers.
780 HDF5 byte swaps each integer and stores it in the file.
781 2. Encode on a big endian (BE) machine: several steps are required:
783 1. Do an 8-byte byte swap to convert the original value to little-endian format.
784 2. Since the encoding machine is BE, HDF5 will just store the value.
785 So it is necessary to simulate little endian encoding by byte-swapping each 4-byte integer separately.
786 3. This doubly swapped pair of integers is then passed to HDF5 and is stored unchanged.
790 1. Decode on LE machine: no special action is required.
HDF5 will get the two 4-byte values from the file and byte-swap each separately.
792 The concatenation of those two integers will be the expected LE value.
793 2. Decode on a big endian (BE) machine: the inverse of the encode case must be implemented.
795 1. HDF5 sends the two 4-byte values to the filter.
796 2. The filter must then byte-swap each 4-byte value independently.
797 3. The filter then must concatenate the two 4-byte values into a single 8-byte value.
798 Because of the encoding rules, this 8-byte value will be in LE format.
799 4. The filter must finally do an 8-byte byte-swap on that 8-byte value to convert it to desired BE format.
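The encode and decode rules above can be sketched in C as follows. This is an illustration written from the rules as stated, not the library's implementation. The names `fix8_sketch` and `fix8_native` are hypothetical, and the byte order is passed as a parameter so that the big endian path can be exercised on any host.

```c
#include <assert.h>
#include <stddef.h>

/* Reverse n bytes in place. */
static void swap_bytes(unsigned char *p, size_t n)
{
    size_t i;
    for (i = 0; i < n / 2; i++) {
        unsigned char t = p[i];
        p[i] = p[n - 1 - i];
        p[n - 1 - i] = t;
    }
}

/* Apply the 8-byte rules to mem8. decode is 1 for decoding, 0 for encoding.
   big_endian says whether to follow the rules for a BE machine. */
void fix8_sketch(unsigned char *mem8, int decode, int big_endian)
{
    assert(mem8 != NULL);
    if (!big_endian)
        return;                    /* LE: no special action is required */
    if (decode) {
        swap_bytes(mem8, 4);       /* byte-swap each 4-byte value */
        swap_bytes(mem8 + 4, 4);
        swap_bytes(mem8, 8);       /* then convert LE to BE */
    } else {
        swap_bytes(mem8, 8);       /* convert BE to LE */
        swap_bytes(mem8, 4);       /* then byte-swap each 4-byte value */
        swap_bytes(mem8 + 4, 4);
    }
}

/* Same as fix8_sketch, using the byte order of the running machine. */
void fix8_native(unsigned char *mem8, int decode)
{
    const unsigned int one = 1;
    int big_endian = (*(const unsigned char *)&one == 0);
    fix8_sketch(mem8, decode, big_endian);
}
```

Under the big endian rules, encoding followed by decoding returns the original 8 bytes, and under the little endian rules the bytes are left untouched in both directions.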
To support these rules, some utility functions exist and are discussed in [Appendix B](#filters_appendixb).
803 ## Appendix B. Support Utilities {#filters_appendixb}
805 Several functions are exported from the netcdf-c library for use by client programs and by filter implementations.
806 They are defined in the header file *netcdf\_aux.h*.
807 The h5 tag indicates that they assume that the result of the parse is a set of unsigned integers — the format used by HDF5.
1. *int ncaux\_h5filterspec\_parse(const char* txt, unsigned int* idp, size\_t* nparamsp, unsigned int** paramsp);*
810 * txt contains the text of a sequence of comma separated constants
811 * idp will contain the first constant — the filter id
812 * nparamsp will contain the number of params
813 * paramsp will contain a vector of params — the caller must free
814 This function can parse single filter spec strings as defined in the section on [Filter Specification Syntax](#filters_syntax).
815 2. *int ncaux\_h5filterspec\_parselist(const char* txt, int* formatp, size\_t* nspecsp, struct NC\_H5\_Filterspec*** vectorp);*
* txt contains the text of a sequence of '|' separated filter specs.
817 * formatp currently always returns 0.
818 * nspecsp will return the number of filter specifications.
819 * vectorp will return a pointer to a vector of pointers to filter specification instances — the caller must free.
820 This function parses a sequence of filter specifications each separated by a '|' character.
821 The text between '|' separators must be parsable by *ncaux\_h5filterspec\_parse*.
822 3. *void ncaux\_h5filterspec\_free(struct NC\_H5\_Filterspec* f);*
823 * f is a pointer to an instance of *struct NC\_H5\_Filterspec*
824 Typically this was returned as an element of the vector returned
by *ncaux\_h5filterspec\_parselist*.
826 This reclaims the parameters of the filter spec object as well as the object itself.
827 4. *int ncaux\_h5filterspec\_fix8(unsigned char* mem8, int decode);*
* mem8 is a pointer to the 8-byte value to fix.
829 * decode is 1 if the function should apply the 8-byte decoding algorithm
830 else apply the encoding algorithm.
831 This function implements the 8-byte conversion algorithms for HDF5.
832 Before calling *nc\_def\_var\_filter* (unless *NC\_parsefilterspec* was used), the client must call this function with the decode argument set to 0.
833 Inside the filter code, this function should be called with the decode argument set to 1.
835 Examples of the use of these functions can be seen in the test program *nc\_test4/tst\_filterparser.c*.
Some of the above functions use a C struct defined in *netcdf\_filter.h*.
838 The definition of that struct is as follows.
typedef struct NC_H5_Filterspec {
    unsigned int filterid; /* ID for arbitrary filter. */
    size_t nparams;        /* nparams for arbitrary filter. */
    unsigned int* params;  /* Params for arbitrary filter. */
} NC_H5_Filterspec;
This struct in effect encapsulates all of the information about an HDF5 formatted filter — the id, the number of parameters, and the parameters themselves.
848 ## Appendix C. Build Flags for Detecting the Filter Mechanism {#filters_appendixc}
850 The include file *netcdf\_meta.h* contains the following definition.
852 #define NC_HAS_MULTIFILTERS 1
This, in conjunction with the error code *NC\_ENOFILTER* in *netcdf.h*, can be used to see what filter mechanism is in place as described in the section on [incompatibilities](#filters_compatibility).
856 1. !defined(NC\_ENOFILTER) && !defined(NC\_HAS\_MULTIFILTERS) — indicates that the old pre-4.7.4 mechanism is in place.
857 It does not support multiple filters.
858 2. defined(NC\_ENOFILTER) && !defined(NC\_HAS\_MULTIFILTERS) — indicates that the 4.7.4 mechanism is in place.
859 It does support multiple filters, but the error return codes for *nc\_inq\_var\_filter* are different and the filter spec parser functions are in a different location with different names.
860 3. defined(NC\_ENOFILTER) && defined(NC\_HAS\_MULTIFILTERS) — indicates that the multiple filters are supported, and that *nc\_inq\_var\_filter* returns a filterid of zero to indicate that a variable has no filters.
Also, the filter spec parsers have the names and signatures described in this document and are defined in *netcdf\_aux.h*.
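A compile-time check along these lines might be written as in the following C sketch. The macro `FILTER_MECHANISM` and the function `filter_mechanism_name` are hypothetical names introduced for illustration. The `__has_include` guard is an assumption about the compiler (recent GCC and Clang provide it) and is there only so that the sketch also compiles where the netCDF headers are not installed. In that situation neither macro is defined and the sketch reports case 1.

```c
#include <assert.h>
#include <stddef.h>

/* Include the netCDF headers only if the compiler can find them. */
#if defined(__has_include)
#  if __has_include(<netcdf.h>)
#    include <netcdf.h>
#  endif
#  if __has_include(<netcdf_meta.h>)
#    include <netcdf_meta.h>
#  endif
#endif

/* Map the three documented combinations to the case numbers used above.
   0 marks the one combination that the list does not cover. */
#if defined(NC_ENOFILTER) && defined(NC_HAS_MULTIFILTERS)
#  define FILTER_MECHANISM 3
#elif defined(NC_ENOFILTER)
#  define FILTER_MECHANISM 2
#elif !defined(NC_HAS_MULTIFILTERS)
#  define FILTER_MECHANISM 1
#else
#  define FILTER_MECHANISM 0
#endif

/* Return a short description of a case number. */
const char *filter_mechanism_name(int mechanism)
{
    assert(mechanism >= 0 && mechanism <= 3);
    switch (mechanism) {
    case 1: return "pre-4.7.4 mechanism, single filter only";
    case 2: return "4.7.4 mechanism, multiple filters";
    case 3: return "current mechanism, multiple filters";
    default: return "combination not covered by the list";
    }
}
```

Client code can then branch on `FILTER_MECHANISM` with `#if` to select the matching API calls.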
863 ## Appendix D. BNF for Specifying Filters in Utilities {#filters_appendixd}
speclist: spec
| speclist '|' spec
spec: filterid
| filterid ',' parameterlist
filterid: unsigned32
parameterlist: parameter
| parameterlist ',' parameter
parameter: unsigned32
unsigned32: <32 bit unsigned integer>
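A small hand-written parser for this kind of specification can be sketched in C as follows. It covers only what the grammar states, comma separated unsigned 32-bit constants with the filter id first, plus '|' separated lists as accepted by *ncaux\_h5filterspec\_parselist*. `DemoSpec`, `demo_parse_spec`, and `demo_parse_speclist` are hypothetical names, and the fixed capacity of 16 parameters is an arbitrary choice for the sketch. Library clients would normally call the functions in [Appendix B](#filters_appendixb) instead.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

#define DEMO_MAXPARAMS 16

/* Hypothetical stand-in for the library's filter spec struct. */
typedef struct DemoSpec {
    unsigned int filterid;
    size_t nparams;
    unsigned int params[DEMO_MAXPARAMS];
} DemoSpec;

/* Parse one unsigned32 at *pp and advance *pp. Returns 0 on success. */
static int demo_parse_u32(const char **pp, unsigned int *out)
{
    char *end;
    unsigned long v;
    if (!isdigit((unsigned char)**pp)) return -1;
    errno = 0;
    v = strtoul(*pp, &end, 10);
    if (errno == ERANGE || v > 0xFFFFFFFFul) return -1;
    *out = (unsigned int)v;
    *pp = end;
    return 0;
}

/* Parse "filterid" or "filterid,p1,p2,...", stopping at '|' or the end of
   the string. Returns a pointer to the stopping character, or NULL. */
const char *demo_parse_spec(const char *txt, DemoSpec *spec)
{
    assert(txt != NULL && spec != NULL);
    spec->nparams = 0;
    if (demo_parse_u32(&txt, &spec->filterid) != 0) return NULL;
    while (*txt == ',') {
        txt++;
        if (spec->nparams == DEMO_MAXPARAMS) return NULL;
        if (demo_parse_u32(&txt, &spec->params[spec->nparams]) != 0) return NULL;
        spec->nparams++;
    }
    return (*txt == '|' || *txt == '\0') ? txt : NULL;
}

/* Parse a '|' separated list of specs. Returns the count, or -1 on error. */
int demo_parse_speclist(const char *txt, DemoSpec *specs, int maxspecs)
{
    int n = 0;
    assert(txt != NULL && specs != NULL);
    for (;;) {
        if (n == maxspecs) return -1;
        txt = demo_parse_spec(txt, &specs[n]);
        if (txt == NULL) return -1;
        n++;
        if (*txt == '\0') return n;
        txt++; /* skip the '|' separator */
    }
}
```

For example, the string "307,9|32015,3" parses into two specs: filter id 307 with the single parameter 9, and filter id 32015 with the single parameter 3.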
883 ## Appendix E. Codec API {#filters_appendixe}
885 The Codec API mirrors the HDF5 API closely. It has one well-known function that can be invoked to obtain information about the Codec as well as pointers to special functions to perform conversions.
887 ### The Codec Plugin API
889 #### NCZ\_get\_codec\_info
891 This function returns a pointer to a C struct that provides detailed information about the codec plugin.
895 void* NCZ_get_codec_info(void);
897 The value returned is actually of type *struct NCZ\_codec\_t*,
898 but is of type *void\** to allow for extensions.
902 typedef struct NCZ_codec_t {
903 int version; /* Version number of the struct */
904 int sort; /* Format of remainder of the struct;
905 Currently always NCZ_CODEC_HDF5 */
906 const char* codecid; /* The name/id of the codec */
907 unsigned int hdf5id; /* corresponding hdf5 id */
908 void (*NCZ_codec_initialize)(void);
909 void (*NCZ_codec_finalize)(void);
910 int (*NCZ_codec_to_hdf5)(const char* codec, int* nparamsp, unsigned** paramsp);
911 int (*NCZ_hdf5_to_codec)(size_t nparams, const unsigned* params, char** codecp);
int (*NCZ_modify_parameters)(int ncid, int varid, size_t* vnparamsp, unsigned** vparamsp, size_t* wnparamsp, unsigned** wparamsp);
} NCZ_codec_t;
916 The semantics of the non-function fields is as follows:
918 1. *version* — Version number of the struct.
919 2. *sort* — Format of remainder of the struct; currently always NCZ\_CODEC\_HDF5.
920 3. *codecid* — The name/id of the codec.
921 4. *hdf5id* — The corresponding hdf5 id.
923 #### NCZ\_codec\_to\_hdf5
Given a JSON Codec representation, it will return a corresponding vector of unsigned integers representing the visible parameters.
int NCZ_codec_to_hdf5(const char* codec, int* nparamsp, unsigned** paramsp);
933 1. codec — (in) ptr to JSON string representing the codec.
934 2. nparamsp — (out) store the length of the converted HDF5 unsigned vector
935 3. paramsp — (out) store a pointer to the converted HDF5 unsigned vector; caller must free the returned vector. Note the double indirection.
937 Return Value: a netcdf-c error code.
939 #### NCZ\_hdf5\_to\_codec
941 Given an HDF5 visible parameters vector of unsigned integers and its length,
942 return a corresponding JSON codec representation of those visible parameters.
int NCZ_hdf5_to_codec(int ncid, int varid, size_t nparams, const unsigned* params, char** codecp);
950 1. ncid — the variables' containing group
951 2. varid — the containing variable
952 3. nparams — (in) the length of the HDF5 visible parameters vector
953 4. params — (in) pointer to the HDF5 visible parameters vector.
954 5. codecp — (out) store the string representation of the codec; caller must free.
956 Return Value: a netcdf-c error code.
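To make the two conversion directions concrete, here is a deliberately naive C sketch for a codec with a single integer parameter. It assumes a bzip2 codec whose JSON has the form `{"id": "bz2", "level": 9}`, following the NumCodecs convention [7]. The function names, the simplified signatures (no ncid or varid, and `size_t*` for the count), the use of 0 and -1 as stand-ins for netcdf-c error codes, and the string search in place of a real JSON parser are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* JSON codec -> one-element HDF5 parameter vector. The caller frees *paramsp.
   Returns 0 on success and -1 on failure. */
int demo_codec_to_hdf5(const char *codec, size_t *nparamsp, unsigned **paramsp)
{
    const char *key;
    unsigned level;
    unsigned *params;
    assert(codec != NULL && nparamsp != NULL && paramsp != NULL);
    key = strstr(codec, "\"level\"");   /* naive lookup, not a JSON parser */
    if (key == NULL) return -1;
    if (sscanf(key, "\"level\" : %u", &level) != 1) return -1;
    params = malloc(sizeof *params);
    if (params == NULL) return -1;
    params[0] = level;
    *nparamsp = 1;
    *paramsp = params;
    return 0;
}

/* One-element HDF5 parameter vector -> JSON codec. The caller frees *codecp.
   Returns 0 on success and -1 on failure. */
int demo_hdf5_to_codec(size_t nparams, const unsigned *params, char **codecp)
{
    char buf[64];
    char *codec;
    assert(codecp != NULL);
    if (nparams != 1 || params == NULL) return -1;
    snprintf(buf, sizeof buf, "{\"id\": \"bz2\", \"level\": %u}", params[0]);
    codec = malloc(strlen(buf) + 1);
    if (codec == NULL) return -1;
    strcpy(codec, buf);
    *codecp = codec;
    return 0;
}
```

Both functions hand ownership of the allocated result to the caller, matching the "caller must free" convention in the argument descriptions above.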
958 #### NCZ\_modify\_parameters
960 Extract environment information from the (ncid,varid) and use it to convert a set of visible parameters
961 to a set of working parameters; also provide option to modify visible parameters.
965 int NCZ_modify_parameters(int ncid, int varid, size_t* vnparamsp, unsigned** vparamsp, size_t* wnparamsp, unsigned** wparamsp);
969 1. ncid — (in) group id containing the variable.
970 2. varid — (in) the id of the variable to which this filter is being attached.
971 3. vnparamsp — (in/out) the count of visible parameters
972 4. vparamsp — (in/out) the set of visible parameters
973 5. wnparamsp — (out) the count of working parameters
6. wparamsp — (out) the set of working parameters
976 Return Value: a netcdf-c error code.
978 #### NCZ\_codec\_initialize
980 Some compressors may require library initialization.
981 This function is called as soon as a shared library is loaded and matched with an HDF5 filter.
int NCZ_codec_initialize(void);
987 Return Value: a netcdf-c error code.
989 #### NCZ\_codec\_finalize
991 Some compressors (like blosc) require invoking a finalize function in order to avoid memory loss.
992 This function is called during a call to *nc\_finalize* to do any finalization.
993 If the client code does not invoke *nc\_finalize* then memory checkers may complain about lost memory.
int NCZ_codec_finalize(void);
999 Return Value: a netcdf-c error code.
As an aid to clients, it is convenient if a single shared library can provide multiple *NCZ\_codec\_t* instances at one time.
1004 This API is not intended to be used by plugin developers.
1005 A shared library must only export this function.
1007 #### NCZ\_codec\_info\_defaults
1009 Return a NULL terminated vector of pointers to instances of *NCZ\_codec\_t*.
1013 void* NCZ_codec_info_defaults(void);
The value returned is actually of type *NCZ\_codec\_t\*\**,
but is of type *void\** to allow for extensions.
The list of returned items is used to try to provide defaults
1018 for any HDF5 filters that have no corresponding Codec.
1019 This is for internal use only.
1021 ## Appendix F. Standard Filters {#filters_appendixf}
1023 Support for a select set of standard filters is built into the NetCDF API.
1024 Generally, they are accessed using the following generic API, where XXXX is
1025 the filter name. As a rule, the names are those used in the HDF5 filter ID naming authority [4] or the NumCodecs naming authority [7].
1027 int nc_def_var_XXXX(int ncid, int varid, unsigned filterid, size_t nparams, unsigned* params);
1028 int nc_inq_var_XXXX(int ncid, int varid, int* hasfilter, size_t* nparamsp, unsigned* params);
1030 The first function inserts the specified filter into the filter chain for a given variable.
The second function queries the given variable to see if the specified filter
is in the filter chain for that variable. The *hasfilter* argument is set
to one if the filter is in the chain and zero otherwise.
As is usual with the netcdf API, one is expected to call this function twice:
the first time to obtain *nparamsp*, and the second time to get the parameters in the client-allocated memory argument *params*.
1036 Any of these arguments can be NULL, in which case no value is returned.
Note that NetCDF inherits four filters from HDF5, namely shuffle, fletcher32, deflate (zlib), and szip. The APIs for these do not conform to the above API.
1039 So aside from those four, the current set of standard filters is as follows.
<table>
<tr><th>Filter Name<th>Filter ID<th>Reference
<tr><td>zstandard<td>32015<td>https://facebook.github.io/zstd/
<tr><td>bzip2<td>307<td>https://sourceware.org/bzip2/
</table>
It is important to note that in order to use each standard filter, several additional libraries must be installed.
1047 Consider the zstandard compressor, which is one of the supported standard filters.
1048 When installing the netcdf library, the following other libraries must be installed.
1050 1. *libzstd.so* | *zstd.dll* | *libzstd.dylib* -- The actual zstandard compressor library; typically installed by using your platform specific package manager.
1051 2. The HDF5 wrapper for *libzstd.so* -- There are several options for obtaining this (see [Appendix G](#filters_appendixg).)
1052 3. (Optional) The Zarr wrapper for *libzstd.so* -- you need this if you intend to read/write Zarr datasets that were compressed using zstandard; again see [Appendix G](#filters_appendixg).
1054 ## Appendix G. Finding Filters {#filters_appendixg}
1056 A major problem for filter users is finding an implementation of an HDF5 filter wrapper and (optionally)
1057 its corresponding NCZarr wrapper. There are several ways to do this.
1059 * **--with-plugin-dir** — An option to *./configure* that will install the necessary wrappers.
1060 See [Appendix H](#filters_appendixh).
1062 * **HDF5 Assigned Filter Identifiers Repository [3]** —
1063 HDF5 maintains a page of standard filter identifiers along with
1064 additional contact information. This often includes a pointer
1065 to source code. This will provide only HDF5 wrappers and not NCZarr wrappers.
1067 * **Community Codec Repository** —
The Community Codec Repository (CCR) project [8] provides
a number of filters, including their HDF5 wrappers.
You can install this library to get access to these supported filters.
It does not currently include the required NCZarr Codec API (the Zarr wrappers),
so they are only usable with netcdf-4. This will change in the future.
1075 ## Appendix H. Auto-Install of Filter Wrappers {#filters_appendixh}
1077 As part of the overall build process, a number of filter wrappers are built as shared libraries in the "plugins" directory.
1078 These wrappers can be installed as part of the overall netcdf-c installation process.
1079 WARNING: the installer still needs to make sure that the actual filter/compression libraries are installed: e.g. libzstd and/or libblosc.
1081 The target location into which libraries in the "plugins" directory are installed is specified
1082 using a special *./configure* option
1084 --with-plugin-dir=<directorypath>
1088 or its corresponding *cmake* option.
1090 -DPLUGIN_INSTALL_DIR=<directorypath>
1092 -DPLUGIN_INSTALL_DIR=YES
1094 This option defaults to the value "yes", which means that filters are
1095 installed by default. This can be disabled by one of the following options.
1097 --without-plugin-dir (automake)
1099 --with-plugin-dir=no (automake)
1101 -DPLUGIN_INSTALL_DIR=NO (CMake)
1104 If the option is specified with no argument (automake) or with the value "YES" (CMake),
1105 then it defaults (in order) to the following directories:
1. If the HDF5_PLUGIN_PATH environment variable is defined, then the last directory in the list of directories in the path is used.
2. (a) "/usr/local/hdf5/lib/plugin" for linux/unix operating systems (including Cygwin)<br>
(b) "%ALLUSERSPROFILE%\\hdf5\\lib\\plugin" for Windows and MinGW
1110 If NCZarr is enabled, then in addition to wrappers for the standard filters,
1111 additional libraries will be installed to support NCZarr access to filters.
1112 Currently, this list includes the following:
1113 * shuffle — shuffle filter
1114 * fletcher32 — fletcher32 checksum
1115 * deflate — deflate compression
1116 * (optional) szip — szip compression, if libsz is available
1117 * bzip2 — an HDF5 filter for bzip2 compression
1118 * lib__nczh5filters.so — provide NCZarr support for shuffle, fletcher32, deflate, and (optionally) szip.
* lib__nczstdfilters.so — provide NCZarr support for bzip2, (optionally) zstandard, and (optionally) blosc.
The shuffle, fletcher32, and deflate filters in this case will
be ignored by HDF5 and only used by the NCZarr code. But in
order to use them, NCZarr needs the additional Codec capabilities
provided by the *lib__nczh5filters.so* shared library. Note also that
1125 if you disable HDF5 support, but leave NCZarr support enabled,
1126 then all of the above filters should continue to work.
1128 ### HDF5_PLUGIN_PATH
1130 At the moment, NetCDF uses the existing HDF5 environment variable
1131 *HDF5\_PLUGIN\_PATH* to locate the directories in which filter wrapper
shared libraries are located. This is used both for the HDF5 filter
wrappers and for the NCZarr codec wrappers.
1135 *HDF5\_PLUGIN\_PATH* is a typical Windows or Unix style
path-list. That is, it is a sequence of absolute directory paths
separated by a specific separator character. For Windows, the
separator character is a semicolon (';') and for Unix, it is a colon (':').
1141 So, if HDF5_PLUGIN_PATH is defined at build time, and
1142 *--with-plugin-dir* is specified with no argument then the last
1143 directory in the path will be the one into which filter wrappers are
1144 installed. Otherwise the default directories are used.
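Selecting the last directory from such a path-list can be sketched in C as follows. The function name `last_path_entry` is hypothetical, and the sketch makes no claim about how the build system actually performs this step. It treats an empty last entry, such as one caused by a trailing separator, as an error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the last entry of a sep-separated path list into out, which has room
   for outlen bytes. Returns 0 on success, or -1 if the last entry is empty
   or does not fit. */
int last_path_entry(const char *pathlist, char sep, char *out, size_t outlen)
{
    const char *last;
    size_t len;
    assert(pathlist != NULL && out != NULL);
    last = strrchr(pathlist, sep);
    last = (last == NULL) ? pathlist : last + 1;
    len = strlen(last);
    if (len == 0 || len + 1 > outlen)
        return -1;
    memcpy(out, last, len + 1);   /* copies the terminating NUL as well */
    return 0;
}
```

The separator is passed explicitly because it differs between platforms: `';'` for Windows and `':'` for Unix.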
1146 The important thing to note is that at run-time, there are several cases to consider:
1. HDF5_PLUGIN_PATH is defined and has the same value as it had at build time -- no action needed
2. HDF5_PLUGIN_PATH is defined and has a different value from build time -- the user is responsible for ensuring that the run-time path includes the same directory used at build time, otherwise this case will fail.
3. HDF5_PLUGIN_PATH is not defined at either run-time or build-time -- no action needed
4. HDF5_PLUGIN_PATH is not defined at run-time but was defined at build-time -- this will probably fail
1153 ## Point of Contact {#filters_poc}
1155 *Author*: Dennis Heimbigner<br>
1156 *Email*: dmh at ucar dot edu<br>
1157 *Initial Version*: 1/10/2018<br>
1158 *Last Revised*: 5/18/2022