NetCDF  4.9.2
Internal Dispatch Table Architecture
============================
<!-- double header is needed to workaround doxygen bug -->

# Internal Dispatch Table Architecture

\tableofcontents

# Introduction {#dispatch_intro}

The netcdf-c library uses an internal dispatch mechanism
to wrap the netcdf-c API around a wide variety
of underlying storage and stream data formats.
At the time of writing, the following formats are supported, and each
has its own dispatch table.

Warning: some of the function signatures listed in this document may be
out of date; consult the source code for the actual parameters.

<table>
<tr><th>Format<th>Directory<th>NC_FORMATX Name
<tr><td>NetCDF-classic<td>libsrc<td>NC_FORMATX_NC3
<tr><td>NetCDF-enhanced<td>libhdf5<td>NC_FORMATX_NC_HDF5
<tr><td>HDF4<td>libhdf4<td>NC_FORMATX_NC_HDF4
<tr><td>PNetCDF<td>libsrcp<td>NC_FORMATX_PNETCDF
<tr><td>DAP2<td>libdap2<td>NC_FORMATX_DAP2
<tr><td>DAP4<td>libdap4<td>NC_FORMATX_DAP4
<tr><td>UDF0<td>N.A.<td>NC_FORMATX_UDF0
<tr><td>UDF1<td>N.A.<td>NC_FORMATX_UDF1
<tr><td>NCZarr<td>libnczarr<td>NC_FORMATX_NCZARR
</table>

Note that UDF0 and UDF1 allow for user-defined dispatch tables to
be implemented.

The idea is that when a user opens or creates a netcdf file, a
specific dispatch table is chosen. A dispatch table is a struct
containing an entry for (almost) every function in the netcdf-c API.
During execution, netcdf API calls are channeled through that
dispatch table to the appropriate function for implementing that
API call. The functions in the dispatch table are not quite the
same as those defined in *netcdf.h*. For simplicity and
compactness, some netcdf.h API calls are mapped to the same
dispatch table function. In addition to the functions, the first
entry in the table defines the model that this dispatch table
implements. It will be one of the NC_FORMATX_XXX values.
The second entry in the table is the version of the dispatch table.
The rule is that previous entries may not be removed, but new entries
may be added, and adding new entries increases the version number.
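
The mechanism described above can be illustrated with a deliberately small model. The sketch below is not the library's code: `Toy_Dispatch`, `Toy_File`, the `toyfmt_*` functions, and the model and version values are all hypothetical stand-ins, chosen only to show a table whose first two entries identify it and whose remaining entries are function pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-reduced stand-in for a dispatch table. */
typedef struct Toy_Dispatch {
    int model;                 /* first entry: the format this table implements */
    int dispatch_version;      /* second entry: version of the table layout */
    int (*inq_format)(int ncid, int *formatp);
    int (*close)(int ncid);
} Toy_Dispatch;

/* Hypothetical per-file record: each open file remembers its table. */
typedef struct Toy_File {
    int ncid;
    const Toy_Dispatch *dispatch;
} Toy_File;

/* Format-specific implementations (illustrative only). */
static int toyfmt_inq_format(int ncid, int *formatp)
{
    (void)ncid;
    if (formatp != NULL) *formatp = 1;
    return 0;
}

static int toyfmt_close(int ncid)
{
    (void)ncid;
    return 0;
}

static const Toy_Dispatch toyfmt_table = {
    1,   /* model: arbitrary value for this sketch */
    1,   /* dispatch_version: arbitrary value for this sketch */
    toyfmt_inq_format,
    toyfmt_close
};

/* API-level wrapper: it does no format-specific work itself and only
 * forwards the call through the table attached to the file. */
int toy_api_inq_format(const Toy_File *file, int *formatp)
{
    assert(file != NULL && file->dispatch != NULL);
    return file->dispatch->inq_format(file->ncid, formatp);
}
```

A caller would build a `Toy_File` that points at `toyfmt_table` and call `toy_api_inq_format()`; supporting another format then means supplying another table rather than changing the wrapper.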

The dispatch table represents a distillation of the netcdf API down to
a minimal set of internal operations. The format of the dispatch table
(*struct NC_Dispatch*) is defined in the file *include/netcdf_dispatch.h.in*.
Every new dispatch table must define this minimal set of operations.

# Adding a New Dispatch Table

To make this process concrete, let us assume we plan to add
an in-memory implementation of netcdf-3.

## Defining configure.ac flags

Define an *--enable* option in *configure.ac*. For our
example, we assume the option "--enable-ncm" and the
corresponding internal flag "enable_ncm". If you examine the existing
*configure.ac* and see how, for example, *--enable-dap* is
defined, then it should be clear how to do it for your code.

## Defining a "name space"

Choose some prefix of characters to identify the new dispatch
system. In effect we are defining a name-space. For our in-memory
system, we will choose "NCM" and "ncm". NCM is used for non-static
procedures to be entered into the dispatch table and ncm for all other
non-static procedures. Note that the chosen prefix should probably start
with "nc" or "NC" in order to avoid name conflicts outside the netcdf-c library.

## Extend include/netcdf.h

Modify the file *include/netcdf.h* to add an NC_FORMATX_XXX flag
for this dispatch format at the appropriate place.
````
#define NC_FORMATX_NCM 7
````

Add any new format-specific error codes.
````
#define NC_ENCM (?)
````

## Extend include/ncdispatch.h

Modify the file *include/ncdispatch.h* to
add format-specific data and initialization functions;
note the use of our NCM namespace.
````
#ifdef ENABLE_NCM
extern NC_Dispatch* NCM_dispatch_table;
extern int NCM_initialize(void);
#endif
````

## Define the dispatch table functions

Define the functions necessary to fill in the dispatch table. As a
rule, we assume that a new directory is defined, *libsrcm*, say. Within
this directory, we need to define *Makefile.am* and *CMakeLists.txt*.
We also need to define the source files
containing the dispatch table and the functions to be placed in the
dispatch table; call them *ncmdispatch.c* and *ncmdispatch.h*. Look at
*libsrc/nc3dispatch.[ch]* or *libnczarr/zdispatch.[ch]* for examples.

Similarly, it is best to take existing *Makefile.am* and *CMakeLists.txt*
files (from *libsrcp* for example) and modify them.

## Adding the dispatch code to libnetcdf

Provide for the inclusion of this library in the final libnetcdf
library. This is accomplished by modifying *liblib/Makefile.am* by
adding something like the following.
````
if ENABLE_NCM
libnetcdf_la_LIBADD += $(top_builddir)/libsrcm/libnetcdfm.la
endif
````

## Extend library initialization

Modify the *NC_initialize* function in *liblib/nc_initialize.c* by adding
appropriate references to the NCM initialization function.
````
#ifdef ENABLE_NCM
extern int NCM_initialize(void);
#endif
...
int NC_initialize(void)
{
    ...
#ifdef ENABLE_NCM
    if((stat = NCM_initialize())) return stat;
#endif
    ...
}
````

Finalization is handled in an analogous fashion.
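
To make the "analogous fashion" concrete, the following self-contained sketch pairs an initializer with a finalizer. It is an illustration only: `Sketch_Dispatch`, `sketch_initialize`, `sketch_finalize`, and `NCM_finalize` are hypothetical names (the last chosen by analogy with `NCM_initialize`), and `ENABLE_NCM` is forced on here although the build system would normally define it.

```c
#include <assert.h>
#include <stddef.h>

#define ENABLE_NCM 1   /* normally set by the build system; forced on for the sketch */

/* Stand-in for the real dispatch table type so the sketch compiles alone. */
typedef struct Sketch_Dispatch { int model; int dispatch_version; } Sketch_Dispatch;

static const Sketch_Dispatch NCM_dispatcher = { 7, 1 };   /* placeholder values */

/* Published pointer: NULL until NCM_initialize() has run. */
const Sketch_Dispatch *NCM_dispatch_table = NULL;

int NCM_initialize(void)
{
    NCM_dispatch_table = &NCM_dispatcher;
    return 0;
}

/* Hypothetical counterpart that undoes the initialization. */
int NCM_finalize(void)
{
    NCM_dispatch_table = NULL;
    return 0;
}

/* Hypothetical library-wide hooks modelled on the NC_initialize fragment. */
int sketch_initialize(void)
{
    int stat = 0;
#ifdef ENABLE_NCM
    if ((stat = NCM_initialize())) return stat;
#endif
    return stat;
}

int sketch_finalize(void)
{
    int stat = 0;
#ifdef ENABLE_NCM
    if ((stat = NCM_finalize())) return stat;
#endif
    return stat;
}
```

The guard around each call keeps the hooks compilable whether or not the optional layer was configured into the build.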

## Testing the new dispatch table

Add a directory of tests: *ncm_test*, say. The file *ncm_test/Makefile.am*
will look something like this.
````
# These files are created by the tests.
CLEANFILES = ...
# These are the tests which are always run.
TESTPROGRAMS = test1 test2 ...
test1_SOURCES = test1.c ...
...
# Set up the tests.
check_PROGRAMS = $(TESTPROGRAMS)
TESTS = $(TESTPROGRAMS)
# Any extra files required by the tests
EXTRA_DIST = ...
````

# Top-Level build of the dispatch code

Provide for *libnetcdfm* to be constructed by adding the following to
the top-level *Makefile.am*.

````
if ENABLE_NCM
NCM=libsrcm
NCMTESTDIR=ncm_test
endif
...
SUBDIRS = ... $(DISPATCHDIR) $(NCM) ... $(NCMTESTDIR)
````

# Choosing a Dispatch Table

The dispatch table is ultimately chosen by the function
NC_infermodel() in libdispatch/dinfermodel.c. This function is
invoked by the NC_create and the NC_open procedures. This can
be, unfortunately, a complex process. The detailed operation of
NC_infermodel() is described in the companion document docs/dinternal.md.

The choice of dispatch table is currently based on the following
pieces of information.

1. The mode argument: this can be used to detect, for example, what kind
of file to create: netcdf-3, netcdf-4, 64-bit netcdf-3, etc.
Using a mode flag is the most common mechanism, in which case
*netcdf.h* needs to be modified to define the relevant mode flag.

2. The file path: this can be used to detect, for example, a DAP URL
versus a normal file system file. If the path looks like a URL, then
the fragment part of the URL is examined to determine the specific
dispatch table.

3. The file contents: when the contents of a real file are available,
they can be used to determine the dispatch table.
As a rule, this is likely to be useful only for *nc_open*.

4. Whether the file is being opened or created.

5. Whether parallel I/O is available.

The *NC_infermodel* function returns two values.

1. model: this is used by nc_open and nc_create to choose the dispatch table.
2. newpath: in some cases, usually URLs, the path may be rewritten to include extra information for use by the dispatch functions.
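
The kinds of evidence listed above can be combined roughly as in the following sketch. It is a simplified illustration, not the logic of NC_infermodel(): `toy_infer_model`, the `toy_model` enumeration, and `TOY_MODE_ENHANCED` are hypothetical, the order of the checks is an assumption, and URL fragment parsing, path rewriting, and the parallel I/O check are left out. The byte patterns tested are the leading "CDF" bytes of a classic netCDF file and the eight-byte HDF5 signature.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model identifiers for this sketch (not the NC_FORMATX values). */
enum toy_model {
    TOY_MODEL_UNKNOWN = 0,
    TOY_MODEL_CLASSIC,
    TOY_MODEL_HDF5,
    TOY_MODEL_REMOTE
};

#define TOY_MODE_ENHANCED 0x1   /* hypothetical mode flag requesting the enhanced format */

/* Decide on a model. `magic` holds the first `magiclen` bytes of an
 * existing file, or is NULL when the file is being created. */
enum toy_model toy_infer_model(const char *path, int mode,
                               const unsigned char *magic, size_t magiclen)
{
    static const unsigned char hdf5sig[8] =
        { 0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n' };

    assert(path != NULL);

    /* Path: anything that looks like a URL is treated as remote access. */
    if (strstr(path, "://") != NULL)
        return TOY_MODEL_REMOTE;

    /* Contents: only meaningful when opening an existing file. */
    if (magic != NULL) {
        if (magiclen >= 8 && memcmp(magic, hdf5sig, 8) == 0)
            return TOY_MODEL_HDF5;
        if (magiclen >= 3 && memcmp(magic, "CDF", 3) == 0)
            return TOY_MODEL_CLASSIC;
        return TOY_MODEL_UNKNOWN;
    }

    /* Mode flags: used when creating, since there is nothing to inspect. */
    return (mode & TOY_MODE_ENHANCED) ? TOY_MODEL_HDF5 : TOY_MODEL_CLASSIC;
}
```

Passing `magic` as NULL stands in for the open-versus-create distinction: when a file is being created there are no contents to examine, so only the mode and the path can contribute.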

# Special Dispatch Table Signatures

The entries in the dispatch table do not necessarily correspond
to the external API. In many cases, multiple related API functions
are merged into a single dispatch table entry.

## Create/Open

The create table entry and the open table entry in the dispatch table
have the following signatures respectively.
````
int (*create)(const char *path, int cmode,
              size_t initialsz, int basepe, size_t *chunksizehintp,
              int use_parallel, void* parameters,
              struct NC_Dispatch* table, NC* ncp);

int (*open)(const char *path, int mode,
            int basepe, size_t *chunksizehintp,
            int use_parallel, void* parameters,
            struct NC_Dispatch* table, NC* ncp);
````

These signatures are the union of all the possible
create/open signatures from the include/netcdfXXX.h files. Note especially the last
three parameters. The parameters argument is a pointer to arbitrary data
that provides extra information to the dispatcher.
The table argument is included in case the create
function (e.g. *NCM_create*) needs to invoke other dispatch
functions. The very last argument, ncp, is a pointer to an NC
instance. The raw NC instance will have been created by *libdispatch/dfile.c*
and is passed to e.g. open with the expectation that it will be filled in
by the dispatch open function.
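
The division of labour around ncp can be sketched as follows. Everything here is a simplified stand-in: `Toy_NC`, `Toy_Dispatch`, `NCM_Info`, and `TOY_WRITE` are hypothetical, and the open entry takes fewer parameters than the signature shown above. The point is only that the caller supplies a raw instance and the format's open function fills it in.

```c
#include <assert.h>
#include <stddef.h>

struct Toy_Dispatch;   /* forward declaration, as in the signatures above */

/* Hypothetical stand-in for the NC instance created by the common layer. */
typedef struct Toy_NC {
    int ext_ncid;                         /* assigned before the open entry runs */
    const char *path;
    const struct Toy_Dispatch *dispatch;
    void *dispatchdata;                   /* format-private state, set by open */
} Toy_NC;

typedef struct Toy_Dispatch {
    int model;
    int (*open)(const char *path, int mode, void *parameters,
                const struct Toy_Dispatch *table, Toy_NC *ncp);
} Toy_Dispatch;

/* Format-private state for the hypothetical NCM layer. */
typedef struct NCM_Info { int writable; } NCM_Info;
static NCM_Info ncm_info;   /* static storage keeps the sketch free of malloc */

#define TOY_WRITE 0x1       /* hypothetical mode bit */

int NCM_open(const char *path, int mode, void *parameters,
             const struct Toy_Dispatch *table, Toy_NC *ncp)
{
    (void)parameters;       /* no extra information needed in this sketch */
    assert(ncp != NULL);
    ncm_info.writable = (mode & TOY_WRITE) != 0;
    /* Fill in the raw instance handed over by the caller. */
    ncp->path = path;
    ncp->dispatch = table;
    ncp->dispatchdata = &ncm_info;
    return 0;
}
```

Because the table pointer is stored in the instance, later API calls on the same file can find the right implementation without repeating the inference step.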

## Accessing Data with put_vara() and get_vara()

````
int (*put_vara)(int ncid, int varid, const size_t *start, const size_t *count,
                const void *value, nc_type memtype);
````

````
int (*get_vara)(int ncid, int varid, const size_t *start, const size_t *count,
                void *value, nc_type memtype);
````

Most of the parameters are similar to the netcdf API parameters. The
last parameter, however, is the type of the data in
memory. Additionally, instead of using an "int islong" parameter, the
memtype will be either ::NC_INT or ::NC_INT64, depending on the value
of sizeof(long). This means that even netcdf-3 code must be prepared
to encounter the ::NC_INT64 type.
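
A minimal sketch of how a typed call for `long` data might pick its memtype is shown below. The `sketch_*` names are hypothetical, the put_vara entry merely records its argument, and the two type codes are local copies of the NC_INT and NC_INT64 values from netcdf.h so that the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of two netcdf.h type codes. */
#define SKETCH_NC_INT   4
#define SKETCH_NC_INT64 10

static int last_memtype = 0;   /* records what the entry was asked to do */

/* Hypothetical put_vara entry: a real one would convert and store data. */
static int sketch_put_vara(int ncid, int varid, const size_t *start,
                           const size_t *count, const void *value, int memtype)
{
    (void)ncid; (void)varid; (void)start; (void)count;
    assert(value != NULL);
    last_memtype = memtype;
    return 0;
}

/* Pick the in-memory type code that describes a C long on this platform. */
int sketch_long_memtype(void)
{
    return (sizeof(long) == sizeof(int)) ? SKETCH_NC_INT : SKETCH_NC_INT64;
}

/* A typed API call for long data forwards with the computed memtype
 * instead of carrying a separate "islong" flag. */
int sketch_put_vara_long(int ncid, int varid, const size_t *start,
                         const size_t *count, const long *op)
{
    return sketch_put_vara(ncid, varid, start, count, op, sketch_long_memtype());
}
```

On platforms where `long` is 64 bits wide the entry receives the 64-bit code, which is why a classic-format layer still has to recognize it.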

## Accessing Attributes with put_att() and get_att()

````
int (*get_att)(int ncid, int varid, const char *name,
               void *value, nc_type memtype);
````

````
int (*put_att)(int ncid, int varid, const char *name, nc_type datatype, size_t len,
               const void *value, nc_type memtype);
````

Again, the key difference is the memtype parameter. As with
put/get_vara, it uses ::NC_INT64 to encode the long case.

## Pre-defined Dispatch Functions

It is sometimes not necessary to implement all the functions in the
dispatch table. Some pre-defined functions are available which may be
used in many cases.

## Inquiry Functions

Many of the netCDF inquiry functions operate from an in-memory model of
metadata. Once a file is opened, or a file is created, this
in-memory metadata model is kept up to date. Consequently the inquiry
functions do not depend on the dispatch layer code. These functions
can be used by all dispatch layers which use the internal netCDF
enhanced data model.

- NC4_inq
- NC4_inq_type
- NC4_inq_dimid
- NC4_inq_dim
- NC4_inq_unlimdim
- NC4_inq_att
- NC4_inq_attid
- NC4_inq_attname
- NC4_get_att
- NC4_inq_varid
- NC4_inq_var_all
- NC4_show_metadata
- NC4_inq_unlimdims
- NC4_inq_ncid
- NC4_inq_grps
- NC4_inq_grpname
- NC4_inq_grpname_full
- NC4_inq_grp_parent
- NC4_inq_grp_full_ncid
- NC4_inq_varids
- NC4_inq_dimids
- NC4_inq_typeids
- NC4_inq_type_equal
- NC4_inq_user_type
- NC4_inq_typeid
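
The reason such functions can be shared is that they consult only cached metadata. The sketch below shows the idea with a hypothetical `Meta_File` structure and a hypothetical `sketch_inq_dimid` function; it does not reproduce the internal data structures behind the NC4_inq functions. The error value is a local copy of NC_EBADDIM from netcdf.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory metadata kept for an open file. */
typedef struct Meta_Dim { const char *name; size_t len; } Meta_Dim;
typedef struct Meta_File { const Meta_Dim *dims; int ndims; } Meta_File;

#define SKETCH_NC_NOERR   0
#define SKETCH_NC_EBADDIM (-46)   /* local copy of NC_EBADDIM */

/* Generic inquiry: it answers from the cached metadata alone, so any
 * layer that populates Meta_File at open time can share it unchanged. */
int sketch_inq_dimid(const Meta_File *file, const char *name, int *idp)
{
    int i;
    assert(file != NULL && name != NULL);
    for (i = 0; i < file->ndims; i++) {
        if (strcmp(file->dims[i].name, name) == 0) {
            if (idp != NULL) *idp = i;
            return SKETCH_NC_NOERR;
        }
    }
    return SKETCH_NC_EBADDIM;
}
```

A format layer written this way only has to load the metadata when the file is opened; no storage access is needed to answer the inquiry afterwards.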

## NCDEFAULT get/put Functions

The mapped (varm) get/put functions have been
implemented in terms of the array (vara) functions. So dispatch layers
need only implement the vara functions, and can use the following
functions to get the varm functions:

- NCDEFAULT_get_varm
- NCDEFAULT_put_varm

For the netcdf-3 format, the strided functions (nc_get/put_vars)
are similarly implemented in terms of the vara functions. So the following
convenience functions are available.

- NCDEFAULT_get_vars
- NCDEFAULT_put_vars

For the netcdf-4 format, the vars functions actually exist, so
the default vars functions are not used.
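
As an illustration of building strided access on top of contiguous access, the following one-dimensional sketch reads every stride-th element by issuing single-element vara calls. It is a toy with hypothetical names (`toy_get_vara`, `toy_default_get_vars`) and a hard-coded array standing in for a file; it does not reproduce how NCDEFAULT_get_vars is written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-dimensional "file" of ints backing the sketch. */
static const int toy_data[10] = { 0, 10, 20, 30, 40, 50, 60, 70, 80, 90 };

/* Contiguous read: the only access routine the format layer provides. */
static int toy_get_vara(size_t start, size_t count, int *out)
{
    size_t i;
    if (start + count > 10) return -1;   /* hypothetical out-of-range error */
    for (i = 0; i < count; i++) out[i] = toy_data[start + i];
    return 0;
}

/* Strided read built purely from vara calls, one element per call.
 * Simple rather than fast. */
int toy_default_get_vars(size_t start, size_t count, ptrdiff_t stride, int *out)
{
    size_t i;
    assert(stride >= 1);
    for (i = 0; i < count; i++) {
        int stat = toy_get_vara(start + i * (size_t)stride, 1, &out[i]);
        if (stat != 0) return stat;
    }
    return 0;
}
```

For example, a start of 1, a count of 3, and a stride of 4 visit positions 1, 5, and 9 of the backing array.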

## Read-Only Functions

Some dispatch layers are read-only (e.g. HDF4). Any function which
writes to a file, including nc_create(), needs to return error code
::NC_EPERM. The following read-only functions are available so that
these don't have to be re-implemented in each read-only dispatch layer:

- NC_RO_create
- NC_RO_redef
- NC_RO__enddef
- NC_RO_sync
- NC_RO_set_fill
- NC_RO_def_dim
- NC_RO_rename_dim
- NC_RO_rename_att
- NC_RO_del_att
- NC_RO_put_att
- NC_RO_def_var
- NC_RO_rename_var
- NC_RO_put_vara
- NC_RO_def_var_fill
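
The pattern behind these functions can be sketched as follows: every write-type entry in a read-only table points at a shared stub that refuses the request. The table type, the `sketch_RO_*` stubs, and `myfmt_get_vara` are hypothetical and heavily reduced; the error value is a local copy of NC_EPERM from netcdf.h.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NC_NOERR 0
#define SKETCH_NC_EPERM (-37)   /* local copy of NC_EPERM */

/* Hypothetical reduced table with one read entry and two write entries. */
typedef struct RO_Dispatch {
    int (*get_vara)(int ncid, int varid, int *out);
    int (*def_dim)(int ncid, const char *name, size_t len, int *idp);
    int (*redef)(int ncid);
} RO_Dispatch;

/* Shared stubs: every write-type entry simply refuses. */
int sketch_RO_def_dim(int ncid, const char *name, size_t len, int *idp)
{
    (void)ncid; (void)name; (void)len; (void)idp;
    return SKETCH_NC_EPERM;
}

int sketch_RO_redef(int ncid)
{
    (void)ncid;
    return SKETCH_NC_EPERM;
}

/* The only function this read-only layer writes itself. */
static int myfmt_get_vara(int ncid, int varid, int *out)
{
    (void)ncid; (void)varid;
    assert(out != NULL);
    *out = 123;   /* placeholder data */
    return SKETCH_NC_NOERR;
}

const RO_Dispatch myfmt_table = {
    myfmt_get_vara,
    sketch_RO_def_dim,
    sketch_RO_redef
};
```

Since the stubs ignore their arguments, one set of them can be reused by any number of read-only layers.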

## Classic NetCDF Only Functions

There are two functions that are only used in the classic code. All
other dispatch layers (except PnetCDF) return error ::NC_ENOTNC3 for
these functions. The following functions are provided for this
purpose:

- NOTNC3_inq_base_pe
- NOTNC3_set_base_pe

# HDF4 Dispatch Layer as a Simple Example

The HDF4 dispatch layer is about the simplest possible dispatch
layer. It is read-only and uses the classic model, so it serves as a
simple example of a dispatch layer.

Note that the HDF4 layer is optional in the netCDF build. Not all
users will have HDF4 installed, and those users will not build with
the HDF4 dispatch layer enabled. For this reason HDF4 code is guarded
as follows.
````
#ifdef USE_HDF4
...
#endif /*USE_HDF4*/
````

Code in libhdf4 is only compiled if HDF4 is
turned on in the build.

### The netcdf.h File

In the main netcdf.h file, we have the following:

````
#define NC_FORMATX_NC_HDF4 (3)
````

### The ncdispatch.h File

In ncdispatch.h we have the following:

````
#ifdef USE_HDF4
extern NC_Dispatch* HDF4_dispatch_table;
extern int HDF4_initialize(void);
extern int HDF4_finalize(void);
#endif
````

### The netcdf_meta.h File

The netcdf_meta.h file allows for easy determination of what features
are in use. For HDF4, it contains the following, set by configure:
````
...
#define NC_HAS_HDF4 0 /*!< HDF4 support. */
...
````

### The hdf4dispatch.h File

The file *hdf4dispatch.h* contains prototypes and
macro definitions used within the HDF4 code in libhdf4. This include
file should not be used anywhere except in libhdf4.

### Initialization Code Changes in liblib Directory

The file *nc_initialize.c* is modified to include the following:
````
#ifdef USE_HDF4
extern int HDF4_initialize(void);
extern int HDF4_finalize(void);
#endif
````

### Changes to libdispatch/dfile.c

In order for a dispatch layer to be used, it must be correctly
determined in functions *NC_open()* or *NC_create()* in *libdispatch/dfile.c*.
HDF4 has a magic number that is detected in
*NC_interpret_magic_number()*, which allows *NC_open* to automatically
detect an HDF4 file.

Once HDF4 is detected, the *model* variable is set to *NC_FORMATX_NC_HDF4*,
and later this is used in a case statement:
````
case NC_FORMATX_NC_HDF4:
    dispatcher = HDF4_dispatch_table;
    break;
````

This sets the dispatcher to the HDF4 dispatcher, which is defined in
the libhdf4 directory.
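
For context, a complete switch of this kind might look like the sketch below. The helper `sketch_select_dispatcher`, the `Sketch_Dispatch` type, and the two tables are hypothetical; only the numeric model values are copied from netcdf.h (1 for NC_FORMATX_NC3, 3 for NC_FORMATX_NC_HDF4).

```c
#include <assert.h>
#include <stddef.h>

/* Values as defined in netcdf.h, copied so the sketch compiles alone. */
#define SKETCH_FORMATX_NC3     1
#define SKETCH_FORMATX_NC_HDF4 3

/* Stand-in for the real dispatch table type. */
typedef struct Sketch_Dispatch { int model; } Sketch_Dispatch;

static const Sketch_Dispatch nc3_table  = { SKETCH_FORMATX_NC3 };
static const Sketch_Dispatch hdf4_table = { SKETCH_FORMATX_NC_HDF4 };

/* Hypothetical helper: map an inferred model to its table, or return
 * NULL when the model is unknown or its layer was not built. */
const Sketch_Dispatch *sketch_select_dispatcher(int model)
{
    const Sketch_Dispatch *dispatcher = NULL;
    switch (model) {
    case SKETCH_FORMATX_NC3:
        dispatcher = &nc3_table;
        break;
    case SKETCH_FORMATX_NC_HDF4:
        dispatcher = &hdf4_table;
        break;
    default:
        break;   /* the caller is expected to report an error */
    }
    /* Sanity check: a selected table must implement the requested model. */
    assert(dispatcher == NULL || dispatcher->model == model);
    return dispatcher;
}
```

In a build where an optional layer is disabled, its case would sit inside the corresponding `#ifdef` guard, so the model would fall through to the default branch.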

### Dispatch Table in libhdf4/hdf4dispatch.c

The file *hdf4dispatch.c* contains the definition of the HDF4 dispatch
table. It looks like this:
````
/* This is the dispatch object that holds pointers to all the
 * functions that make up the HDF4 dispatch interface. */
static NC_Dispatch HDF4_dispatcher = {
NC_FORMATX_NC_HDF4,
NC_DISPATCH_VERSION,
NC_RO_create,
NC_HDF4_open,
NC_RO_redef,
NC_RO__enddef,
NC_RO_sync,
...
NC_NOTNC4_set_var_chunk_cache,
NC_NOTNC4_get_var_chunk_cache,
...
};
````
Note that most entries use one of the predefined dispatch
functions. Functions that start with NC_RO* are read-only stubs that return
::NC_EPERM. Functions that start with NC_NOTNC4* return ::NC_ENOTNC4.

Only the functions that start with NC_HDF4* need to be implemented for
the HDF4 dispatch layer. There are 6 such functions:

- NC_HDF4_open
- NC_HDF4_abort
- NC_HDF4_close
- NC_HDF4_inq_format
- NC_HDF4_inq_format_extended
- NC_HDF4_get_vara

### HDF4 Reading Code

The code in *hdf4file.c* opens the HDF4 SD dataset, and reads the
metadata. This metadata is stored in the netCDF internal metadata
model, allowing the inq functions to work.

The code in *hdf4var.c* does an *nc_get_vara()* on the HDF4 SD
dataset. This is all that is needed for all the nc_get_* functions to
work.

# Appendix A. Changing NC_DISPATCH_VERSION

When new entries are added to the *struct NC_Dispatch* type (located in *include/netcdf_dispatch.h.in*), it is necessary to do two things.

1. Bump the NC_DISPATCH_VERSION number.
2. Modify the existing dispatch tables to include the new entries.
It is often the case that the new entries do not mean anything for
a given dispatch table. In that case, the new entries may be set to
some variant of *NC_RO_XXX*, *NC_NOTNC4_XXX*, or *NC_NOTNC3_XXX*.

Modifying the dispatch version requires two steps:

1. Modify the version number in *netcdf-c/configure.ac*, and
2. Modify the version number in *netcdf-c/CMakeLists.txt*.

The two should agree in value.

### NC_DISPATCH_VERSION Incompatibility

When a dispatch table is added dynamically
by nc_def_user_format (see libdispatch/dfile.c),
the version of the new table is compared with the built-in
NC_DISPATCH_VERSION; if they differ, then an error is returned from
that function.
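
The check described here amounts to a comparison like the one in the sketch below. This is an illustration under stated assumptions: `sketch_def_user_format` is a hypothetical, reduced stand-in for nc_def_user_format, the version constant is a placeholder, and the use of the NC_EINVAL value as the returned error is assumed rather than taken from the source.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_DISPATCH_VERSION 5      /* placeholder; the real value comes from the build */
#define SKETCH_NC_NOERR  0
#define SKETCH_NC_EINVAL (-36)         /* local copy of NC_EINVAL; error choice is assumed */

/* Stand-in for the real dispatch table type. */
typedef struct Sketch_Dispatch { int model; int dispatch_version; } Sketch_Dispatch;

static const Sketch_Dispatch *user_table = NULL;   /* hypothetical registration slot */

/* Accept a user-supplied table only if it was built against the same
 * table layout version as the library. */
int sketch_def_user_format(const Sketch_Dispatch *table)
{
    if (table == NULL)
        return SKETCH_NC_EINVAL;
    if (table->dispatch_version != SKETCH_DISPATCH_VERSION)
        return SKETCH_NC_EINVAL;
    user_table = table;
    return SKETCH_NC_NOERR;
}
```

Rejecting a mismatched table up front matters because a table compiled against a different layout would have its function pointers in the wrong slots.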


# Point of Contact {#dispatch_poc}

*Author*: Dennis Heimbigner<br>
*Email*: dmh at ucar dot edu<br>
*Initial Version*: 12/22/2021<br>
*Last Revised*: 12/22/2021