Writing NetCDF Files: Best Practices {#BestPractices}
====================================

[TOC]

Best Practices {#bp_Best_Practices}
=====================================

## Conventions {#bp_Conventions}

While netCDF is intended for "self-documenting data", it is often
necessary for data writers and readers to agree upon attribute
conventions and representations for discipline-specific data structures.
These agreements are written up as human-readable documents called
***netCDF conventions***.

Use an existing Convention if possible. See the list of [registered
conventions](/software/netcdf/conventions.html).

The CF Conventions are recommended where applicable, especially for
gridded (model) datasets.

Document the convention you are using by adding the global attribute
"Conventions" to each netCDF file, for example:

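A minimal sketch in C of writing such an attribute; the dataset id `ncid` and
the convention name "CF-1.6" are only illustrative:

    #include <netcdf.h>
    #include <string.h>

    /* Sketch: write a global "Conventions" attribute while the file,
       identified by ncid, is still in define mode. "CF-1.6" is an example. */
    int set_conventions(int ncid)
    {
        const char *conv = "CF-1.6";
        return nc_put_att_text(ncid, NC_GLOBAL, "Conventions",
                               strlen(conv), conv);
    }
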
This document refers to conventions for the netCDF *classic* data model.
For recommendations about conventions for the netCDF-4 *enhanced* data
model, see [Developing Conventions for
NetCDF-4](/netcdf/papers/nc4_conventions.html).

## Coordinate Systems {#bp_Coordinate-Systems}

A ***coordinate variable*** is a one-dimensional variable with the same
name as a dimension, which names the coordinate values of the dimension.
It must not have any missing data (for example, no `_FillValue` or
`missing_value` attributes) and must be strictly monotonic (values
increasing or decreasing). A two-dimensional variable of type char is a
***string-valued coordinate variable*** if it has the same name as its
first dimension, e.g. **char time(time, time\_len)**; all of its
strings must be unique. A variable's ***coordinate system*** is the set
of coordinate variables used by the variable. Coordinates that refer to
physical space are called ***spatial coordinates***, ones that refer to
physical time are called ***time coordinates***, ones that refer to
either physical space or time are called ***spatio-temporal
coordinates***.

- Make coordinate variables for every dimension possible (except for
  string length dimensions).
- Give each coordinate variable at least `units` and `long_name`
  attributes to document its meaning, as in the sketch after this list.
- Use an existing netCDF [Convention](#Conventions) for your
  coordinate variables, especially to identify
  spatio-temporal coordinates.
- Use shared dimensions to indicate that two variables use the same
  coordinates along that dimension. If two variables' dimensions are
  not related, create separate dimensions for them, even if they
  happen to have the same length.

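To illustrate these points, here is a minimal, hypothetical sketch in C that
defines a `lat` dimension, a coordinate variable of the same name, and its
`units` and `long_name` attributes; the 73-point grid and the attribute values
are only examples, and `ncid` is assumed to identify a file in define mode:

    #include <netcdf.h>
    #include <string.h>

    /* Sketch: define a latitude coordinate variable. */
    int define_lat_coordinate(int ncid)
    {
        int lat_dim, lat_var, status;

        if ((status = nc_def_dim(ncid, "lat", 73, &lat_dim)) != NC_NOERR)
            return status;
        /* a coordinate variable has the same name as its dimension */
        if ((status = nc_def_var(ncid, "lat", NC_FLOAT, 1, &lat_dim,
                                 &lat_var)) != NC_NOERR)
            return status;
        if ((status = nc_put_att_text(ncid, lat_var, "units",
                                      strlen("degrees_north"),
                                      "degrees_north")) != NC_NOERR)
            return status;
        return nc_put_att_text(ncid, lat_var, "long_name",
                               strlen("latitude"), "latitude");
    }
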
## Variable Grouping {#bp_Variable-Grouping}

You may structure the data in a netCDF file in different ways, for
example putting related parameters into a single variable by adding an
extra dimension. Standard visualization and analysis software may have
trouble breaking that data out, however. On the other extreme, it is
possible to create different variables e.g. for different vertical
levels of the same parameter. However, standard visualization and
analysis software may have trouble grouping that data back together.
Here are some guidelines for deciding how to group your data into
variables:

- All of the data in a variable must be of the same type and should
  have the same units of measurement.
- A variable's attributes should be applicable to all its data.
- If possible, all of the coordinate variables should be
  spatio-temporal, with no extra dimensions.
- Use 4D spatio-temporal coordinate systems in preference to 3D. Use
  3D spatio-temporal coordinate systems in preference to 2D.
- Vector valued (e.g. wind) parameters are legitimate uses of extra
  dimensions. There are trade-offs between putting vectors in the same
  variables vs. putting each component of a vector in a
  different variable. Check that any visualization software you plan
  to use can deal with the structure you choose.
- Think in terms of complete coordinate systems (especially
  spatio-temporal), and organize your data into variables accordingly.
  Variables with the same coordinate system implicitly form a group.

## Variable Attributes {#bp_Variable-Attributes}

- For each variable where it makes sense, add a **units** attribute,
  using the [udunits](/software/udunits/index.html) conventions,
  if possible.
- For each variable where it makes sense, add a **long\_name**
  attribute, which is a human-readable descriptive name for
  the variable. This could be used for labeling plots, for example.

## Strings and Variables of type char {#bp_Strings-and-Variables-of-type-char}

NetCDF-3 does not have a primitive **String** type, but does have arrays
of type **char**, which are 8 bits in size. The main difference is that
Strings are variable length arrays of chars, while char arrays are fixed
length. Software written in C usually depends on Strings being zero
terminated, while software in Fortran and Java does not. Both the C
(*nc\_get\_vara\_text*) and Java (*ArrayChar.getString*) libraries have
convenience routines that read char arrays and convert to Strings.

- Do not use char type variables for numeric data, use byte type
  variables instead.
- Consider using a global Attribute instead of a Variable to store a
  String applicable to the whole dataset.
- When you want to store arrays of Strings, use a multidimensional
  char array. All of the Strings will be the same length.
- There are 3 strategies for writing variable length Strings and
  zero-byte termination:
  1. *Fortran convention*: pad with blanks and never terminate with a
     zero byte.
  2. *C convention*: pad with zeros and always terminate with a
     zero byte.
  3. *Java convention*: You don't need to store a trailing zero byte,
     but pad trailing unused characters with zero bytes.
- When reading, trim zeros and blanks from the end of the char array
  and, if in C, add a zero byte terminator, as in the sketch below.

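A minimal sketch of the reading step in C, assuming an open dataset `ncid` and
a char variable named "station_name" whose single dimension has length
`name_len`; the variable name is only an example, and `out` must be able to
hold `name_len + 1` bytes:

    #include <netcdf.h>

    /* Sketch: read a fixed-length char array, add a terminating zero
       byte, and trim trailing blanks and zero bytes. */
    int read_name(int ncid, char *out, size_t name_len)
    {
        int varid, status;
        size_t start[1] = {0}, count[1] = {name_len};

        if ((status = nc_inq_varid(ncid, "station_name", &varid)) != NC_NOERR)
            return status;
        if ((status = nc_get_vara_text(ncid, varid, start, count, out)) != NC_NOERR)
            return status;
        out[name_len] = '\0';                    /* terminate for C callers */
        for (size_t i = name_len; i > 0; i--) {  /* trim padding */
            if (out[i - 1] == ' ' || out[i - 1] == '\0')
                out[i - 1] = '\0';
            else
                break;
        }
        return NC_NOERR;
    }
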
## Calendar Date/Time {#bp_Calendar-Date-Time}

Time as a fundamental unit means a time interval, measured in seconds. A
Calendar date/time is a specific instance in real, physical time. Dates
are specified as an interval from some ***reference time*** e.g. "days
elapsed since Greenwich mean noon on 1 January 4713 BCE". The reference
time implies a system of counting time called a ***calendar*** (e.g.
Gregorian calendar) and a textual representation (e.g. [ISO
8601](http://www.cl.cam.ac.uk/%7Emgk25/iso-time.html)).

There are two strategies for storing a date/time into a netCDF variable.
One is to encode it as a numeric value and a unit that includes the
reference time, e.g. "seconds since 2001-1-1 0:0:0" or "days since
2001-1-1 0:0:0". The other is to store it as a String using a standard
encoding and Calendar. The former is more compact if you have more than
one date, and makes it easier to compute intervals between two dates.

Unidata's [udunits](/software/udunits/) package provides a convenient
way to implement the first strategy. It uses the ISO 8601 encoding and a
hybrid Gregorian/Julian calendar, but udunits does not support use of
other Calendars or encodings for the reference time. However, the ncdump
"-t" option can display numeric times that use udunits (and optionally
climate calendars) as ISO 8601 strings that are easy for humans to
interpret.

- If your data uses real, physical time that is well represented using
  the Gregorian/Julian calendar, encode it as an interval from a
  reference time, and add a units attribute which uses a
  udunits-compatible time unit. If the data assumes one of the
  non-standard calendars mentioned in the CF Conventions, specify that
  with a calendar attribute. Readers can then use the udunits package
  to manipulate or format the date values, and the ncdump utility can
  display them with either numeric or string representation.
- If your data uses a calendar not supported by the CF Conventions,
  make it compatible with existing date manipulation packages if
  possible (for example, java.text.SimpleDateFormat).
- Add multiple sets of time encodings if necessary to allow different
  readers to work as well as possible.

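As an illustration of the first strategy, here is a hypothetical sketch in C
that defines an unlimited `time` coordinate with a udunits-style `units`
string and a CF-style `calendar` attribute; the reference time and the
calendar name are only examples, and `ncid` is assumed to be in define mode:

    #include <netcdf.h>
    #include <string.h>

    /* Sketch: numeric time values will be stored as "days since" the
       reference instant given in the units attribute. */
    int define_time_coordinate(int ncid)
    {
        int time_dim, time_var, status;
        const char *units = "days since 2001-01-01 00:00:00";

        if ((status = nc_def_dim(ncid, "time", NC_UNLIMITED, &time_dim)) != NC_NOERR)
            return status;
        if ((status = nc_def_var(ncid, "time", NC_DOUBLE, 1, &time_dim,
                                 &time_var)) != NC_NOERR)
            return status;
        if ((status = nc_put_att_text(ncid, time_var, "units",
                                      strlen(units), units)) != NC_NOERR)
            return status;
        /* only needed if the data assume a non-standard calendar */
        return nc_put_att_text(ncid, time_var, "calendar",
                               strlen("noleap"), "noleap");
    }
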
## Unsigned Data {#bp_Unsigned-Data}

NetCDF-3 does not have unsigned integer primitive types.

- To be completely safe with unknown readers, widen the data type, or
  use floating point.
- You can use the corresponding signed types to store unsigned data
  only if all client programs know how to interpret this correctly.
- A new proposed convention is to create a variable attribute
  `_Unsigned = "true"` to indicate that integer data should be treated
  as unsigned (see the sketch below).

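A minimal sketch of that proposed convention in C, assuming a byte variable
identified by `varid` in a file open in define mode; readers that understand
the attribute can then use the unsigned accessors such as `nc_get_var_uchar()`:

    #include <netcdf.h>
    #include <string.h>

    /* Sketch: tag a signed byte variable so that readers treat its
       values as unsigned. */
    int mark_unsigned(int ncid, int varid)
    {
        return nc_put_att_text(ncid, varid, "_Unsigned",
                               strlen("true"), "true");
    }
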
## Packed Data Values {#bp_Packed-Data-Values}

Packed data is stored in a netCDF file by limiting precision and using a
smaller data type than the original data, for example, packing
double-precision (64-bit) values into short (16-bit) integers. The
C-based netCDF libraries do not do the packing and unpacking. (The
[netCDF Java library](/software/netcdf-java/) will do automatic
unpacking when the
[VariableEnhanced](/software/netcdf-java/v4.1/javadocAll/ucar/nc2/dataset/VariableEnhanced.html)
Interface is used. For details see
[EnhancedScaleMissing](/software/netcdf-java/v4.1/javadocAll/ucar/nc2/dataset/EnhanceScaleMissing.html)).

- Each variable with packed data has two attributes called
  **scale\_factor** and **add\_offset**, so that the packed data may
  be read and unpacked using the formula:

  > ***unpacked\_data\_value = packed\_data\_value \* scale\_factor +
  > add\_offset***

- The type of the stored variable is the packed data type, typically
  byte, short or int.
- The type of the scale\_factor and add\_offset attributes should be
  the type that you want the unpacked data to be, typically float
  or double.
- To avoid introducing a bias into the unpacked values due to
  truncation when packing, the data provider should round to the
  nearest integer rather than just truncating towards zero before
  writing the data (see the sketch after this list):

  > ***packed\_data\_value = nint((unpacked\_data\_value -
  > add\_offset) / scale\_factor)***

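A minimal sketch of both formulas in C; the `nint` above is Fortran's
nearest-integer function, and C's `lround` plays the same role here. The
scale and offset values are only illustrative:

    #include <math.h>

    static const double scale_factor = 0.01;    /* example values */
    static const double add_offset   = 273.15;

    /* pack: round to the nearest integer rather than truncating */
    short pack_value(double unpacked)
    {
        return (short) lround((unpacked - add_offset) / scale_factor);
    }

    /* unpack: apply the linear transform in the other direction */
    double unpack_value(short packed)
    {
        return packed * scale_factor + add_offset;
    }
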
Depending on whether the packed data values are intended to be
interpreted by the reader as signed or unsigned integers, there are
alternative ways for the data provider to compute the *scale\_factor*
and *add\_offset* attributes. In either case, the formulas above apply
for unpacking and packing the data.

A conventional way to indicate whether a byte, short, or int variable is
meant to be interpreted as unsigned, even for the netCDF-3 classic model
that has no external unsigned integer type, is by providing the special
variable attribute `_Unsigned` with value `"true"`. However, most
existing data for which packed values are intended to be interpreted as
unsigned are stored without this attribute, so readers must be aware of
packing assumptions in this case. In the enhanced netCDF-4 data model,
packed integers may be declared to be of the appropriate unsigned type.

Let *n* be the number of bits in the packed type, and assume *dataMin*
and *dataMax* are the minimum and maximum values that will be used for a
variable to be packed.

- If the packed values are intended to be interpreted as signed
  integers (the default assumption for classic model data), you may
  use (see the sketch at the end of this section):

  > *scale\_factor = (dataMax - dataMin) / (2^n - 1)*

  > *add\_offset = dataMin + 2^(n-1) \* scale\_factor*

- If the packed values are intended to be interpreted as unsigned (for
  example, when read in the C interface using the `nc_get_var_uchar()`
  function), use:

  > *scale\_factor = (dataMax - dataMin) / (2^n - 1)*

  > *add\_offset = dataMin*

- In either the signed or unsigned case, an alternate formula may be
  used for the add\_offset and scale\_factor packing parameters that
  reserves a packed value for a special value, such as an indicator of
  missing data. For example, to reserve the minimum packed value
  (-2^(n-1)) for use as a special value in the case of signed
  packed values:

  > *scale\_factor = (dataMax - dataMin) / (2^n - 2)*

  > *add\_offset = (dataMax + dataMin) / 2*

- If the packed values are unsigned, then the analogous formula that
  reserves 0 as the packed form of a special value would be:

  > *scale\_factor = (dataMax - dataMin) / (2^n - 2)*

  > *add\_offset = dataMin - scale\_factor*

- Example, packing 32-bit floats into 16-bit shorts:

      variables:
        short data(z, y, x);
        data:scale_factor = 34.02f;
        data:add_offset = 1.54f;

- The `units` attribute applies to unpacked values.

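For example, the plain signed-case parameters above could be computed with a
small helper like this sketch; `n` is the number of bits in the packed type
(e.g. 16 for short), and the function name is only an illustration:

    #include <math.h>

    /* Sketch: compute scale_factor and add_offset for the signed case,
       without reserving a packed value for missing data. */
    void signed_packing_params(double dataMin, double dataMax, int n,
                               double *scale_factor, double *add_offset)
    {
        *scale_factor = (dataMax - dataMin) / (pow(2.0, n) - 1.0);
        *add_offset   = dataMin + pow(2.0, n - 1) * (*scale_factor);
    }
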
## Missing Data Values {#bp_Missing-Data-Values}

***Missing data*** is a general name for data values that are invalid,
never written, or missing. The netCDF library itself does not handle
these values in any special way, except that the value of a `_FillValue`
attribute, if any, is used in pre-filling unwritten data. (The
Java-netCDF library will assist in recognizing these values when
reading, see class **VariableStandardized**).

- Default fill values for each type are available in the C-based
  interfaces, and are defined in the appropriate header files. For
  example, in the C interface, NC\_FILL\_FLOAT and NC\_FILL\_DOUBLE
  are numbers near 9.9692e+36 that are returned when you try to read
  values that were never written. Writing, reading, and testing for
  equality with these default fill values works portably on the
  platforms on which netCDF has been tested.
- The `_FillValue` attribute should have the same data type as the
  variable it describes. If the variable is packed using
  `scale_factor` and `add_offset` attributes, the `_FillValue`
  attribute should have the data type of the packed data.
- Another way of indicating missing values for real type data is to
  store an IEEE **NaN** floating point value. The advantage of this is
  that any computation using a NaN results in a NaN. Client software
  must know to look for NaNs, however, and detection of NaNs is
  tricky, since any comparison with a NaN is required to return
  *false* (see the sketch at the end of this section).
  - In Java, you can use **Double.NaN** and **Float.NaN** constants.
  - In many C compilers, you can generate a NaN value using **double
    nan = 0.0 / 0.0;**
- Alternatively or in addition, set the **valid\_range** attribute for
  each variable that uses missing values, and make sure all valid data
  is within that range, and all missing or invalid data is outside of
  that range. Again, the client software must recognize and make use
  of this information. Example:

      variables:
        float data(z, y, x);
        data:valid_range = -999.0f, 999.0f;

  If the variable is packed using `scale_factor` and `add_offset`
  attributes, the `valid_range` attribute should have the data type of
  the packed data.

  If the variable is unsigned the `valid_range` values should be
  widened if needed and stored as unsigned integers.

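A minimal sketch of the reading side in C, combining a NaN test (via
`isnan()`, which avoids the pitfall that direct comparisons with NaN are
always false) with a test against the default fill value; which checks are
appropriate depends on the conventions the writer actually used:

    #include <math.h>
    #include <netcdf.h>

    /* Sketch: treat NaNs and never-written (default-filled) values as
       missing when reading a float variable. */
    int is_missing(float value)
    {
        return isnan(value) || value == NC_FILL_FLOAT;
    }
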
## Miscellaneous tips {#bp_Miscellaneous-tips}

- To define a file whose structure is known in advance, write a CDL
  file and create the netCDF file using
  [ncgen](/cgi-bin/man-cgi?ncgen). Then write the data into the netCDF
  file using your program. This is typically much easier than
  programming all of the create calls yourself.
- For the netCDF classic or 64-bit-offset formats, it's possible to
  reserve extra space in the file when it is created so that you can
  later add additional attributes or non-record variables without
  copying all the data. (This is not necessary for netCDF-4 files,
  because metadata can be added efficiently to such files.) See the [C
  man-page reference documentation](/cgi-bin/man-cgi?netcdf+-s3) (or
  the [Fortran reference documentation](/cgi-bin/man-cgi?netcdf+-s3f))
  for `nc__create` and `nc__enddef` (`nf__create` and `nf__enddef`
  for Fortran) for more details on reserving extra space in
  the header; a sketch follows this list.

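A hypothetical sketch of the second tip in C; the amount of header space to
reserve (8192 bytes here) and the alignment values are only examples, and the
details of the underscore variants are in the reference documentation cited
above:

    #include <netcdf.h>

    /* Sketch: create a classic-format file and leave extra free space
       in the header so attributes can be added later without copying
       all of the data. */
    int create_with_reserved_header(const char *path, int *ncidp)
    {
        int status = nc_create(path, NC_CLOBBER, ncidp);
        if (status != NC_NOERR)
            return status;
        /* ... define dimensions, variables, and attributes here ... */
        return nc__enddef(*ncidp, 8192 /* h_minfree */, 4 /* v_align */,
                          0 /* v_minfree */, 4 /* r_align */);
    }
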
## Spelling netCDF: Best Practices {#bp_Spelling-netCDF-Best-Practices}

There are only 3 correct spellings of "netCDF":

1. **netCDF:** The original spelling of the name of the data model,
   API, and format. The acronym stands for network Common Data Form
   (not Format), and the "CDF" part was capitalized in part to pay
   homage to the NASA "CDF" data model which the netCDF data
   model extended.
2. **netcdf:** Used in certain file names, such as:

       #include <netcdf.h>

3. **NetCDF**: Used in titles and at the beginning of sentences, where
   "netCDF" is awkward or violates style guidelines.

All other forms, and most especially "Netcdf", are considered vulgar and
a sign of ill-breeding or misspent youth, analogous to the egregious but
common misspelling "JAVA" used by those who are new to the language or
who mistakenly think it is an acronym.
