Writing NetCDF Files: Best Practices {#BestPractices}
====================================

Best Practices {#bp_Best_Practices}
=====================================

## Conventions {#bp_Conventions}

While netCDF is intended for "self-documenting data", it is often
necessary for data writers and readers to agree upon attribute
conventions and representations for discipline-specific data structures.
These agreements are written up as human-readable documents called
***netCDF conventions***.

Use an existing convention if possible. See the list of [registered
conventions](/software/netcdf/conventions.html).

The CF Conventions are recommended where applicable, especially for
gridded (model) datasets.

Document the convention you are using by adding the global attribute
`Conventions` to each netCDF file, for example `:Conventions = "CF-1.0" ;`
in CDL.

This document refers to conventions for the netCDF *classic* data model.
For recommendations about conventions for the netCDF-4 *enhanced* data
model, see [Developing Conventions for
NetCDF-4](/netcdf/papers/nc4_conventions.html).

## Coordinate Systems {#bp_Coordinate-Systems}

A ***coordinate variable*** is a one-dimensional variable with the same
name as a dimension, which names the coordinate values of the dimension.
It must not have any missing data (for example, no `_FillValue` or
`missing_value` attributes) and must be strictly monotonic (values
strictly increasing or strictly decreasing). A two-dimensional variable
of type char is a ***string-valued coordinate variable*** if it has the
same name as its first dimension, e.g. `char time(time, time_len);`.
All of its strings must be unique. A variable's ***coordinate system***
is the set of coordinate variables used by the variable. Coordinates
that refer to physical space are called ***spatial coordinates***, ones
that refer to physical time are called ***time coordinates***, and ones
that refer to either physical space or time are called
***spatio-temporal coordinates***.

- Make coordinate variables for every dimension possible (except for
  string length dimensions).
- Give each coordinate variable at least `units` and `long_name`
  attributes to document its meaning.
- Use an existing netCDF [Convention](#bp_Conventions) for your
  coordinate variables, especially to identify
  spatio-temporal coordinates.
- Use shared dimensions to indicate that two variables use the same
  coordinates along that dimension. If two variables' dimensions are
  not related, create separate dimensions for them, even if they
  happen to have the same length.

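
The requirements that a coordinate variable be strictly monotonic and free of missing data can be checked before the variable is written. The following C code is a minimal sketch and is not part of the netCDF library. It assumes the coordinate values are already in a `double` array and uses NaN as a stand-in for missing data. The function name `is_strictly_monotonic` is hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: returns true if coords[0..n-1] contains no NaN
 * (treated here as missing data) and is strictly increasing or strictly
 * decreasing. A single value is trivially monotonic. */
bool is_strictly_monotonic(const double *coords, size_t n)
{
    assert(coords != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (isnan(coords[i]))
            return false;              /* missing data is not allowed */
    }
    if (n < 2)
        return true;
    bool increasing = coords[1] > coords[0];
    for (size_t i = 1; i < n; i++) {
        /* A repeated value or a change of direction fails the test. */
        if (increasing ? !(coords[i] > coords[i - 1])
                       : !(coords[i] < coords[i - 1]))
            return false;
    }
    return true;
}
```

A writer might call this on each coordinate array and report an error instead of writing the file when it returns false. Decreasing coordinates, such as pressure levels, are accepted.
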
## Variable Grouping {#bp_Variable-Grouping}

You may structure the data in a netCDF file in different ways, for
example putting related parameters into a single variable by adding an
extra dimension. Standard visualization and analysis software may have
trouble breaking that data out, however. At the other extreme, it is
possible to create different variables, e.g. for different vertical
levels of the same parameter. However, standard visualization and
analysis software may have trouble grouping that data back together.
Here are some guidelines for deciding how to group your data into
variables:

- All of the data in a variable must be of the same type and should
  have the same units of measurement.
- A variable's attributes should be applicable to all its data.
- If possible, all of the coordinate variables should be
  spatio-temporal, with no extra dimensions.
- Use 4D spatio-temporal coordinate systems in preference to 3D. Use
  3D spatio-temporal coordinate systems in preference to 2D.
- Vector-valued (e.g. wind) parameters are legitimate uses of extra
  dimensions. There are trade-offs between putting vectors in the same
  variable vs. putting each component of a vector in a
  different variable. Check that any visualization software you plan
  to use can deal with the structure you choose.
- Think in terms of complete coordinate systems (especially
  spatio-temporal), and organize your data into variables accordingly.
  Variables with the same coordinate system implicitly form a group.

## Variable Attributes {#bp_Variable-Attributes}

- For each variable where it makes sense, add a **units** attribute,
  using the [udunits](/software/udunits/index.html) conventions.
- For each variable where it makes sense, add a **long\_name**
  attribute, which is a human-readable descriptive name for
  the variable. This could be used for labeling plots, for example.

## Strings and Variables of type char {#bp_Strings-and-Variables-of-type-char}

NetCDF-3 does not have a primitive **String** type, but does have arrays
of type **char**, which are 8 bits in size. The main difference is that
Strings are variable-length arrays of chars, while char arrays are fixed
length. Software written in C usually depends on Strings being zero
terminated, while software written in Fortran and Java does not. Both
the C (*nc\_get\_vara\_text*) and Java (*ArrayChar.getString*) libraries
have convenience routines that read char arrays and convert them to
Strings.

- Do not use char type variables for numeric data; use byte type
  variables instead.
- Consider using a global attribute instead of a variable to store a
  String applicable to the whole dataset.
- When you want to store arrays of Strings, use a multidimensional
  char array. All of the Strings will be the same length.
- There are 3 strategies for writing variable-length Strings and
  zero-byte termination:
  1. *Fortran convention*: pad with blanks and never terminate with a
     zero byte.
  2. *C convention*: pad with zeros and always terminate with a
     zero byte.
  3. *Java convention*: you don't need to store a trailing zero byte,
     but pad trailing unused characters with zero bytes.
- When reading, trim zeros and blanks from the end of the char array
  and, if in C, add a zero-byte terminator.

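
The helpers below sketch these conventions in C. The first one pads a C string into a fixed-length char field. The second converts such a field back into a zero-terminated C string, for example after reading it with `nc_get_vara_text`. They work on plain memory buffers and make no netCDF calls. Both function names are hypothetical. For the C convention, the sketch assumes the caller keeps the string shorter than the field so that at least one zero byte remains.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical writer helper: copy s into a fixed-length field of width
 * len and fill the unused tail with pad_char (' ' for the Fortran
 * convention, '\0' for the C and Java conventions). Longer strings are
 * truncated to len characters. */
void pad_string(char *field, size_t len, const char *s, char pad_char)
{
    assert(field != NULL && s != NULL);
    size_t n = strlen(s);
    if (n > len)
        n = len;
    memcpy(field, s, n);
    memset(field + n, pad_char, len - n);
}

/* Hypothetical reader helper: trim trailing zero bytes and blanks from a
 * fixed-length field and store the result as a zero-terminated C string
 * in out, which must hold at least len + 1 bytes. */
void trim_field(char *out, const char *field, size_t len)
{
    assert(out != NULL && field != NULL);
    size_t n = len;
    while (n > 0 && (field[n - 1] == '\0' || field[n - 1] == ' '))
        n--;
    memcpy(out, field, n);
    out[n] = '\0';
}
```

Because `trim_field` strips both zeros and blanks, the same reader works for fields written under any of the three conventions.
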
## Calendar Date/Time {#bp_Calendar-Date-Time}

Time as a fundamental unit means a time interval, measured in seconds. A
calendar date/time is a specific instant in real, physical time. Dates
are specified as an interval from some ***reference time***, e.g. "days
elapsed since Greenwich mean noon on 1 January 4713 BCE". The reference
time implies a system of counting time called a ***calendar*** (e.g.
Gregorian calendar) and a textual representation (e.g. [ISO
8601](http://www.cl.cam.ac.uk/%7Emgk25/iso-time.html)).

There are two strategies for storing a date/time into a netCDF variable.
One is to encode it as a numeric value and a unit that includes the
reference time, e.g. "seconds since 2001-1-1 0:0:0" or "days since
2001-1-1 0:0:0". The other is to store it as a String using a standard
encoding and calendar. The former is more compact if you have more than
one date, and makes it easier to compute intervals between two dates.

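
The C code below sketches the first strategy. It computes the value to store in a variable whose units are "days since 2001-1-1 0:0:0". The sketch assumes the proleptic Gregorian calendar and uses a well-known day-counting algorithm (Howard Hinnant's `days_from_civil`). That calendar agrees with the hybrid Gregorian/Julian calendar only for dates after the 1582 Gregorian reform. The function names are hypothetical, and this is not the udunits API.

```c
#include <assert.h>

/* Days from 1970-01-01 to year y, month m (1-12), day d in the proleptic
 * Gregorian calendar, following the days_from_civil algorithm. */
static long days_from_civil(long y, int m, int d)
{
    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
    y -= (m <= 2);                                    /* years start in March */
    long era = (y >= 0 ? y : y - 399) / 400;
    long yoe = y - era * 400;                         /* year of era, [0, 399] */
    long mp  = (m + 9) % 12;                          /* March = 0 ... February = 11 */
    long doy = (153 * mp + 2) / 5 + d - 1;            /* day of year, [0, 365] */
    long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; /* day of era, [0, 146096] */
    return era * 146097 + doe - 719468;
}

/* Hypothetical encoder for units "days since 2001-1-1 0:0:0". */
long days_since_2001(long y, int m, int d)
{
    return days_from_civil(y, m, d) - days_from_civil(2001, 1, 1);
}
```

Both values are counted from the same reference time. The interval between two dates is therefore a plain subtraction. For example, `days_since_2001(2004, 3, 1) - days_since_2001(2004, 2, 28)` is 2 because 2004 is a leap year.
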
Unidata's [udunits](/software/udunits/) package provides a convenient
way to implement the first strategy. It uses the ISO 8601 encoding and a
hybrid Gregorian/Julian calendar, but udunits does not support the use
of other calendars or encodings for the reference time. However, the
ncdump "-T" option can display numeric times that use udunits (and
optionally climate calendars) as ISO 8601 strings that are easy for
humans to interpret.

- If your data uses real, physical time that is well represented using
  the Gregorian/Julian calendar, encode it as an interval from a
  reference time, and add a units attribute which uses a
  udunits-compatible time unit. If the data assumes one of the
  non-standard calendars mentioned in the CF Conventions, specify that
  with a `calendar` attribute. Readers can then use the udunits package
  to manipulate or format the date values, and the ncdump utility can
  display them with either numeric or string representation.
- If your data uses a calendar not supported by the CF Conventions,
  make it compatible with existing date manipulation packages if
  possible (for example, java.text.SimpleDateFormat).
- Add multiple sets of time encodings if necessary to allow different
  readers to work as well as possible.

## Unsigned Data {#bp_Unsigned-Data}

NetCDF-3 does not have unsigned integer primitive types.

- To be completely safe with unknown readers, widen the data type, or
  use floating point.
- You can use the corresponding signed types to store unsigned data
  only if all client programs know how to interpret this correctly.
- A new proposed convention is to create a variable attribute
  `_Unsigned = "true"` to indicate that integer data should be treated
  as unsigned.

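
The C code below is a minimal sketch of what a reader must do when unsigned data is stored in a signed type, for example when a byte variable carries `_Unsigned = "true"`. The reader converts the stored bits to the unsigned type of the same width. The `widen_ubyte` helper shows the safer alternative of widening the type when writing. All function names are hypothetical, and reading the attribute itself is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a value read from a signed byte variable as unsigned.
 * Conversion to uint8_t is well defined in C (modulo 256), so -1
 * becomes 255 and -128 becomes 128. */
unsigned int byte_as_unsigned(int8_t stored)
{
    return (uint8_t)stored;
}

/* The same for a 16-bit short: -1 becomes 65535. */
unsigned int short_as_unsigned(int16_t stored)
{
    return (uint16_t)stored;
}

/* Widening on the writing side: any unsigned 8-bit value fits in a
 * signed 16-bit short, so readers that know nothing about unsigned
 * conventions still see the intended number. */
int16_t widen_ubyte(uint8_t value)
{
    return (int16_t)value;
}
```
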
## Packed Data Values {#bp_Packed-Data-Values}

Packed data is stored in a netCDF file by limiting precision and using a
smaller data type than the original data, for example, packing
double-precision (64-bit) values into short (16-bit) integers. The
C-based netCDF libraries do not do the packing and unpacking. (The
[netCDF Java library](/software/netcdf-java/) will do automatic
unpacking when the
[VariableEnhanced](/software/netcdf-java/v4.1/javadocAll/ucar/nc2/dataset/VariableEnhanced.html)
interface is used. For details see
[EnhanceScaleMissing](/software/netcdf-java/v4.1/javadocAll/ucar/nc2/dataset/EnhanceScaleMissing.html).)

- Each variable with packed data has two attributes called
  **scale\_factor** and **add\_offset**, so that the packed data may
  be read and unpacked using the formula:

  > $$\mathrm{unpacked\_data\_value} = \mathrm{packed\_data\_value} \times \mathrm{scale\_factor} + \mathrm{add\_offset}$$

- The type of the stored variable is the packed data type, typically
  byte, short, or int.
- The type of the scale\_factor and add\_offset attributes should be
  the type that you want the unpacked data to be, typically float
  or double.
- To avoid introducing a bias into the unpacked values due to
  truncation when packing, the data provider should round to the
  nearest integer rather than just truncating towards zero before
  writing the data:

  > $$\mathrm{packed\_data\_value} = \mathrm{nint}\left(\frac{\mathrm{unpacked\_data\_value} - \mathrm{add\_offset}}{\mathrm{scale\_factor}}\right)$$

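
The two formulas above can be sketched in C as shown below. The helpers are hypothetical; as noted above, the C library itself does not pack or unpack. The rounding is written out explicitly as round-half-away-from-zero, which matches Fortran's `nint`. A bare `(long)` cast would truncate toward zero and introduce the bias just described. The sketch assumes the caller checks that the result fits in the packed type.

```c
#include <assert.h>

/* packed = nint((unpacked - add_offset) / scale_factor).
 * Halfway cases round away from zero; a plain (long) cast of q would
 * truncate toward zero and bias the stored values. */
long pack_value(double unpacked, double scale_factor, double add_offset)
{
    assert(scale_factor != 0.0);
    double q = (unpacked - add_offset) / scale_factor;
    return (long)(q >= 0.0 ? q + 0.5 : q - 0.5);
}

/* unpacked = packed * scale_factor + add_offset */
double unpack_value(long packed, double scale_factor, double add_offset)
{
    return (double)packed * scale_factor + add_offset;
}
```

Take scale_factor 0.01 and add_offset 100 as an example. The value 100.006 packs to 1 rather than 0, and unpacking recovers each original value to within half a scale_factor.
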
Depending on whether the packed data values are intended to be
interpreted by the reader as signed or unsigned integers, there are
alternative ways for the data provider to compute the *scale\_factor*
and *add\_offset* attributes. In either case, the formulas above apply
for unpacking and packing the data.

A conventional way to indicate whether a byte, short, or int variable is
meant to be interpreted as unsigned, even for the netCDF-3 classic model
that has no external unsigned integer type, is by providing the special
variable attribute `_Unsigned` with value `"true"`. However, most
existing data for which packed values are intended to be interpreted as
unsigned are stored without this attribute, so readers must be aware of
packing assumptions in this case. In the enhanced netCDF-4 data model,
packed integers may be declared to be of the appropriate unsigned type.

Let *n* be the number of bits in the packed type, and assume *dataMin*
and *dataMax* are the minimum and maximum values that will be used for a
variable to be packed.

- If the packed values are intended to be interpreted as signed
  integers (the default assumption for classic model data), you may
  use:

  > $$\mathrm{scale\_factor} = \frac{\mathrm{dataMax} - \mathrm{dataMin}}{2^{n} - 1}$$

  > $$\mathrm{add\_offset} = \mathrm{dataMin} + 2^{n-1} \times \mathrm{scale\_factor}$$

- If the packed values are intended to be interpreted as unsigned (for
  example, when read in the C interface using the `nc_get_var_uchar()`
  function), you may use:

  > $$\mathrm{scale\_factor} = \frac{\mathrm{dataMax} - \mathrm{dataMin}}{2^{n} - 1}$$

  > $$\mathrm{add\_offset} = \mathrm{dataMin}$$

- In either the signed or unsigned case, an alternate formula may be
  used for the add\_offset and scale\_factor packing parameters that
  reserves a packed value for a special value, such as an indicator of
  missing data. For example, to reserve the minimum packed value
  ($-2^{n-1}$) for use as a special value in the case of signed
  packed values:

  > $$\mathrm{scale\_factor} = \frac{\mathrm{dataMax} - \mathrm{dataMin}}{2^{n} - 2}$$

  > $$\mathrm{add\_offset} = \frac{\mathrm{dataMax} + \mathrm{dataMin}}{2}$$

- If the packed values are unsigned, then the analogous formula that
  reserves 0 as the packed form of a special value would be:

  > $$\mathrm{scale\_factor} = \frac{\mathrm{dataMax} - \mathrm{dataMin}}{2^{n} - 2}$$

  > $$\mathrm{add\_offset} = \mathrm{dataMin} - \mathrm{scale\_factor}$$

- Example of packing 32-bit floats into 16-bit shorts:

        short data( z, y, x);
          data:scale_factor = 34.02f;
          data:add_offset = 1.54f;

- The `units` attribute applies to unpacked values.

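
The four ways of choosing the packing parameters can be sketched in C as follows. The sketch assumes that *n* is between 2 and 32 and that *dataMax* > *dataMin*. The struct and function names are hypothetical. Each function returns the `scale_factor` and `add_offset` that map the extreme usable packed values onto *dataMin* and *dataMax*.

```c
#include <assert.h>

/* Hypothetical container for the two packing attributes. */
struct packing {
    double scale_factor;
    double add_offset;
};

/* 2^k as a double, for 0 <= k <= 32. */
static double pow2(int k)
{
    return (double)(1ULL << k);
}

/* Signed packed values using the full range [-2^(n-1), 2^(n-1) - 1]. */
struct packing packing_signed(double data_min, double data_max, int n)
{
    assert(n >= 2 && n <= 32 && data_max > data_min);
    struct packing p;
    p.scale_factor = (data_max - data_min) / (pow2(n) - 1.0);
    p.add_offset = data_min + pow2(n - 1) * p.scale_factor;
    return p;
}

/* Unsigned packed values using the full range [0, 2^n - 1]. */
struct packing packing_unsigned(double data_min, double data_max, int n)
{
    assert(n >= 2 && n <= 32 && data_max > data_min);
    struct packing p;
    p.scale_factor = (data_max - data_min) / (pow2(n) - 1.0);
    p.add_offset = data_min;
    return p;
}

/* Signed, reserving the minimum packed value -2^(n-1) as a special value. */
struct packing packing_signed_reserved(double data_min, double data_max, int n)
{
    assert(n >= 2 && n <= 32 && data_max > data_min);
    struct packing p;
    p.scale_factor = (data_max - data_min) / (pow2(n) - 2.0);
    p.add_offset = (data_max + data_min) / 2.0;
    return p;
}

/* Unsigned, reserving the packed value 0 as a special value. */
struct packing packing_unsigned_reserved(double data_min, double data_max, int n)
{
    assert(n >= 2 && n <= 32 && data_max > data_min);
    struct packing p;
    p.scale_factor = (data_max - data_min) / (pow2(n) - 2.0);
    p.add_offset = data_min - p.scale_factor;
    return p;
}
```

As a worked check of the unpacking formula, take scale_factor = 34.02 and add_offset = 1.54, as in the example above. A stored short value of 100 unpacks to 100 * 34.02 + 1.54 = 3403.54.
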
## Missing Data Values {#bp_Missing-Data-Values}

***Missing data*** is a general name for data values that are invalid,
never written, or missing. The netCDF library itself does not handle
these values in any special way, except that the value of a `_FillValue`
attribute, if any, is used in pre-filling unwritten data. (The
Java-netCDF library will assist in recognizing these values when
reading; see class **VariableStandardized**.)

- Default fill values for each type are available in the C-based
  interfaces, and are defined in the appropriate header files. For
  example, in the C interface, NC\_FILL\_FLOAT and NC\_FILL\_DOUBLE
  are numbers near 9.9692e+36 that are returned when you try to read
  values that were never written. Writing, reading, and testing for
  equality with these default fill values works portably on the
  platforms on which netCDF has been tested.
- The `_FillValue` attribute should have the same data type as the
  variable it describes. If the variable is packed using
  `scale_factor` and `add_offset` attributes, the `_FillValue`
  attribute should have the data type of the packed data.
- Another way of indicating missing values for real type data is to
  store an IEEE **NaN** floating point value. The advantage of this is
  that any computation using a NaN results in a NaN. Client software
  must know to look for NaNs, however, and detection of NaNs is
  tricky, since every `==`, `<`, `<=`, `>`, or `>=` comparison with a
  NaN is false (even `x == x` when `x` is NaN). In C, test for NaN
  with the C99 `isnan()` macro.
- In Java, you can use the **Double.NaN** and **Float.NaN** constants.
- In many C compilers, you can generate a NaN value using **double
  nan = 0.0/0.0;**. C99 also provides the `NAN` macro in `<math.h>`.
- Alternatively or in addition, set the **valid\_range** attribute for
  each variable that uses missing values, and make sure all valid data
  is within that range, and all missing or invalid data is outside of
  that range. Again, the client software must recognize and make use
  of this information. Example:

        float data( z, y, x);
          data:valid_range = -999.0f, 999.0f;

  If the variable is packed using `scale_factor` and `add_offset`
  attributes, the `valid_range` attribute should have the data type of
  the packed data.

  If the variable is unsigned, the `valid_range` values should be
  widened if needed and stored as unsigned integers.

## Miscellaneous tips {#bp_Miscellaneous-tips}

- To define a file whose structure is known in advance, write a CDL
  file and create the netCDF file using
  [ncgen](/cgi-bin/man-cgi?ncgen). Then write the data into the netCDF
  file using your program. This is typically much easier than
  programming all of the create calls yourself.
- For the netCDF classic or 64-bit-offset formats, it's possible to
  reserve extra space in the file when it is created so that you can
  later add additional attributes or non-record variables without
  copying all the data. (This is not necessary for netCDF-4 files,
  because metadata can be added efficiently to such files.) See the [C
  man-page reference documentation](/cgi-bin/man-cgi?netcdf+-s3) (or
  the [Fortran reference documentation](/cgi-bin/man-cgi?netcdf+-s3f))
  for `nc__create` and `nc__enddef` (`nf__create` and `nf__enddef`
  for Fortran) for more details on reserving extra space in
  the header.

## Spelling netCDF: Best Practices {#bp_Spelling-netCDF-Best-Practices}

There are only 3 correct spellings of "netCDF":

1. **netCDF:** The original spelling of the name of the data model,
   API, and format. The acronym stands for network Common Data Form
   (not Format), and the "CDF" part was capitalized in part to pay
   homage to the NASA "CDF" data model which the netCDF data
   model extended.
2. **netcdf:** Used in certain file names, such as the C header
   `netcdf.h`.
3. **NetCDF**: Used in titles and at the beginning of sentences, where
   "netCDF" is awkward or violates style guidelines.

All other forms, and most especially "Netcdf", are considered vulgar and
a sign of ill-breeding or misspent youth, analogous to the egregious but
common misspelling "JAVA" used by those who are new to the language or
who mistakenly think it is an acronym.