NetCDF
4.6.3
Below is an example of CDL, describing a netCDF classic format file with several named dimensions (lat, lon, time), variables (z, t, p, rh, lat, lon, time), variable attributes (units, _FillValue, valid_range), and some data.
All CDL statements are terminated by a semicolon. Spaces, tabs, and newlines can be used freely for readability. Comments may follow the double slash characters '//' on any line.
A CDL description for a classic model file consists of three optional parts: dimensions, variables, and data. The variable part may contain variable declarations and attribute assignments. For the enhanced model supported by netCDF-4, a CDL description may also include groups, subgroups, and user-defined types.
A dimension is used to define the shape of one or more of the multidimensional variables described by the CDL description. A dimension has a name and a length. At most one dimension in a classic CDL description can have unlimited length, which means a variable using this dimension can grow to any length (like a record number in a file). Any number of dimensions can be declared of unlimited length in CDL for an enhanced model file.
A variable represents a multidimensional array of values of the same type. A variable has a name, a data type, and a shape described by its list of dimensions. Each variable may also have associated attributes (see below) as well as data values. The name, data type, and shape of a variable are specified by its declaration in the variable section of a CDL description. A variable may have the same name as a dimension; by convention such a variable contains coordinates of the dimension it names.
An attribute contains information about a variable or about the whole netCDF dataset or containing group. Attributes may be used to specify such properties as units, special values, maximum and minimum valid values, and packing parameters. Attribute information is represented by single values or one-dimensional arrays of values. For example, “units” might be an attribute represented by a string such as “celsius”. An attribute has an associated variable, a name, a data type, a length, and a value. In contrast to variables that are intended for data, attributes are intended for ancillary data or metadata (data about data).
In CDL, an attribute is designated by a variable and attribute name, separated by a colon (':'). It is possible to assign global attributes to the netCDF dataset as a whole by omitting the variable name and beginning the attribute name with a colon (':'). The data type of an attribute in CDL, if not explicitly specified, is derived from the type of the value assigned to it. In the netCDF-4 enhanced model, attributes may be declared to be of user-defined type, like variables.
The length of an attribute is the number of data values assigned to it. Multiple values are assigned to non-character attributes by separating the values with commas (','). All values assigned to an attribute must be of the same type. In the classic data model, character arrays are used for textual information. The length of a character attribute is the number of bytes, and an array of character values can be represented in string notation. In the enhanced data model of netCDF-4, variable-length strings are available as a primitive type, and the length of a string attribute is the number of string values assigned to it.
In CDL, just as for netCDF, the names of dimensions, variables and attributes (and, in netCDF-4 files, groups, user-defined types, compound member names, and enumeration symbols) consist of arbitrary sequences of alphanumeric characters, underscore '_', period '.', plus '+', hyphen '-', or at sign '@', but beginning with a letter or underscore. However, names commencing with underscore are reserved for system use. Case is significant in netCDF names. A zero-length name is not allowed, and names that have trailing space characters are also not permitted. Some widely used conventions restrict names to only alphanumeric characters or underscores.
Beginning with versions 3.6.3 and 4.0, names may also include UTF-8 encoded Unicode characters as well as other special characters, except for the character '/', which may not appear in a name (because it is reserved for path names of nested groups). In CDL, most special characters are escaped with a backslash '\' character, but that character is not actually part of the netCDF name. The special characters that do not need to be escaped in CDL names are underscore '_', period '.', plus '+', hyphen '-', or at sign '@'. The formal specification of CDL name syntax is provided in the classic format specification (see NetCDF Classic Format (CDF-1)). Note that by using special characters in names, you may make your data not compliant with conventions that have more stringent requirements on valid names for netCDF components, for example the CF Conventions.
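The unescaped part of these naming rules can be checked mechanically. The following Python sketch covers only ASCII names that need no backslash escaping in CDL. The helper name `is_plain_cdl_name` is hypothetical and not part of any netCDF library, and the sketch deliberately ignores UTF-8 names and escaped special characters.

```python
import re

# A letter or underscore first, then alphanumerics or one of _ . + - @
_PLAIN_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.+@-]*")


def is_plain_cdl_name(name: str) -> bool:
    """Return True if `name` follows the ASCII rule and needs no escaping."""
    # fullmatch rejects empty names, trailing spaces, and '/' automatically
    return _PLAIN_NAME.fullmatch(name) is not None


for candidate in ["lat", "wind-speed@10m", "2m_temp", "a/b", "temp "]:
    print(repr(candidate), is_plain_cdl_name(candidate))
```

Names beginning with an underscore pass this check because they are syntactically valid, even though they are reserved for system use.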
The names for the primitive data types are reserved words in CDL, so names of variables, dimensions, and attributes must not be primitive type names.
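The optional data section of a CDL description is where netCDF variables may be initialized. The syntax of an initialization is simple: the variable name, an equals sign, a comma-delimited list of constants, and a terminating semicolon.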
The comma-delimited list of constants may be separated by spaces, tabs, and newlines. For multidimensional arrays, the last dimension varies fastest. Thus, row-major order rather than column-major order is used for matrices. If fewer values are supplied than are needed to fill a variable, it is extended with the fill value. The types of constants need not match the type declared for a variable; coercions are done to convert integers to floating point, for example. All meaningful type conversions among numeric primitive types are supported.
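As an illustration of the ordering and padding rules, the Python sketch below lays a flat list of constants into a nested list with the last dimension varying fastest, and pads a short list with a fill value. The function name `fill_and_shape` is hypothetical, the string "_" merely stands in for a real fill value, and raising an error on too many values is a choice of this sketch rather than documented ncgen behavior.

```python
from math import prod


def fill_and_shape(values, shape, fill):
    """Arrange `values` in row-major order for a non-empty `shape`, padding with `fill`."""
    total = prod(shape)
    if len(values) > total:
        raise ValueError("more values than the variable can hold")
    # Extend a short list with the fill value
    flat = list(values) + [fill] * (total - len(values))

    def build(offset, dims):
        if len(dims) == 1:
            return flat[offset:offset + dims[0]]
        stride = prod(dims[1:])  # elements covered by one step of the first dim
        return [build(offset + i * stride, dims[1:]) for i in range(dims[0])]

    return build(0, list(shape))


# Five constants supplied for a 2 x 3 variable: the sixth slot is filled
print(fill_and_shape([1, 2, 3, 4, 5], (2, 3), "_"))
```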
A special notation for fill values is supported: the ‘_’ character designates a fill value for variables.
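The CDL primitive data types for the classic model are char, byte, short, int, float, and double (long is a deprecated synonym for int, and real is a synonym for float).
NetCDF-4 supports the additional primitive types ubyte, ushort, uint, int64, uint64, and string.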
Except for the added numeric data-types byte and ubyte, CDL supports the same numeric primitive data types as C. For backward compatibility, in declarations primitive type names may be specified in either upper or lower case.
The byte type differs from the char type in that it is intended for numeric data, and the zero byte has no special significance, as it may for character data. In the classic data model, byte data could be interpreted as either signed (-128 to 127) or unsigned (0 to 255). When reading byte data in a way that converts it into another numeric type, the default interpretation is signed. The netCDF-4 enhanced data model added an unsigned byte type.
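The difference between the two interpretations is easy to see on raw bytes. This Python sketch uses only the standard library and is independent of any netCDF API; it decodes the same three bytes once as signed and once as unsigned values.

```python
raw = bytes([0x7F, 0x80, 0xFF])

# Signed interpretation (the default when converting byte data): -128..127
signed = [int.from_bytes(raw[i:i + 1], "big", signed=True) for i in range(len(raw))]

# Unsigned interpretation (the netCDF-4 ubyte view): 0..255
unsigned = list(raw)

print(signed)
print(unsigned)
```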
The short type holds values between -32768 and 32767.
The float type can hold values between about -3.4e+38 and 3.4e+38, with external representation as 32-bit IEEE normalized single-precision floating-point numbers. The double type can hold values between about -1.7e+308 and 1.7e+308, with external representation as 64-bit IEEE standard normalized double-precision floating-point numbers. The string type holds variable-length strings.
A netCDF-4 string is a variable-length array of Unicode (http://unicode.org/) characters. When reading or writing a string to a netCDF file or other external representation, the characters are UTF-8 encoded (http://en.wikipedia.org/wiki/UTF-8; note that ASCII is a subset of UTF-8). Libraries may use different internal representations; for example, the Java library uses UTF-16 encoding. Note especially that Microsoft Windows does not support UTF-8 encoding, only ASCII and UTF-16, so using netCDF on Windows may cause some problems with respect to objects like file paths.
The netCDF char type contains uninterpreted characters, one character per byte. Typically these contain 7-bit ASCII characters, but the character encoding is application specific. For this reason, applications writing data using the enhanced data model are encouraged to use the netCDF-4 string data type in preference to the char data type. Applications writing string data using the char data type are encouraged to add the special variable attribute "_Encoding" with a value that the netCDF libraries recognize. Currently those valid values are "UTF-8" or "ASCII", case insensitive.
This section describes the CDL notation for constants.
Attributes are initialized in the variables section of a CDL description by providing a list of constants that determines the attribute's length and type (if primitive and not explicitly declared). CDL defines a syntax for constant values that permits distinguishing among different netCDF primitive types. The syntax for CDL constants is similar to C syntax, with type suffixes appended to bytes, shorts, and floats to distinguish them from ints and doubles.
A byte constant is represented by an integer constant with a 'b' (or 'B') appended. In the old netCDF-2 API, byte constants could also be represented using single characters or standard C character escape sequences such as 'a' or '\n'. This is still supported for backward compatibility, but deprecated to make the distinction clear between the numeric byte type and the textual char type. Example byte constants include:
Character constants are enclosed in double quotes. A character array may be represented as a string enclosed in double quotes. Multiple CDL strings are concatenated into a single array of characters, permitting long character arrays to appear on multiple lines. To support multiple variable-length textual values, a conventional delimiter such as ',' or blank may be used, but interpretation of any such convention for a delimiter must be implemented in software above the netCDF library layer. The usual escape conventions for C strings are honored. For example:
The form of a short constant is an integer constant with an 's' or 'S' appended. If a short constant begins with '0', it is interpreted as octal. When it begins with '0x', it is interpreted as a hexadecimal constant. For example:
The form of an int constant is an ordinary integer constant. If an int constant begins with '0', it is interpreted as octal. When it begins with '0x', it is interpreted as a hexadecimal constant. Examples of valid int constants include:
The float type is appropriate for representing data with about seven significant digits of precision. The form of a float constant is the same as a C floating-point constant with an 'f' or 'F' appended. A decimal point is required in a CDL float to distinguish it from an integer. For example, the following are all acceptable float constants:
The double type is appropriate for representing floating-point data with about 16 significant digits of precision. The form of a double constant is the same as a C floating-point constant. An optional 'd' or 'D' may be appended. A decimal point is required in a CDL double to distinguish it from an integer. For example, the following are all acceptable double constants:
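The precision difference between the two floating-point types can be demonstrated by round-tripping a double through a 32-bit representation. The sketch below uses Python's standard `struct` module; the helper name `as_float32` is hypothetical.

```python
import struct


def as_float32(x: float) -> float:
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


value = 3.141592653589793        # about 16 significant digits as a double
single = as_float32(value)       # agrees only to about 7 significant digits

print(value, single, abs(value - single))
```

The two printed values agree in roughly their first seven significant digits, which is why float is a reasonable choice only when the data carry no more precision than that.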
Unsigned integer constants can be created by appending the character 'U' or 'u' between the constant and any trailing size specifier. Thus one could say 10U, 100us, 100000ul, or 1000000ull, for example.
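The suffix and radix rules for integer constants can be summarized in a small classifier. This Python sketch is an illustration only, not the ncgen parser: the function name `parse_int_constant` is hypothetical, the 'l' and 'll' size specifiers are not handled, and because 'b' is also a hexadecimal digit the sketch assumes a trailing 'b' on a hexadecimal constant is a digit rather than a byte suffix.

```python
def parse_int_constant(text):
    """Return (type_name, value) for a CDL-style byte, short, or int constant."""
    s = text.strip().lower()
    sign = -1 if s.startswith("-") else 1
    if s[:1] in ("+", "-"):
        s = s[1:]
    is_hex = s.startswith("0x")

    # Size suffix: 's' for short, 'b' for byte (byte only on non-hex input)
    suffix = ""
    if s and s[-1] in ("s" if is_hex else "bs"):
        suffix, s = s[-1], s[:-1]

    # 'u' sits between the digits and any trailing size specifier
    unsigned = s.endswith("u")
    if unsigned:
        s = s[:-1]

    if is_hex:
        value = int(s[2:], 16)      # leading 0x: hexadecimal
    elif len(s) > 1 and s.startswith("0"):
        value = int(s, 8)           # leading 0: octal
    else:
        value = int(s, 10)

    base_type = {"b": "byte", "s": "short", "": "int"}[suffix]
    return ("u" + base_type if unsigned else base_type), sign * value


for constant in ["-2s", "0123s", "0x7ffs", "017", "100us", "10U", "23b"]:
    print(constant, parse_int_constant(constant))
```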
Constants for the variable-length string type, available as a primitive type in the netCDF-4 enhanced data model, are represented using double quotes, like character constants. This creates a potential ambiguity, since a multi-character string may also indicate a dimensioned character value. Disambiguation usually occurs by context, but care should be taken to specify the string type to ensure the proper choice. For example, these two CDL specifications of global attributes have different types:
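The consequence of the two readings can be sketched in Python. Given the same quoted constants, a char attribute concatenates them into one character array whose length is its byte count, while a string attribute keeps them as separate values. The function name `attribute_length` and the sample values are hypothetical, chosen only for illustration.

```python
def attribute_length(constants, declared_string):
    """Length of an attribute assigned the given double-quoted constants."""
    if declared_string:
        # string type: one attribute value per constant
        return len(constants)
    # char type: constants are concatenated; length is the number of bytes
    return len("".join(constants).encode("utf-8"))


constants = ["abcd", "efgh"]
print(attribute_length(constants, declared_string=False))  # char reading
print(attribute_length(constants, declared_string=True))   # string reading
```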
Opaque constants are represented as sequences of hexadecimal digits preceded by 0X or 0x: 0xaa34ffff, for example. These constants can still be used as integer constants and will be either truncated or extended as necessary.
The ncgen man-page reference has more details about CDL representation of constants of user-defined types.
Convert NetCDF file to text form (CDL)
The ncdump utility generates a text representation of a specified netCDF file on standard output, optionally excluding some or all of the variable data in the output. The text representation is in a form called CDL (network Common Data form Language) that can be viewed, edited, or serve as input to ncgen, a companion program that can generate a binary netCDF file from a CDL file. Hence ncgen and ncdump can be used as inverses to transform the data representation between binary and text representations. See ncgen documentation for a description of CDL and netCDF representations.
ncdump may also be used to determine what kind of netCDF file is used (which variant of the netCDF file format) with the -k option.
If DAP support was enabled when ncdump was built, the file name may specify a DAP URL. This allows ncdump to access data sources from DAP servers, including data in other formats than netCDF. When used with DAP URLs, ncdump shows the translation from the DAP data model to the netCDF data model.
ncdump may also be used as a simple browser for netCDF data files, to display the dimension names and lengths; variable names, types, and shapes; attribute names and values; and optionally, the values of data for all variables or selected variables in a netCDF file. For netCDF-4 files, groups and user-defined types are also included in ncdump output.
ncdump uses '_' to represent data values that are equal to the '_FillValue' attribute for a variable, intended to represent data that has not yet been written. If a variable has no '_FillValue' attribute, the default fill value for the variable type is used unless the variable is of byte type.
ncdump defines a default display format used for each type of netCDF data, but this can be changed if a 'C_format' attribute is defined for a netCDF variable. In this case, ncdump will use the 'C_format' attribute to format each value. For example, if floating-point data for the netCDF variable 'Z' is known to be accurate to only three significant digits, it would be appropriate to give 'Z' a 'C_format' attribute holding a three-significant-digit format such as "%.3g".
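To see what a three-significant-digit C format does to values, and how fill values are shown, consider this Python sketch. It relies on Python's `%` operator, which follows C printf conventions for formats such as "%.3g". The function name `display`, the format string, and the sample fill value -999.0 are assumptions for illustration, not ncdump internals.

```python
def display(values, c_format="%.3g", fill=None):
    """Format values with a C-style format, showing '_' for the fill value."""
    return ["_" if v == fill else c_format % v for v in values]


print(display([1234.5678, 12.3456, 0.000123456, -999.0], fill=-999.0))
```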
Look at the structure of the data in the netCDF file foo.nc:
Produce an annotated CDL version of the structure and data in the netCDF file foo.nc, using C-style indexing for the annotations:
Output data for only the variables uwind and vwind from the netCDF file foo.nc, and show the floating-point data with only three significant digits of precision:
Produce a fully-annotated (one data value per line) listing of the data for the variable omega, using FORTRAN conventions for indices, and changing the netCDF file name in the resulting CDL file to omega:
Examine the translated DDS for the DAP source from the specified URL:
Without dumping all the data, show the special virtual attributes that indicate performance-related characteristics of a netCDF-4 file:
ncgen(1), netcdf(3)
For classic, 64-bit offset, 64-bit data, or netCDF-4 classic model data, ncdump generates line breaks after embedded newlines in displaying character data. This is not done for netCDF-4 files, because netCDF-4 supports arrays of real strings of varying length.
Copy a netCDF file, optionally changing format, compression, or chunking in the output.
The nccopy utility copies an input netCDF file in any supported format variant to an output netCDF file, optionally converting the output to any compatible netCDF format variant, compressing the data, or rechunking the data. For example, if built with the netCDF-3 library, a classic CDF-1 file may be copied to a CDF-2 or CDF-5 file, permitting larger variables. If built with the netCDF-4 library, a netCDF classic file may be copied to a netCDF-4 file or to a netCDF-4 classic model file as well, permitting data compression, efficient schema changes, larger variable sizes, and use of other netCDF-4 features.
If no output format is specified, with either -k kind_name or -kind_code, then the output will use the same format as the input, unless the input is classic format and either chunking or compression is specified, in which case the output will be netCDF-4 classic model format. Attempting some kinds of format conversion will result in an error, if the conversion is not possible. For example, an attempt to copy a netCDF-4 file that uses features of the enhanced model, such as groups or variable-length strings, to any of the other kinds of netCDF formats that use the classic model will result in an error.
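The default rule described above can be written as a small decision function. This Python sketch models only the rule as stated in the text; the function name `default_output_kind` and the kind labels are hypothetical, and it does not call nccopy or check whether a conversion is actually possible.

```python
def default_output_kind(input_kind, compress=False, rechunk=False):
    """Output kind chosen when no output format is requested explicitly."""
    # Classic input cannot hold compressed or chunked data, so the output
    # moves to the netCDF-4 classic model format in that case.
    if input_kind == "classic" and (compress or rechunk):
        return "netCDF-4 classic model"
    # Otherwise the output keeps the input's format
    return input_kind


print(default_output_kind("classic"))
print(default_output_kind("classic", compress=True))
print(default_output_kind("netCDF-4", rechunk=True))
```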
nccopy also serves as an example of a generic netCDF-4 program, with its ability to read any valid netCDF file and handle nested groups, strings, and user-defined types, including arbitrarily nested compound types, variable-length types, and data of any valid netCDF-4 type.
If DAP support was enabled when nccopy was built, the file name may specify a DAP URL. This may be used to convert data on DAP servers to local netCDF files.
Make a copy of foo1.nc, a netCDF file of any type, to foo2.nc, a netCDF file of the same type:
Note that the above copy will not be as fast as using cp or another simple copy utility, because the file is copied using only the netCDF API. If the input file has extra bytes after the end of the netCDF data, those will not be copied, because they are not accessible through the netCDF interface. If the original file was generated in 'No fill' mode so that fill values are not stored for padding for data alignment, the output file may have different padding bytes.
Convert a netCDF-4 classic model file, compressed.nc, that uses compression, to a netCDF-3 file classic.nc:
Note that 'nc3' could be used instead of 'classic'.
Download the variable 'time_bnds' and its associated attributes from an OPeNDAP server and copy the result to a netCDF file named 'tb.nc':
Note that URLs that name specific variables as command-line arguments should generally be quoted, to avoid the shell interpreting special characters such as '?'.
Compress all the variables in the input file foo.nc, a netCDF file of any type, to the output file bar.nc:
If foo.nc was a classic netCDF file, bar.nc will be a netCDF-4 classic model netCDF file, because the classic formats don't support compression. If foo.nc was a netCDF-4 file with some variables compressed using various deflation levels, the output will also be a netCDF-4 file of the same type, but all the variables, including any uncompressed variables in the input, will now use deflation level 1.
Assume the input data includes gridded variables that use time, lat, lon dimensions, with 1000 times by 1000 latitudes by 1000 longitudes, and that the time dimension varies most slowly. Also assume that users want quick access to data at all times for a small set of lat-lon points. Accessing data for 1000 times would typically require accessing 1000 disk blocks, which may be slow.
Reorganizing the data into chunks on disk that have all the time in each chunk for a few lat and lon coordinates would greatly speed up such access. To chunk the data in the input file slow.nc, a netCDF file of any type, to the output file fast.nc, you could use:
to specify data chunks of 1000 times, 40 latitudes, and 40 longitudes. If you had enough memory to contain the output file, you could speed up the rechunking operation significantly by creating the output in memory before writing it to disk on close:
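The benefit of the 1000 x 40 x 40 chunk shape can be estimated with simple arithmetic. The Python sketch below counts how many chunks a time series at a single lat-lon point intersects. The function name `chunks_touched` is hypothetical, and the one-time-step-per-chunk layout is only a rough stand-in for the original layout, in which time varies most slowly.

```python
from math import ceil, prod


def chunks_touched(start, count, chunk_shape):
    """Number of chunks intersected by a hyperslab given per-dimension start and count."""
    n = 1
    for s, c, ch in zip(start, count, chunk_shape):
        first = s // ch
        last = (s + c - 1) // ch
        n *= last - first + 1
    return n


shape = (1000, 1000, 1000)      # time, lat, lon
rechunked = (1000, 40, 40)      # all times for a 40 x 40 patch
per_step = (1, 1000, 1000)      # one time step per chunk (stand-in for the slow layout)

total_chunks = prod(ceil(d / c) for d, c in zip(shape, rechunked))
series_start, series_count = (0, 500, 500), (1000, 1, 1)

print(total_chunks)
print(chunks_touched(series_start, series_count, rechunked))
print(chunks_touched(series_start, series_count, per_step))
```

With the rechunked layout the whole time series lives in one chunk, compared with one chunk per time step in the stand-in layout.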
ncdump(1), ncgen(1), netcdf(3)
The ncgen tool generates a netCDF file or a C or FORTRAN program that creates a netCDF dataset. If no options are specified in invoking ncgen, the program merely checks the syntax of the CDL input, producing error messages for any violations of CDL syntax.
The ncgen tool is now capable of producing netCDF-4 files. It operates essentially identically to the original ncgen.
The CDL input to ncgen may include data model constructs from the netCDF-4 data model. In particular, it includes new primitive types such as unsigned integers and strings, opaque data, enumerations, and user-defined constructs using vlen and compound types. The ncgen man page should be consulted for more detailed information.
UNIX syntax for invoking ncgen:
where:
-b Create a (binary) netCDF file. If the '-o' option is absent, a default file name will be constructed from the netCDF name (specified after the netcdf keyword in the input) by appending the '.nc' extension. Warning: if a file already exists with the specified name it will be overwritten.
-o netcdf-file Name for the netCDF file created. If this option is specified, it implies the '-b' option. (This option is necessary because netCDF files are direct-access files created with seek calls, and hence cannot be written to standard output.)
-c Generate C source code that will create a netCDF dataset matching the netCDF specification. The C source code is written to standard output. This is only useful for relatively small CDL files, since all the data is included in variable initializations in the generated program. The -c flag is deprecated and the -lc flag should be used instead.
-f Generate FORTRAN source code that will create a netCDF dataset matching the netCDF specification. The FORTRAN source code is written to standard output. This is only useful for relatively small CDL files, since all the data is included in variable initializations in the generated program. The -f flag is deprecated and the -lf77 flag should be used instead.
-k The -k flag specifies the kind of netCDF file to generate. Its argument can be one of the following:
'classic', 'nc3': produce a netCDF classic format file.
'64-bit offset', 'nc6': produce a netCDF 64-bit offset format file.
'64-bit data' (CDF-5), 'nc5': produce a CDF-5 format file.
'netCDF-4', 'nc4': produce a netCDF-4 format file.
'netCDF-4 classic model', 'nc7': produce a netCDF-4 format file, but restricted to netCDF-3 classic model CDL input.
Note that the -v flag is a deprecated alias for -k. The code 'nc7' is used as a short form for the unwieldy 'netCDF-4 classic model' because 7=3+4, a mnemonic for the format that uses the netCDF-3 data model for compatibility with the netCDF-4 storage format for performance. The old version format numbers '1', '2', '3', '4', equivalent to the format names 'nc3', 'nc6', 'nc4', or 'nc7' respectively, are also still accepted but deprecated, due to easy confusion between format numbers and format names. Various old format name aliases are also accepted but deprecated, e.g. 'hdf5', 'enhanced-nc3', for 'netCDF-4'.
-l The -l flag specifies that ncgen should output (to standard output) the text of a program that, when compiled and executed, will produce the corresponding binary .nc file. The argument can be c or C for C language output, or f77 or fortran77 for FORTRAN 77 language output; note that currently only the classic model is supported for FORTRAN output.
-x Use “no fill” mode, omitting the initialization of variable values with fill values. This can make the creation of large files much faster, but it will also eliminate the possibility of detecting the inadvertent reading of values that haven't been written.
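The kind names accepted by the -k flag, including the deprecated numeric codes, amount to a lookup table. The Python sketch below normalizes any of the names listed above to its short code. The function name `normalize_kind` is hypothetical, the lowercasing is a convenience of this sketch rather than documented ncgen behavior, and the other deprecated aliases such as 'hdf5' are left out.

```python
_KIND_ALIASES = {
    "classic": "nc3", "nc3": "nc3",
    "64-bit offset": "nc6", "nc6": "nc6",
    "64-bit data": "nc5", "nc5": "nc5",
    "netcdf-4": "nc4", "nc4": "nc4",
    "netcdf-4 classic model": "nc7", "nc7": "nc7",
    # Deprecated format numbers: easy to confuse with the names above
    "1": "nc3", "2": "nc6", "3": "nc4", "4": "nc7",
}


def normalize_kind(name: str) -> str:
    """Map a -k argument to its short code; raises KeyError if unknown."""
    return _KIND_ALIASES[name.strip().lower()]


for kind in ["classic", "netCDF-4 classic model", "3", "nc5"]:
    print(kind, "->", normalize_kind(kind))
```

Note how the deprecated number '3' maps to 'nc4' and '4' maps to 'nc7', which illustrates why the numeric codes are discouraged.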
Check the syntax of the CDL file foo.cdl:
From the CDL file foo.cdl, generate an equivalent binary netCDF file named bar.nc:
From the CDL file foo.cdl, generate a C program containing netCDF function invocations that will create an equivalent binary netCDF dataset:
The ncgen3 tool is the new name for the older, original ncgen utility.
The ncgen3 tool generates a netCDF file or a C or FORTRAN program that creates a netCDF dataset. If no options are specified in invoking ncgen3, the program merely checks the syntax of the CDL input, producing error messages for any violations of CDL syntax.
The ncgen3 utility can only generate classic-model netCDF-4 files or programs.
UNIX syntax for invoking ncgen3:
where:
-b Create a (binary) netCDF file. If the '-o' option is absent, a default file name will be constructed from the netCDF name (specified after the netcdf keyword in the input) by appending the '.nc' extension. Warning: if a file already exists with the specified name it will be overwritten.
-o netcdf-file Name for the netCDF file created. If this option is specified, it implies the '-b' option. (This option is necessary because netCDF files are direct-access files created with seek calls, and hence cannot be written to standard output.)
-c Generate C source code that will create a netCDF dataset matching the netCDF specification. The C source code is written to standard output. This is only useful for relatively small CDL files, since all the data is included in variable initializations in the generated program.
-f Generate FORTRAN source code that will create a netCDF dataset matching the netCDF specification. The FORTRAN source code is written to standard output. This is only useful for relatively small CDL files, since all the data is included in variable initializations in the generated program.
-v2 The generated netCDF file or program will use the version of the format with 64-bit offsets, to allow for the creation of very large files. These files are not as portable as classic format netCDF files, because they require version 3.6.0 or later of the netCDF library.
-v3 The generated netCDF file will be in netCDF-4/HDF5 format. These files are not as portable as classic format netCDF files, because they require version 4.0 or later of the netCDF library.
-v5 The generated netCDF file or program will use the version of the format with 64-bit integers, to allow for the creation of very large variables. These files are not as portable as classic format netCDF files, because they require version 4.4.0 or later of the netCDF library.
-x Use “no fill” mode, omitting the initialization of variable values with fill values. This can make the creation of large files much faster, but it will also eliminate the possibility of detecting the inadvertent reading of values that haven't been written.