NetCDF Byterange Support {#netcdf_byterange}
================================

<!-- Note that this file has the .dox extension, but is mostly markdown -->
<!-- Begin MarkDown -->

# Introduction {#byterange_intro}

Suppose that you have the URL to a remote dataset
which is a normal netcdf-3 or netcdf-4 file.

The netCDF-c library now supports read-only access to such
datasets using the HTTP byte range capability [], assuming that
the remote server supports byte-range access.

For example, the following two datasets can be accessed this way:

1. A Thredds Server dataset supporting the Thredds HTTPServer protocol
   and containing a netcdf classic file.
    - location: "https://remotetest.unidata.ucar.edu/thredds/fileServer/testdata/2004050300_eta_211.nc#mode=bytes"
2. An Amazon S3 object containing a netcdf enhanced file.
    - location: "http://noaa-goes16.s3.amazonaws.com/ABI-L1b-RadC/2017/059/03/OR_ABI-L1b-RadC-M3C13_G16_s20170590337505_e20170590340289_c20170590340316.nc#mode=bytes"

Other remote servers may also provide byte-range access in a similar form.

It is important to note that this is not intended as a true
production capability, because this kind of access is expected to be
quite slow. In addition, the byte-range IO drivers do not
currently perform any optimization or caching.

# Configuration {#byterange_config}

This capability is enabled using the *--enable-byterange* option
to the *./configure* command for Automake. For CMake, the option flag is
*-DENABLE_BYTERANGE=true*.

This capability requires *libcurl*; if byterange is enabled but
*libcurl* cannot be located, an error will occur.
In this respect, it is similar to the DAP2 and DAP4 capabilities.

Note also that here, the term "http" is often used as a synonym for *byterange*.

# Run-time Usage {#byterange_url}

In order to use this capability at run-time, with *ncdump* for
example, it is necessary to provide a URL pointing to the basic
dataset to be accessed. The URL must be annotated to tell the
netcdf-c library that byte-range access should be used. This is
indicated by appending the phrase `#mode=bytes`
to the end of the URL.
The two examples above show how this will look.

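
To illustrate what the annotation looks like to a parser, the sketch below checks whether a URL fragment contains a `mode=` entry listing `bytes`. The function name `url_requests_byterange` is hypothetical, and the sketch assumes fragment entries are separated by `&` and mode values by `,`. It is a simplified illustration, not the netcdf-c URL parser.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not the netcdf-c parser): returns true if the URL
 * fragment, the part after '#', has a "mode=" entry whose comma-separated
 * values include "bytes". Fragment entries are assumed to be separated by '&'. */
static bool url_requests_byterange(const char *url)
{
    const char *frag = strchr(url, '#');
    if (frag == NULL)
        return false;              /* no fragment, so no annotation */
    frag++;                        /* skip the '#' itself */

    while (*frag != '\0') {
        size_t entry_len = strcspn(frag, "&");   /* one key=value entry */
        if (entry_len > 5 && strncmp(frag, "mode=", 5) == 0) {
            const char *val = frag + 5;
            size_t remaining = entry_len - 5;
            /* Walk the comma-separated mode values, e.g. "dap4,bytes". */
            while (remaining > 0) {
                size_t item_len = strcspn(val, ",&");
                if (item_len > remaining)
                    item_len = remaining;
                if (item_len == 5 && strncmp(val, "bytes", 5) == 0)
                    return true;
                if (item_len >= remaining)
                    break;
                val += item_len + 1;       /* skip the ',' */
                remaining -= item_len + 1;
            }
        }
        frag += entry_len;
        if (*frag == '&')
            frag++;                        /* move to the next entry */
    }
    return false;
}
```

Both example locations above end in `#mode=bytes`, so this helper returns true for them. A URL without the annotation returns false and would not be opened through the byte-range path.
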
In order to determine the kind of file being accessed, the
netcdf-c library will read what is called the "magic number"
from the beginning of the remote dataset. This magic number
is a specific set of bytes that indicates the kind of file:
classic, enhanced, cdf5, etc.

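
The following sketch classifies a buffer that holds the first bytes of a dataset. The signatures it tests are the standard ones: `CDF` followed by the version byte 1 (classic), 2 (64-bit offset) or 5 (CDF5), and the 8-byte HDF5 signature used by netcdf-4 enhanced files. The enum and the function `classify_magic` are hypothetical names. For simplicity the sketch only checks offset 0, although HDF5 also allows its signature at offsets 512, 1024, 2048 and further doublings when a user block is present.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a dataset from its leading bytes. */
typedef enum {
    KIND_UNKNOWN,
    KIND_CLASSIC,      /* "CDF\001": netcdf-3 classic */
    KIND_64BIT_OFFSET, /* "CDF\002": netcdf-3 64-bit offset */
    KIND_CDF5,         /* "CDF\005": CDF5 */
    KIND_HDF5          /* "\211HDF\r\n\032\n": netcdf-4 enhanced */
} file_kind;

/* buf holds the first len bytes read from the start of the dataset. */
static file_kind classify_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char hdf5_sig[8] =
        {0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n'};

    if (len >= sizeof hdf5_sig && memcmp(buf, hdf5_sig, sizeof hdf5_sig) == 0)
        return KIND_HDF5;
    if (len >= 4 && memcmp(buf, "CDF", 3) == 0) {
        switch (buf[3]) {          /* the version byte follows "CDF" */
        case 0x01: return KIND_CLASSIC;
        case 0x02: return KIND_64BIT_OFFSET;
        case 0x05: return KIND_CDF5;
        default:   break;
        }
    }
    return KIND_UNKNOWN;
}
```

Because only a handful of leading bytes are needed, a byte-range reader can fetch just that small prefix before deciding which dispatcher should handle the rest of the dataset.
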
# Architecture {#byterange_arch}

Internally, this capability is implemented with three files:

1. libdispatch/dhttp.c -- wrap libcurl operations.
2. libsrc/httpio.c -- provide byte-range reading to the netcdf-3 dispatcher.
3. libhdf5/H5FDhttp.c -- provide byte-range reading to the netcdf-4 dispatcher.

Both *httpio.c* and *H5FDhttp.c* are adapters that use *dhttp.c*
to do the work. Testing for the magic number is also carried out
by using the *dhttp.c* code.

## NetCDF Classic Access

The netcdf-3 code in the directory *libsrc* is built using
a secondary dispatch mechanism called *ncio*. This allows the
netcdf-3 code to be independent of the lowest-level IO access mechanisms;
it is how in-memory and mmap-based access are implemented.
The file *httpio.c* is the dispatcher used to provide byte-range
IO for the netcdf-3 code.

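
To make the idea of a secondary dispatch mechanism concrete, here is a much-simplified sketch. The `toy_io` table, both backends and `read_magic` are hypothetical and far smaller than the real *ncio* interface. The point is that the format-level code calls through function pointers, so an adapter backend that forwards reads to a byte-range fetcher can replace an in-memory backend without any change to the caller. The "remote" fetch is simulated in memory so the sketch runs offline.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-simplified dispatch table in the spirit of ncio. */
typedef struct toy_io toy_io;
struct toy_io {
    /* Copy count bytes starting at offset into dst; return 0 on success. */
    int (*read)(toy_io *io, long long offset, size_t count, void *dst);
    long long (*size)(const toy_io *io);
    void *state;                       /* backend-specific data */
};

/* Backend 1: an in-memory buffer. */
typedef struct { const unsigned char *data; long long len; } mem_state;

static int mem_read(toy_io *io, long long offset, size_t count, void *dst)
{
    mem_state *m = io->state;
    if (offset < 0 || offset > m->len || (long long)count > m->len - offset)
        return -1;                     /* request runs past the end */
    memcpy(dst, m->data + offset, count);
    return 0;
}

static long long mem_size(const toy_io *io)
{
    return ((const mem_state *)io->state)->len;
}

/* Backend 2: an adapter that forwards each read to a byte-range fetch
 * function, the role httpio.c plays with respect to dhttp.c. */
typedef int (*range_fetch_fn)(void *ctx, long long start, long long count,
                              void *dst);
typedef struct { range_fetch_fn fetch; void *ctx; long long filelen; } range_state;

static int range_read(toy_io *io, long long offset, size_t count, void *dst)
{
    range_state *r = io->state;
    if (offset < 0 || offset > r->filelen ||
        (long long)count > r->filelen - offset)
        return -1;                     /* beyond the known remote length */
    return r->fetch(r->ctx, offset, (long long)count, dst);
}

static long long range_size(const toy_io *io)
{
    return ((const range_state *)io->state)->filelen;
}

/* Simulated remote fetch: serves bytes from memory instead of a server. */
static int fake_remote_fetch(void *ctx, long long start, long long count,
                             void *dst)
{
    const mem_state *remote = ctx;
    memcpy(dst, remote->data + start, (size_t)count);
    return 0;
}

/* Format-level code: works unchanged with either backend. */
static int read_magic(toy_io *io, unsigned char magic[4])
{
    if (io->size(io) < 4)
        return -1;
    return io->read(io, 0, 4, magic);
}
```

The design choice mirrors the text: the format code never learns where its bytes come from, and adding a new access method means writing one more backend rather than touching the netcdf-3 logic.
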
Note that *httpio.c* is mostly just an
adapter between the *ncio* API and the *dhttp.c* code.

## NetCDF Enhanced Access

Similar to the netcdf-3 code, the HDF5 library
provides a secondary dispatch mechanism *H5FD*. This allows the
HDF5 code to be independent of the lowest level IO access mechanisms.
The netcdf-4 code in libhdf5 is built on the HDF5 library, so
it indirectly inherits the H5FD mechanism.

The file *H5FDhttp.c* implements the H5FD dispatcher API
and provides byte-range IO for the netcdf-4 code
(and for the HDF5 library as a side effect).

Note that *H5FDhttp.c* is mostly just an
adapter between the *H5FD* API and the *dhttp.c* code.

# The dhttp.c Code {#byterange_dhttp}

The core of all this is *dhttp.c* (and its header
*include/nchttp.h*). It is a wrapper over *libcurl*
and so exposes the libcurl handles, albeit as `void*`.

The API for *dhttp.c* consists of the following type and procedures:
- `int nc_http_open(const char* objecturl, void** curlp, fileoffset_t* filelenp);`
- `int nc_http_read(void* curl, const char* url, fileoffset_t start, fileoffset_t count, NCbytes* buf);`
- `int nc_http_close(void* curl);`
- `typedef long long fileoffset_t;`

The type *fileoffset_t* is used to avoid use of *off_t* or *off64_t*,
whose sizes vary across platforms and build settings. It is intended
to represent file lengths and offsets.

The *nc_http_open* procedure creates a *Curl* handle and returns it
in the *curlp* argument. It also retrieves the response headers for
the dataset and searches them for two headers:

1. "Accept-Ranges: bytes" -- to verify that byte-range access is supported.
2. "Content-Length: ..." -- to obtain the size of the remote dataset.

The dataset length is returned in the *filelenp* argument.
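
The two header checks can be sketched as follows. The sketch assumes the response headers have already been split into individual lines; `scan_headers` and `has_prefix_ci` are hypothetical names and not part of *dhttp.c*. Header names are compared case-insensitively, as HTTP requires, and the length is parsed into a *fileoffset_t* so that values above 2 GiB (for example `Content-Length: 2147483648`) are representable.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef long long fileoffset_t;   /* same definition as in the API list */

/* Case-insensitive prefix test; HTTP header names ignore case. */
static bool has_prefix_ci(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return false;          /* also stops at the end of a short s */
    }
    return true;
}

/* Hypothetical scan: returns true only if byte ranges are advertised AND a
 * Content-Length was found; the length is stored in *filelenp. */
static bool scan_headers(const char *const lines[], size_t nlines,
                         fileoffset_t *filelenp)
{
    bool ranges_ok = false, have_len = false;
    for (size_t i = 0; i < nlines; i++) {
        const char *h = lines[i];
        if (has_prefix_ci(h, "Accept-Ranges:")) {
            const char *v = h + strlen("Accept-Ranges:");
            while (*v == ' ' || *v == '\t')
                v++;               /* skip optional whitespace */
            if (has_prefix_ci(v, "bytes"))
                ranges_ok = true;
        } else if (has_prefix_ci(h, "Content-Length:")) {
            const char *v = h + strlen("Content-Length:");
            char *end;
            long long n = strtoll(v, &end, 10);   /* skips leading spaces */
            if (end != v && n >= 0) {
                *filelenp = n;
                have_len = true;
            }
        }
    }
    return ranges_ok && have_len;
}
```

A server that answers `Accept-Ranges: none`, or omits the header, fails the check, which matches the requirement that the server support byte-range access.
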

The *nc_http_read* procedure reads a contiguous range of bytes
specified by the *start* and *count* arguments. It takes the *Curl*
handle produced by *nc_http_open* to indicate the server from which to read.

The *buf* argument is a pointer to an instance of type *NCbytes*, which
is a dynamically expandable byte vector (see the file *include/ncbytes.h*).

This procedure reads *count* bytes from the remote dataset starting at
offset *start*. The bytes are stored in *buf*.
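
A sketch of the read step follows. HTTP byte ranges are inclusive at both ends, so reading *count* bytes from *start* requests bytes `start` through `start+count-1`; the `X-Y` string built by `format_range` is the form accepted by libcurl's `CURLOPT_RANGE` option. The type `bytebuf` is a hypothetical stand-in for *NCbytes*, and `fake_range_read` is hypothetical too: it serves the bytes from an in-memory copy of the dataset instead of sending the request to a server.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef long long fileoffset_t;

/* Hypothetical stand-in for NCbytes: a growable byte vector. */
typedef struct { unsigned char *data; size_t len, cap; } bytebuf;

static int bytebuf_append(bytebuf *b, const void *src, size_t n)
{
    if (b->len + n > b->cap) {
        size_t newcap = b->cap ? b->cap : 16;
        while (newcap < b->len + n)
            newcap *= 2;                       /* grow geometrically */
        unsigned char *p = realloc(b->data, newcap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = newcap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Build the inclusive "X-Y" range string for count bytes at start. */
static int format_range(fileoffset_t start, fileoffset_t count,
                        char *out, size_t outlen)
{
    if (start < 0 || count <= 0)
        return -1;
    int n = snprintf(out, outlen, "%lld-%lld", start, start + count - 1);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Simulated read: remote[] stands in for the server. The range string that
 * would be sent is written to range[] so the caller can inspect it. */
static int fake_range_read(const unsigned char *remote, fileoffset_t filelen,
                           fileoffset_t start, fileoffset_t count,
                           bytebuf *buf, char *range, size_t rangelen)
{
    if (format_range(start, count, range, rangelen) != 0)
        return -1;
    if (start + count > filelen)
        return -1;                  /* past the length from Content-Length */
    return bytebuf_append(buf, remote + start, (size_t)count);
}
```

For example, reading 4 bytes at offset 0 (enough for a classic magic number) produces the range `0-3`.
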

The *nc_http_close* function closes the *Curl* handle and does any
necessary cleanup.

# Point of Contact {#byterange_poc}

__Author__: Dennis Heimbigner<br>
__Email__: dmh at ucar dot edu<br>
__Initial Version__: 12/30/2018<br>
__Last Revised__: 12/30/2018

<!-- End MarkDown -->