rpm  5.4.15
format
1 /*! \page pkgformat Package format
2 
3 This document describes the RPM file format version 3.0, which is used
4 by RPM versions 2.1 and greater. The format is subject to change, and
5 you should not assume that this document is kept up to date with the
6 latest RPM code. That said, the 3.0 format should not change for
7 quite a while, and when it does, it will not be 3.0 anymore :-).
8 
9 \warning In any case, THE PROPER WAY TO ACCESS THESE STRUCTURES IS THROUGH
10 THE RPM LIBRARY!!
11 
12 The RPM file format covers both source and binary packages. An RPM
13 package file is divided into 4 logical sections:
14 
15 \verbatim
16 . Lead -- 96 bytes of "magic" and other info
17 . Signature -- collection of "digital signatures"
18 . Header -- holding area for all the package information (aka "metadata")
19 . Payload -- compressed archive of the file(s) in the package (aka "payload")
20 \endverbatim
21 
22 All 2 and 4 byte "integer" quantities (int16 and int32) are stored in
23 network byte order. When data is presented, the first number is the
24 byte number, or address, in hex, followed by the byte values in hex,
25 followed by character "translations" (where appropriate).
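
As a minimal sketch of what "network byte order" means for a reader of
the file, the helpers below assemble int16 and int32 values from raw
bytes, most significant byte first. The names read_be16 and read_be32
are illustrative only and are not part of the RPM library.

```c
#include <stdint.h>

// Read a 16-bit unsigned value stored in network (big-endian) byte order.
static uint16_t read_be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

// Read a 32-bit unsigned value stored in network (big-endian) byte order.
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Building the value byte by byte gives the same result on little-endian
and big-endian hosts, so no separate byte-swapping step is needed.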
26 
27 \subsection pkgformat_lead Lead
28 
29 The Lead is basically for file(1). All the information contained in
30 the Lead is duplicated or superseded by information in the Header.
31 Much of the info in the Lead was used in old versions of RPM but is
32 now ignored. The Lead is stored as a C structure:
33 
34 \code
35 struct rpmlead {
36  unsigned char magic[4];
37  unsigned char major, minor;
38  short type;
39  short archnum;
40  char name[66];
41  short osnum;
42  short signature_type;
43  char reserved[16];
44 };
45 \endcode
46 
47 and is illustrated with one pulled from the rpm-2.1.2-1.i386.rpm
48 package:
49 
50 \verbatim
51 00000000: ed ab ee db 03 00 00 00
52 \endverbatim
53 
54 The first 4 bytes (0-3) are "magic" used to uniquely identify an RPM
55 package. It is used by RPM and file(1). The next two bytes (4, 5)
56 are int8 quantities denoting the "major" and "minor" RPM file format
57 version. This package is in 3.0 format. The following 2 bytes (6-7)
58 form an int16 which indicates the package type. As of this writing
59 there are only two types: 0 == binary, 1 == source.
60 
61 \verbatim
62 00000008: 00 01 72 70 6d 2d 32 2e ..rpm-2.
63 \endverbatim
64 
65 The next two bytes (8-9) form an int16 that indicates the architecture
66 the package was built for. While this is used by file(1), the true
67 architecture is stored as a string in the Header. See lib/misc.c for
68 a list of architecture->int16 translations. In this case, 1 == i386.
69 Bytes 10 through 75 (66 bytes) hold the familiar "name-version-release"
70 of the package: up to 65 characters and a terminating null byte,
71 padded with null (0) bytes.
72 
73 \verbatim
74 00000010: 31 2e 32 2d 31 00 00 00 1.2-1...
75 00000018: 00 00 00 00 00 00 00 00 ........
76 00000020: 00 00 00 00 00 00 00 00 ........
77 00000028: 00 00 00 00 00 00 00 00 ........
78 00000030: 00 00 00 00 00 00 00 00 ........
79 00000038: 00 00 00 00 00 00 00 00 ........
80 00000040: 00 00 00 00 00 00 00 00 ........
81 00000048: 00 00 00 00 00 01 00 05 ........
82 \endverbatim
83 
84 Bytes 76-77 ("00 01" above) form an int16 that indicates the OS the
85 package was built for. In this case, 1 == Linux. The next 2 bytes
86 (78-79) form an int16 that indicates the signature type. This tells
87 RPM what to expect in the Signature. For version 3.0 packages, this
88 is 5, which indicates the new "Header-style" signatures.
89 
90 \verbatim
91 00000050: 04 00 00 00 68 e6 ff bf ........
92 00000058: ab ad 00 08 3c eb ff bf ........
93 \endverbatim
94 
95 The remaining 16 bytes (80-95) are currently unused and are reserved
96 for future expansion.
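
The byte offsets walked through above can be collected into a small
decoder. This is only a sketch for understanding the layout (the RPM
library remains the proper way to access these structures). It reads
the fields by offset rather than casting to struct rpmlead, so it does
not depend on host byte order or structure padding. The names
lead_info, parse_lead and be16 are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define LEAD_SIZE 96

// Decoded view of the 96-byte Lead, in host byte order.
struct lead_info {
    unsigned char fmt_major, fmt_minor;   // bytes 4 and 5
    uint16_t type;                        // bytes 6-7: 0 == binary, 1 == source
    uint16_t archnum;                     // bytes 8-9
    char name[66];                        // bytes 10-75
    uint16_t osnum;                       // bytes 76-77
    uint16_t signature_type;              // bytes 78-79
};

static uint16_t be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

// Returns 0 on success, -1 if the 4 magic bytes do not match.
static int parse_lead(const unsigned char buf[LEAD_SIZE], struct lead_info *out)
{
    static const unsigned char magic[4] = { 0xed, 0xab, 0xee, 0xdb };

    if (memcmp(buf, magic, sizeof magic) != 0)
        return -1;
    out->fmt_major = buf[4];
    out->fmt_minor = buf[5];
    out->type = be16(buf + 6);
    out->archnum = be16(buf + 8);
    memcpy(out->name, buf + 10, sizeof out->name);
    out->name[sizeof out->name - 1] = '\0';   // defensive termination
    out->osnum = be16(buf + 76);
    out->signature_type = be16(buf + 78);
    // bytes 80-95 are reserved and ignored
    return 0;
}
```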
97 
98 \subsection pkgformat_signature Signature
99 
100 A 3.0 format signature (denoted by signature type 5 in the Lead) uses
101 the same structure as the Header. For historical reasons, this
102 structure is called a "header structure", which can be confusing since
103 it is used for both the Header and the Signature. The details of the
104 header structure are given below, and you'll want to read them so the
105 rest of this makes sense. The tags for the Signature are defined in
106 lib/signature.h.
107 
108 The Signature can contain multiple signatures, of different types.
109 There are currently only three types, each with its own tag in the
110 header structure:
111 
112 \verbatim
113  Name  Tag   Header Type
114  ----  ----  -----------
115  SIZE  1000  INT_32
116  MD5   1001  BIN
117  PGP   1002  BIN
118 \endverbatim
119 
120 The MD5 signature is 16 bytes, and the PGP signature varies with
121 the size of the PGP key used to sign the package.
122 
123 As of RPM 2.1, all packages carry at least SIZE and MD5 signatures,
124 and the Signature section is padded to a multiple of 8 bytes.
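
A reader that walks the file sequentially has to skip this padding
before it reaches the Header. The helper below is a sketch of that
arithmetic, assuming len is the unpadded size in bytes of the
Signature's header structure. The name sig_pad is hypothetical.

```c
#include <stddef.h>

// Number of pad bytes needed to bring `len` up to a multiple of 8.
static size_t sig_pad(size_t len)
{
    return (8 - (len % 8)) % 8;
}
```

The outer modulo keeps the result at 0, rather than 8, when the length
is already a multiple of 8.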
125 
126 \subsection pkgformat_header Header
127 
128 The Header contains all the information about a package: name,
129 version, file list, etc. It uses the same "header structure" as the
130 Signature, which is described in detail below. A complete list of the
131 tags for the Header would take too much space to list here, and the
132 list grows fairly frequently. For the complete list see lib/rpmlib.h
133 in the RPM sources.
134 
135 \subsection pkgformat_payload Payload
136 
137 The Payload is currently a gzipped cpio archive. The cpio
138 archive type used is SVR4 with a CRC checksum.
139 
140 \subsection pkgformat_header_structure The Header Structure
141 
142 The header structure is a little complicated, but actually performs a
143 very simple function. It acts almost like a small database in that it
144 allows you to store and retrieve arbitrary data with a key called a
145 "tag". When a header structure is written to disk, the data is
146 written in network byte order, and when it is read from disk, it is
147 converted to host byte order.
148 
149 Along with the tag and the data, a data "type" is stored, which indicates,
150 obviously, the type of the data associated with the tag. There are
151 currently 10 types:
152 
153 \verbatim
154  Type             Number
155  ----             ------
156  NULL             0
157  CHAR             1
158  INT8             2
159  INT16            3
160  INT32            4
161  INT64            5
162  STRING           6
163  BIN              7
164  STRING_ARRAY     8
165  I18NSTRING_TYPE  9
166 \endverbatim
167 
168 One final piece of information is a "count" which is stored with each
169 tag, and indicates the number of items of the associated type that are
170 stored. As a special case, the STRING type is not allowed to have a
171 count greater than 1. To store more than one string you must use a
172 STRING_ARRAY.
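
The type numbers and the STRING count rule can be expressed as a small
check. This is a sketch only: the enum names with the HT_ prefix and
the function entry_shape_ok are hypothetical, and the check covers just
the two rules stated in this document (a known type number, and a count
of at most 1 for STRING).

```c
#include <stdint.h>

// Type numbers as listed in the table above.
enum hdr_type {
    HT_NULL = 0, HT_CHAR = 1, HT_INT8 = 2, HT_INT16 = 3, HT_INT32 = 4,
    HT_INT64 = 5, HT_STRING = 6, HT_BIN = 7, HT_STRING_ARRAY = 8,
    HT_I18NSTRING = 9
};

// Returns 1 if the (type, count) pair obeys the rules above, else 0.
static int entry_shape_ok(uint32_t type, uint32_t count)
{
    if (type > (uint32_t)HT_I18NSTRING)
        return 0;                 // unknown type number
    if (type == HT_STRING && count > 1)
        return 0;                 // more than one string needs STRING_ARRAY
    return 1;
}
```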
173 
174 Altogether, the tag, type, count, and data are called an "Entry" or
175 "Header Entry".
176 
177 \verbatim
178 00000000: 8e ad e8 01 00 00 00 00 ........
179 \endverbatim
180 
181 A header begins with 3 bytes of magic "8e ad e8" and a single byte to
182 indicate the header version. The next four bytes (4-7) are reserved.
183 
184 \verbatim
185 00000008: 00 00 00 20 00 00 07 77 ........
186 \endverbatim
187 
188 The next four bytes (8-11) form an int32 that is a count of the number
189 of entries stored (in this case, 32). Bytes 12-15 form an int32 that
190 is a count of the number of bytes of data stored (that is, the number
191 of bytes made up by the data portion of each entry). In this case it
192 is 1911 bytes.
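
The first 16 bytes described so far can be decoded as follows. This is
a sketch under the layout given above, and the names hdr_intro,
parse_hdr_intro and be32 are hypothetical rather than RPM library
identifiers.

```c
#include <stdint.h>
#include <string.h>

// Decoded view of the first 16 bytes of a header structure.
struct hdr_intro {
    unsigned char version;   // byte 3
    uint32_t nindex;         // bytes 8-11: number of index entries
    uint32_t dsize;          // bytes 12-15: size of the data section
};

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

// Returns 0 on success, -1 if the 3 magic bytes do not match.
static int parse_hdr_intro(const unsigned char buf[16], struct hdr_intro *out)
{
    static const unsigned char magic[3] = { 0x8e, 0xad, 0xe8 };

    if (memcmp(buf, magic, sizeof magic) != 0)
        return -1;
    out->version = buf[3];
    // bytes 4-7 are reserved and skipped
    out->nindex = be32(buf + 8);
    out->dsize = be32(buf + 12);
    return 0;
}
```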
193 
194 \verbatim
195 00000010: 00 00 03 e8 00 00 00 06 00 00 00 00 00 00 00 01 ................
196 \endverbatim
197 
198 Following the first 16 bytes is the part of the header called the
199 "index". The index is made up of "index entries", one for each entry
200 in the header. Each index entry contains four int32 quantities. In
201 order, they are: tag, type, offset, count. In the above example, we
202 have tag=1000, type=6, offset=0, count=1. By looking up the tag
203 in lib/rpmlib.h we can see that this entry is for the package name.
204 The type of the entry is a STRING. The offset is an offset from the
205 start of the data part of the header to the data associated with this
206 entry. The count indicates that there is only one string associated
207 with the entry (which we really already knew since STRING types are
208 not allowed to have a count greater than 1).
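
Decoding one 16-byte index entry is four big-endian reads in the order
given above. The sketch below uses hypothetical names (index_entry,
parse_index_entry, be32) and performs no validation of the values it
reads.

```c
#include <stdint.h>

// One index entry, in host byte order.
struct index_entry {
    uint32_t tag;
    uint32_t type;
    uint32_t offset;   // relative to the start of the data section
    uint32_t count;
};

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

// Decode the four int32 fields of a 16-byte index entry.
static struct index_entry parse_index_entry(const unsigned char buf[16])
{
    struct index_entry e;

    e.tag = be32(buf);
    e.type = be32(buf + 4);
    e.offset = be32(buf + 8);
    e.count = be32(buf + 12);
    return e;
}
```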
209 
210 In our example there would be 32 such 16-byte index entries, followed
211 by the data section:
212 
213 \verbatim
214 00000210: 72 70 6d 00 32 2e 31 2e 32 00 31 00 52 65 64 20 rpm.2.1.2.1.Red
215 00000220: 48 61 74 20 50 61 63 6b 61 67 65 20 4d 61 6e 61 Hat Package Mana
216 00000230: 67 65 72 00 31 e7 cb b4 73 63 68 72 6f 65 64 65 ger.1...schroede
217 00000240: 72 2e 72 65 64 68 61 74 2e 63 6f 6d 00 00 00 00 r.redhat.com....
218 ...
219 00000970: 6c 69 62 63 2e 73 6f 2e 35 00 6c 69 62 64 62 2e libc.so.5.libdb.
220 00000980: 73 6f 2e 32 00 00 so.2..
221 \endverbatim
222 
223 The data section begins at byte 528 (4 magic, 4 reserved, 4 index
224 entry count, 4 data byte count, 16 * 32 index entries). At offset 0,
225 bytes 528-531 are "rpm" plus a null byte, which is the data for the
226 first index entry (the package name). Following it is the data for
227 each of the other entries. Each string is null terminated; the strings
228 in a STRING_ARRAY are also null terminated and are placed one after
229 another. The integer types are aligned to appropriate byte boundaries,
230 so that the data of INT64 type starts on an 8 byte boundary, INT32
231 type starts on a 4 byte boundary, and an INT16 type starts on a 2 byte
232 boundary. For example:
233 
234 \verbatim
235 00000060: 00 00 03 ef 00 00 00 06 00 00 00 28 00 00 00 01 ................
236 00000070: 00 00 03 f1 00 00 00 04 00 00 00 40 00 00 00 01 ................
237 ...
238 00000240: 72 2e 72 65 64 68 61 74 2e 63 6f 6d 00 00 00 00 r.redhat.com....
239 00000250: 00 09 9b 31 52 65 64 20 48 61 74 20 4c 69 6e 75 ....Red Hat Linu
240 \endverbatim
241 
242 Index entry number 6 is the BUILDHOST, of type STRING. Index entry
243 number 7 is the SIZE, of type INT32. The corresponding data for entry
244 6 ends at byte 588 with "....redhat.com\0". The next piece of data
245 could start at byte 589, but that is an improper boundary for an INT32.
246 As a result, 3 null bytes are inserted and the data for the SIZE actually
247 starts at byte 592: "00 09 9b 31", which is 629553.
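
The arithmetic in this example can be sketched in a few lines. The
helper names (data_start, type_align, align_offset) and the T_ enum
names are hypothetical. The sketch assumes alignment is applied to
offsets within the data section, which matches the absolute byte
numbers in the example because the data section there starts at 528, a
multiple of 8.

```c
#include <stddef.h>
#include <stdint.h>

// Type numbers of the aligned integer types, from the type table.
enum { T_INT16 = 3, T_INT32 = 4, T_INT64 = 5 };

// Start of the data section: 16 fixed bytes plus 16 bytes per index entry.
static size_t data_start(uint32_t nindex)
{
    return 16 + (size_t)nindex * 16;
}

// Alignment in bytes required for a given type number.
static size_t type_align(uint32_t type)
{
    switch (type) {
    case T_INT16: return 2;
    case T_INT32: return 4;
    case T_INT64: return 8;
    default:      return 1;   // other types are not aligned
    }
}

// Round `offset` up to the alignment required by `type`.
static size_t align_offset(size_t offset, uint32_t type)
{
    size_t a = type_align(type);

    return (offset + a - 1) / a * a;
}
```

With 32 index entries, data_start gives 528. The BUILDHOST string
occupies data offsets 40 through 60, so the next free offset is 61
(byte 589), and align_offset rounds it up to 64 for an INT32, which is
byte 592 in the file.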
248 
249 \subsection pkgformat_tools Tools
250 
251 The tools directory in the RPM sources contains a number of small
252 programs that use the RPM library to pick apart packages. These
253 tools are mostly used for debugging, but can also be used to help
254 you understand the internals of the RPM package format.
255 
256 \verbatim
257  rpmlead - extracts the Lead from a package
258  rpmsignature - extracts the Signature from a package
259  rpmheader - extracts the Header from a package
260  rpmarchive - extracts the Archive from a package
261  dump - displays a header structure in readable format
262 \endverbatim
263 
264 Given a package foo.rpm you might try:
265 
266 \verbatim
267  rpmlead foo.rpm | od -x
268  rpmsignature foo.rpm | dump
269  rpmheader foo.rpm | dump
270  rpmarchive foo.rpm | zcat | cpio --list
271 \endverbatim
272 
273 */