

How to Populate a Property Tree

XML Parser

The XML format is an industry standard for storing information in textual form. Unfortunately, there is no XML parser in Boost as of the time of this writing. The library therefore contains the fast and tiny RapidXML parser (currently in version 1.13) to provide XML parsing support. RapidXML does not fully support the XML standard; it is not capable of parsing DTDs and therefore cannot do full entity substitution.

By default, the parser preserves most whitespace but removes element content that consists only of whitespace. Encoded whitespace (e.g. &#32;) does not count as whitespace in this regard. Pass the trim_whitespace flag if you want all leading and trailing whitespace trimmed and all contiguous whitespace collapsed into a single space.

Please note that RapidXML does not understand the encoding specification. If you pass it a character buffer, it assumes the data is already correctly encoded; if you pass it a filename, it will read the file using the character conversion of the locale you give it (or the global locale if you give it none). This means that, in order to parse a UTF-8-encoded XML file into a wptree, you have to supply an alternate locale, either directly or by replacing the global one.

XML / property tree conversion schema (read_xml and write_xml):

  • Each XML element corresponds to a property tree node. The child elements correspond to the children of the node.
  • The attributes of an XML element are stored in the subkey <xmlattr>. There is one child node per attribute in the attribute node. Existence of the <xmlattr> node is not guaranteed or necessary when there are no attributes.
  • XML comments are stored in nodes named <xmlcomment>, unless comment ignoring is enabled via the flags.
  • Text content is stored in one of two ways, depending on the flags. The default way concatenates all text nodes and stores them as the data of the element node. This way, the entire content can be conveniently read, but the relative ordering of text and child elements is lost. The other way stores each text content as a separate node, all called <xmltext>.

The XML storage encoding does not round-trip perfectly. A read-write cycle loses trimmed whitespace, low-level formatting information, and the distinction between normal data and CDATA nodes. Comments are preserved only when comment handling is enabled. A write-read cycle loses trimmed whitespace; that is, if the original tree has string data that starts or ends with whitespace, that whitespace is lost.

