class DjVuTXT: public GPEnabled

Description of the text contained in a DjVu page.


Public Classes

enum ZoneType
These constants are used to tell what a zone describes.
struct Zone
Data structure representing document textual components.

Public Fields

GString textUTF8
Textual data for this page.
Zone page_zone
Main zone in the document.

Public Methods

int has_valid_zones() const
Tests whether there is a meaningful zone hierarchy.
void normalize_text()
Normalize textual data.
void encode(ByteStream &bs) const
Encode data for a TXT chunk.
void decode(ByteStream &bs)
Decode data from a TXT chunk.
GP<DjVuTXT> copy(void) const
Returns a copy of this object.
GList<Zone *> search_string(const char * string, int & start_pos, bool search_fwd, bool match_case, bool whole_word=false) const
Searches the TXT chunk for the given string and returns a list of the smallest zones covering the text.
unsigned int get_memory_usage() const
Returns the number of bytes needed by this data structure.


Inherited from GPEnabled:

Public Methods

GPEnabled& operator=(const GPEnabled & obj)
int get_count(void) const

Protected Fields

volatile int count


Documentation

Description of the text contained in a DjVu page. This class contains the textual data for the page. It describes the text as a hierarchy of zones corresponding to pages, columns, regions, paragraphs, lines, words, etc. The piece of text associated with each zone is represented by an offset and a length describing a segment of a global UTF8-encoded string.
enum ZoneType
These constants are used to tell what a zone describes. This can be useful for a copy/paste application. The deeper a zone lies in the hierarchy, the higher its constant.
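The ordering rule can be illustrated with a small self-contained C++ sketch. The enumerator names and values (PAGE=1 through CHARACTER=7) are an assumption based on DjVuLibre headers and are not listed on this page; only the rule that deeper zones get larger constants comes from the text above. The helpers `is_deeper` and `coarser` are hypothetical.

```cpp
// Assumed mirror of DjVuTXT::ZoneType; names and values are not listed on
// this page. Only the ordering (deeper zone => larger constant) is documented.
enum ZoneType { PAGE = 1, COLUMN, REGION, PARAGRAPH, LINE, WORD, CHARACTER };

// Hypothetical helper: true when `inner` describes a finer level of the
// hierarchy than `outer`, relying only on the documented ordering.
bool is_deeper(ZoneType inner, ZoneType outer) {
  return static_cast<int>(inner) > static_cast<int>(outer);
}

// Hypothetical helper for a copy/paste tool: the coarser of two selection
// granularities, i.e. the one with the smaller constant.
ZoneType coarser(ZoneType a, ZoneType b) {
  return is_deeper(a, b) ? b : a;
}
```

Because the comparison uses only the numeric order, such a tool never needs a lookup table of which zone types may contain which.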

struct Zone
Data structure representing document textual components. The text structure is represented by a hierarchy of rectangular zones.

enum ZoneType ztype
Type of the zone.

GRect rect
Rectangle spanned by the zone.

int text_start
Position of the zone text in string textUTF8.

int text_length
Length of the zone text in string textUTF8.

GList<Zone> children
List of child zones.

Zone* append_child()
Appends another subzone inside this zone. The new zone is initialized with an empty rectangle and empty text, and has the same type as this zone.
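The fields above can be mirrored in plain C++ to show how a zone's text is recovered from the page-wide string. This is a simplified stand-in, not the DjVuLibre declaration: `std::list` replaces `GList` (a list keeps the pointer returned by `append_child` valid when more children are added), a small `Rect` replaces `GRect`, the enum values are assumed, and the `text()` helper is hypothetical. It needs C++17, where `std::list` may hold an incomplete element type.

```cpp
#include <list>
#include <string>

// Assumed mirror of DjVuTXT::ZoneType (values are an assumption).
enum ZoneType { PAGE = 1, COLUMN, REGION, PARAGRAPH, LINE, WORD, CHARACTER };

// Hypothetical stand-in for GRect; all-zero means "empty rectangle".
struct Rect { int xmin = 0, ymin = 0, xmax = 0, ymax = 0; };

// Simplified stand-in for DjVuTXT::Zone; field names follow the docs.
struct Zone {
  ZoneType ztype = PAGE;
  Rect rect;               // empty rectangle by default
  int text_start = 0;      // offset into the page-wide textUTF8 string
  int text_length = 0;     // length of the zone's segment of that string
  std::list<Zone> children;

  // Mirrors the documented behaviour: empty rectangle, empty text,
  // same type as the parent zone.
  Zone *append_child() {
    children.emplace_back();
    Zone &child = children.back();
    child.ztype = ztype;
    return &child;
  }

  // Hypothetical helper: a zone's text is a slice of the global string.
  std::string text(const std::string &textUTF8) const {
    return textUTF8.substr(text_start, text_length);
  }
};
```

A caller typically appends a child, then overwrites its type, rectangle, and text segment, since the new zone starts out as an empty copy of its parent's type.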

GString textUTF8
Textual data for this page. The content of this string is encoded in UTF8. This encoding coincides with ASCII for the first 128 code points (0 to 127). Columns, regions, paragraphs and lines are delimited by the following control characters:

Name                          Octal   ASCII name
DjVuTXT::end_of_column        013     VT, Vertical Tab
DjVuTXT::end_of_region        035     GS, Group Separator
DjVuTXT::end_of_paragraph     037     US, Unit Separator
DjVuTXT::end_of_line          012     LF, Line Feed
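UTF8 encodes every non-ASCII character with bytes of value 0x80 or above, so the four separator bytes can be found with a plain byte scan. The sketch below splits a textUTF8-style string into lines and paragraphs. The helper names are hypothetical, the separators are defined as local constants rather than taken from the class, and treating a paragraph, region, or column separator as also ending the current line is an assumption.

```cpp
#include <string>
#include <vector>

// Separator bytes from the table above (octal literals, as in the table).
// Local constants so that the sketch compiles without DjVuLibre.
const char end_of_column    = 013;  // VT, Vertical Tab
const char end_of_region    = 035;  // GS, Group Separator
const char end_of_paragraph = 037;  // US, Unit Separator
const char end_of_line      = 012;  // LF, Line Feed

// Split `text` at every byte listed in `seps`, dropping the separators.
// UTF8 lead and continuation bytes are all >= 0x80, so cutting at these
// ASCII control bytes never splits a multibyte character.
std::vector<std::string> split_at(const std::string &text, const std::string &seps) {
  std::vector<std::string> parts;
  std::string current;
  for (char c : text) {
    if (seps.find(c) != std::string::npos) {
      parts.push_back(current);
      current.clear();
    } else {
      current += c;
    }
  }
  if (!current.empty()) parts.push_back(current);  // text without a final separator
  return parts;
}

// Hypothetical helper. Assumption: a higher-level separator (paragraph,
// region, column) also ends the current line.
std::vector<std::string> lines(const std::string &text) {
  return split_at(text, std::string{end_of_line, end_of_paragraph, end_of_region, end_of_column});
}

// Hypothetical helper: paragraphs keep their internal line feeds.
std::vector<std::string> paragraphs(const std::string &text) {
  return split_at(text, std::string{end_of_paragraph, end_of_region, end_of_column});
}
```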

Zone page_zone
Main zone in the document. This zone represents the page.

int has_valid_zones() const
Tests whether there is a meaningful zone hierarchy.

void normalize_text()
Normalize textual data. Assuming that a zone hierarchy has been built and represents the reading order, this function reorganizes the string textUTF8 by gathering the highest-level text available in the zone hierarchy. The text offsets and lengths are recomputed for all the zones in the hierarchy, and separators are inserted where appropriate.
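A minimal sketch of one plausible reading of this description, on simplified stand-in types rather than the real classes. Its rules are assumptions, not documented behaviour: a zone whose own text is non-empty counts as the "highest-level text" and the text of its descendants is cleared; a zone with empty text gathers text from its children in list order; and each column, region, paragraph, line, or word zone gets its separator (a space for words, also an assumption) appended unless its text already ends with it. The enum values and all function names are hypothetical. Requires C++17.

```cpp
#include <list>
#include <string>

// Assumed mirror of DjVuTXT::ZoneType (values are an assumption).
enum ZoneType { PAGE = 1, COLUMN, REGION, PARAGRAPH, LINE, WORD, CHARACTER };

// Simplified stand-in for DjVuTXT::Zone (rectangle omitted).
struct Zone {
  ZoneType ztype = PAGE;
  int text_start = 0;
  int text_length = 0;
  std::list<Zone> children;
};

// Assumed separator appended after each zone type; 0 means none.
char separator_for(ZoneType t) {
  switch (t) {
    case COLUMN:    return '\013';  // VT
    case REGION:    return '\035';  // GS
    case PARAGRAPH: return '\037';  // US
    case LINE:      return '\012';  // LF
    case WORD:      return ' ';     // assumption: words separated by a space
    default:        return 0;       // PAGE, CHARACTER: no separator
  }
}

// Drop the text of a zone and of everything below it.
void clear_text(Zone &z) {
  z.text_start = 0;
  z.text_length = 0;
  for (Zone &c : z.children) clear_text(c);
}

// Rebuild `out` in reading order (the order of `children`), reading the old
// text from `in` and rewriting every offset and length on the way.
void normtext(Zone &z, const std::string &in, std::string &out) {
  if (z.text_length == 0) {
    // No text at this level: gather it from the children.
    z.text_start = static_cast<int>(out.size());
    for (Zone &c : z.children) normtext(c, in, out);
    z.text_length = static_cast<int>(out.size()) - z.text_start;
    if (z.text_length == 0) return;  // empty zone: no separator either
  } else {
    // Text stored at this level is kept; text stored below it is cleared.
    int new_start = static_cast<int>(out.size());
    out += in.substr(z.text_start, z.text_length);
    z.text_start = new_start;
    for (Zone &c : z.children) clear_text(c);
  }
  // Append this zone's separator unless its text already ends with it.
  char sep = separator_for(z.ztype);
  if (sep != 0 && out[z.text_start + z.text_length - 1] != sep) {
    out += sep;
    z.text_length += 1;
  }
}

// Hypothetical counterpart of DjVuTXT::normalize_text().
void normalize_text(Zone &page_zone, std::string &textUTF8) {
  std::string out;
  normtext(page_zone, textUTF8, out);
  textUTF8 = out;
}
```

For example, if the old string is "worldHello" and a line zone lists a word zone for "Hello" before one for "world", this sketch rewrites the string to "Hello world " followed by a line feed. It does not remove the word space that precedes the line separator.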

void encode(ByteStream &bs) const
Encode data for a TXT chunk.

void decode(ByteStream &bs)
Decode data from a TXT chunk.

GP<DjVuTXT> copy(void) const
Returns a copy of this object.

GList<Zone *> search_string(const char * string, int & start_pos, bool search_fwd, bool match_case, bool whole_word=false) const
Searches the TXT chunk for the given string and returns a list of the smallest zones covering the text.
Parameters:
string - String to be found. May contain spaces as word separators.
start_pos - Position where the search starts. It may be negative or larger than the length of the textUTF8 string. If start_pos is out of bounds, it is adjusted before the search begins:
  • If start_pos is negative and the search is forward, start_pos is reset to 0.
  • If start_pos is too large and the search is backward, start_pos is reset to textUTF8.length()-1.
  • In the other out-of-bounds cases (negative when searching backward, too large when searching forward), start_pos is left unchanged and nothing is found.
If an occurrence of the string is found, start_pos is updated to point to it. If no match is found, start_pos is set to some large number when searching forward and to -1 when searching backward.
search_fwd - TRUE searches forward; FALSE searches backward.
match_case - If FALSE, the search is case-insensitive.
whole_word - If TRUE, the function only matches whole words equal to the passed string. The word separators are all blank and punctuation characters. The passed string must not contain word separators; that is, it must be a single whole word.
WARNING: The returned list contains pointers to Zones in this object's zone hierarchy. DO NOT DELETE these Zones.
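The start_pos adjustment rules and the whole_word boundary test can be restated as small standalone functions. This is a hypothetical sketch of the documented behaviour, not the library's implementation. "Too large" is taken to mean start_pos >= text length, and bytes of non-ASCII UTF8 characters are treated as word characters; both are assumptions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper restating the documented start_pos rules of
// DjVuTXT::search_string. Assumption: "too large" means start_pos >= text_length.
// Sets `searchable` to false in the cases where nothing will be found.
int adjust_start_pos(int start_pos, int text_length, bool search_fwd, bool &searchable) {
  searchable = true;
  if (start_pos < 0) {
    if (search_fwd) return 0;        // negative, forward: reset to 0
    searchable = false;              // negative, backward: unchanged
    return start_pos;
  }
  if (start_pos >= text_length) {
    if (!search_fwd) return text_length - 1;  // too large, backward
    searchable = false;                       // too large, forward: unchanged
    return start_pos;
  }
  return start_pos;                  // in bounds: unchanged
}

// Word separators per the documentation: blanks and punctuation. Bytes
// >= 0x80 (parts of non-ASCII UTF8 characters) count as word characters
// here, which is an assumption of this sketch.
bool is_word_separator(char c) {
  unsigned char u = static_cast<unsigned char>(c);
  return u < 0x80 && (std::isspace(u) || std::ispunct(u));
}

// Hypothetical whole_word check: the match text[pos, pos + len) must be
// bounded on both sides by a separator or by the ends of the string.
bool is_whole_word_at(const std::string &text, std::size_t pos, std::size_t len) {
  bool left_ok = pos == 0 || is_word_separator(text[pos - 1]);
  bool right_ok = pos + len >= text.size() || is_word_separator(text[pos + len]);
  return left_ok && right_ok;
}
```

For example, in "cat, category" a match of "cat" at position 0 is a whole word (it is followed by a comma), while the "cat" inside "category" is not.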

unsigned int get_memory_usage() const
Returns the number of bytes needed by this data structure. It is used by caching routines to estimate the size of a DjVuImage.


This class has no child classes.



DjVu is a trademark of LizardTech, Inc.
All other products mentioned are registered trademarks or trademarks of their respective companies.