Public Member Functions | |
~Pattern () | |
std::string | replace (const std::string &str, const std::string &replacementText) |
std::vector< std::string > | split (const std::string &str, const bool keepEmptys=0, const unsigned long limit=0) |
std::vector< std::string > | findAll (const std::string &str) |
bool | matches (const std::string &str) |
unsigned long | getFlags () const |
std::string | getPattern () const |
Matcher * | createMatcher (const std::string &str) |
Static Public Member Functions | |
static Pattern * | compile (const std::string &pattern, const unsigned long mode=0) |
static Pattern * | compileAndKeep (const std::string &pattern, const unsigned long mode=0) |
static std::string | replace (const std::string &pattern, const std::string &str, const std::string &replacementText, const unsigned long mode=0) |
static std::vector< std::string > | split (const std::string &pattern, const std::string &str, const bool keepEmptys=0, const unsigned long limit=0, const unsigned long mode=0) |
static std::vector< std::string > | findAll (const std::string &pattern, const std::string &str, const unsigned long mode=0) |
static bool | matches (const std::string &pattern, const std::string &str, const unsigned long mode=0) |
static bool | registerPattern (const std::string &name, const std::string &pattern, const unsigned long mode=0) |
static void | unregisterPatterns () |
static void | clearPatternCache () |
static std::pair< std::string, int > | findNthMatch (const std::string &pattern, const std::string &str, const int matchNum, const unsigned long mode=0) |
Static Public Attributes | |
static const unsigned long | CASE_INSENSITIVE = 0x01 |
We should match regardless of case. | |
static const unsigned long | LITERAL = 0x02 |
We are implicitly quoted. | |
static const unsigned long | DOT_MATCHES_ALL = 0x04 |
We should treat a . as [-] | |
static const unsigned long | MULTILINE_MATCHING = 0x08 |
static const unsigned long | UNIX_LINE_MODE = 0x10 |
static const int | MIN_QMATCH = 0x00000000 |
The absolute minimum number of matches a quantifier can match (0). | |
static const int | MAX_QMATCH = 0x7FFFFFFF |
The absolute maximum number of matches a quantifier can match (0x7FFFFFFF). | |
Protected Member Functions | |
void | raiseError () |
NFANode * | registerNode (NFANode *node) |
std::string | classUnion (std::string s1, std::string s2) const |
std::string | classIntersect (std::string s1, std::string s2) const |
std::string | classNegate (std::string s1) const |
std::string | classCreateRange (char low, char hi) const |
int | getInt (int start, int end) |
bool | quantifyCurly (int &sNum, int &eNum) |
NFANode * | quantifyGroup (NFANode *start, NFANode *stop, const int gn) |
NFANode * | quantify (NFANode *newNode) |
std::string | parseClass () |
std::string | parsePosix () |
std::string | parseOctal () |
std::string | parseHex () |
NFANode * | parseBackref () |
std::string | parseEscape (bool &inv, bool &quo) |
NFANode * | parseRegisteredPattern (NFANode **end) |
NFANode * | parseBehind (const bool pos, NFANode **end) |
NFANode * | parseQuote () |
NFANode * | parse (const bool inParen=0, const bool inOr=0, NFANode **end=NULL) |
Protected Attributes | |
std::map< NFANode *, bool > | nodes |
Matcher * | matcher |
NFANode * | head |
std::string | pattern |
bool | error |
int | curInd |
int | groupCount |
int | nonCapGroupCount |
unsigned long | flags |
Static Protected Attributes | |
static std::map< std::string, Pattern * > | compiledPatterns |
static std::map< std::string, std::pair< std::string, unsigned long > > | registeredPatterns |
Friends | |
class | Matcher |
class | NFANode |
class | NFAQuantifierNode |
The Pattern class works primarily off of "compiled" patterns. A typical instantiation of a regular expression looks like:
Pattern * p = Pattern::compile("a*b");
Matcher * m = p->createMatcher("aaaaaab");
if (m->matches()) ...
However, if you do not need to use a pattern more than once, it is often fine to use the Pattern class's static methods instead. An example looks like this:
if (Pattern::matches("a*b", "aaaab")) { ... }
This class does not currently support Unicode. The Unicode update for this class is coming soon.
This class is partially immutable. It is completely safe to call createMatcher concurrently in different threads, but the other functions (e.g. split) should not be called concurrently on the same Pattern.
Construct | Matches |
Characters | |
x | The character x |
\\ | The character \ |
\0nn | The character with octal ASCII value nn |
\0nnn | The character with octal ASCII value nnn |
\xhh | The character with hexadecimal ASCII value hh |
\t | A tab character |
\r | A carriage return character |
\n | A new-line character |
| |
Character Classes | |
| Either |
| Any character but |
| Any character ranging from |
| Any character except those ranging from |
| Either |
| Same as |
| Any character in the intersection of |
| Any character in |
| |
Predefined character classes | |
| Any character. Multiline matching must be compiled into the pattern for |
| |
| |
| |
| |
| |
| |
| |
POSIX character classes | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
Boundary Matches | |
^ | The beginning of a line. Also matches the beginning of input. |
$ | The end of a line. Also matches the end of input. |
\b | A word boundary |
\B | A non word boundary |
\A | The beginning of input |
\G | The end of the previous match. Ensures that a "next" match will only happen if it begins with the character immediately following the end of the "current" match. |
\Z | The end of input. Will also match if there is a single trailing line terminator |
\z | The end of input |
| |
Greedy Quantifiers | |
x? | x, either zero times or one time |
x* | x, zero or more times |
x+ | x, one or more times |
x{n} | x, exactly n times |
x{n,} | x, at least n times |
x{,m} | x, at most m times |
x{n,m} | x, at least n times and at most m times |
| |
Possessive Quantifiers | |
x?+ | x, either zero times or one time |
x*+ | x, zero or more times |
x++ | x, one or more times |
x{n}+ | x, exactly n times |
x{n,}+ | x, at least n times |
x{,m}+ | x, at most m times |
x{n,m}+ | x, at least n times and at most m times |
| |
Reluctant Quantifiers | |
x?? | x, either zero times or one time |
x*? | x, zero or more times |
x+? | x, one or more times |
x{n}? | x, exactly n times |
x{n,}? | x, at least n times |
x{,m}? | x, at most m times |
x{n,m}? | x, at least n times and at most m times |
| |
Operators | |
| |
| |
| |
| |
Quoting | |
\Q | Nothing, but treat every character (including backslashes) literally until a matching \E |
\E | Nothing, but ends its matching \Q |
| |
Special Constructs | |
| |
| |
| |
| |
| |
| |
| |
Registered Expression Matching | |
{x} | The registered pattern x |
Begin Text Extracted And Modified From java.util.regex.Pattern documentation
Backslashes, escapes, and quoting
The backslash character ('\') serves to introduce escaped constructs, as defined in the table above, as well as to quote characters that otherwise would be interpreted as unescaped constructs. Thus the expression \\ matches a single backslash and \{ matches a left brace.
It is an error to use a backslash prior to any alphabetic character that does not denote an escaped construct; these are reserved for future extensions to the regular-expression language. A backslash may be used prior to a non-alphabetic character regardless of whether that character is part of an unescaped construct.
It is necessary to double backslashes in string literals that represent regular expressions to protect them from interpretation by the compiler. The string literal "\b", for example, matches a single backspace character when interpreted as a regular expression, while "\\b" matches a word boundary. The string literal "\(hello\)" is not valid, because \( is not a recognized escape sequence, and the compiler will reject it or warn about it; in order to match the string (hello) the string literal "\\(hello\\)" must be used.
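The difference between the two literals can be checked directly in C++. The following is a minimal sketch that only inspects what the compiler stores for each literal; the variable names are made up for illustration and nothing here calls into the Pattern class.

```cpp
#include <string>

// What the regex parser receives after the C++ compiler has processed
// the escape sequences in each literal.
const std::string backspace     = "\b";          // one character: ASCII backspace (0x08)
const std::string wordBoundary  = "\\b";         // two characters: '\' followed by 'b'
const std::string literalParens = "\\(hello\\)"; // regex text: \(hello\)
```

A raw string literal such as R"(\(hello\))" produces the same text as the last line without doubling the backslashes.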
Character Classes
Character classes may appear within other character classes, and may be composed by the union operator (implicit) and the intersection operator (&&). The union operator denotes a class that contains every character that is in at least one of its operand classes. The intersection operator denotes a class that contains every character that is in both of its operand classes.
The precedence of character-class operators is as follows, from highest to lowest:
1 Literal escape
2 Range a-z
3 Grouping [...]
4 Intersection [a-z&&[aeiou]]
5 Union [a-e][i-u]
Note that a different set of metacharacters is in effect inside a character class than outside a character class. For instance, the regular expression . loses its special meaning inside a character class, while the expression - becomes a range-forming metacharacter.
Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:
1 ((A)(B(C)))
2 (A)
3 (B(C))
4 (C)
Group zero always stands for the entire expression.
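The numbering rule can be sketched as a single left-to-right scan. This is an illustrative helper, not the parser used by Pattern: the function name listCapturingGroups is hypothetical, groups that start with (? are skipped as non-capturing, and parentheses inside character classes are not handled.

```cpp
#include <string>
#include <vector>

// Returns the text of every capturing group in `regex`; element k holds
// group number k + 1. Groups are numbered by the position of their
// opening parenthesis, scanning left to right.
std::vector<std::string> listCapturingGroups(const std::string &regex) {
    std::vector<std::string> groups;   // text of each capturing group
    std::vector<std::size_t> starts;   // offset of each group's '('
    std::vector<int> openStack;        // group index per open paren, -1 = non-capturing
    for (std::size_t i = 0; i < regex.size(); ++i) {
        const char c = regex[i];
        if (c == '\\') { ++i; continue; }          // skip the escaped character
        if (c == '(') {
            const bool nonCapturing = i + 1 < regex.size() && regex[i + 1] == '?';
            if (nonCapturing) {
                openStack.push_back(-1);
            } else {
                openStack.push_back(static_cast<int>(groups.size()));
                groups.push_back(std::string());
                starts.push_back(i);
            }
        } else if (c == ')' && !openStack.empty()) {
            const int idx = openStack.back();
            openStack.pop_back();
            if (idx >= 0) {
                groups[idx] = regex.substr(starts[idx], i - starts[idx] + 1);
            }
        }
    }
    return groups;
}
```

Calling listCapturingGroups("((A)(B(C)))") yields the four groups in the order listed above.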
Capturing groups are so named because, during a match, each subsequence of the input sequence that matches such a group is saved. The captured subsequence may be used later in the expression, via a back reference, and may also be retrieved from the matcher once the match operation is complete.
The captured input associated with a group is always the subsequence that the group most recently matched. If a group is evaluated a second time because of quantification then its previously-captured value, if any, will be retained if the second evaluation fails. Matching the string "aba" against the expression (a(b)?)+, for example, leaves group two set to "b". All captured input is discarded at the beginning of each match.
Groups beginning with (? are pure, non-capturing groups that do not capture text and do not count towards the group total.
Unicode support
Coming Soon.
Comparison to Perl 5
The Pattern engine performs traditional NFA-based matching with ordered alternation as occurs in Perl 5.
Perl constructs not supported by this class:
The conditional constructs (?{X}) and (?(condition)X|Y)
The embedded code constructs (?{code}) and (??{code})
The embedded comment syntax (?#comment)
The preprocessing operations \l, \u, \L, and \U
Embedded flags
Constructs supported by this class but not by Perl:
Possessive quantifiers, which greedily match as much as they can and do not back off, even when doing so would allow the overall match to succeed.
Character-class union and intersection as described above.
Notable differences from Perl:
In Perl, \1 through \9 are always interpreted as back references; a backslash-escaped number greater than 9 is treated as a back reference if at least that many subexpressions exist, otherwise it is interpreted, if possible, as an octal escape. In this class octal escapes must always begin with a zero. In this class, \1 through \9 are always interpreted as back references, and a larger number is accepted as a back reference if at least that many subexpressions exist at that point in the regular expression, otherwise the parser will drop digits until the number is smaller or equal to the existing number of groups or it is one digit.
Perl uses the g flag to request a match that resumes where the last match left off. This functionality is provided implicitly by the Matcher class: Repeated invocations of the find method will resume where the last match left off, unless the matcher is reset.
Perl is forgiving about malformed matching constructs, as in the expression *a, as well as dangling brackets, as in the expression abc], and treats them as literals. This class is strict and will not compile a pattern when dangling characters are encountered.
For a more precise description of the behavior of regular expression constructs, please see Mastering Regular Expressions, 2nd Edition, Jeffrey E. F. Friedl, O'Reilly and Associates, 2002.
End Text Extracted And Modified From java.util.regex.Pattern documentation
Pattern::~Pattern | ( | ) |
std::string Pattern::classCreateRange | ( | char | low, | |
char | hi | |||
) | const [protected] |
Creates a new "class" representing the range from low through hi. This function will wrap if low > hi. This is a feature, not a bug. Sometimes it is useful to be able to write a single wrapping range instead of two separate ranges.
low | The beginning character | |
hi | The ending character |
Referenced by parseClass().
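A wrapping range can be sketched with unsigned 8-bit arithmetic. This is not the library's implementation: the function name createRange is hypothetical, and the sketch assumes that wrapping means continuing from 0xFF to 0x00.

```cpp
#include <string>

// Builds the set of characters from low through hi, inclusive. If low > hi
// the range runs up to 0xFF and continues from 0x00 (assumed wrap behavior).
std::string createRange(unsigned char low, unsigned char hi) {
    std::string out;
    unsigned char c = low;
    while (true) {
        out += static_cast<char>(c);
        if (c == hi) break;
        ++c;   // unsigned char arithmetic wraps from 0xFF to 0x00
    }
    return out;
}
```

For example, createRange('a', 'c') gives "abc", while createRange(0xFE, 0x01) gives the four characters 0xFE, 0xFF, 0x00 and 0x01.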
std::string Pattern::classIntersect | ( | std::string | s1, | |
std::string | s2 | |||
) | const [protected] |
Calculates the intersection of two strings. This function will first sort the strings and then use a simple selection algorithm to find the intersection.
s1 | The first "class" to intersect | |
s2 | The second "class" to intersect |
Returns the intersection of s1 and s2.
Referenced by parseClass().
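The sort-then-select approach can be sketched with the standard library. Note that this swaps in std::set_intersection for the hand-written selection step, and the function name intersectClasses is hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Sorts both character sets, then keeps the characters present in both.
std::string intersectClasses(std::string s1, std::string s2) {
    std::sort(s1.begin(), s1.end());
    std::sort(s2.begin(), s2.end());
    std::string out;
    std::set_intersection(s1.begin(), s1.end(), s2.begin(), s2.end(),
                          std::back_inserter(out));
    return out;
}
```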
std::string Pattern::classNegate | ( | std::string | s1 | ) | const [protected] |
Calculates the negation of a string. The negation is the set of all characters in the supported character range that are not contained in s1.
s1 | The "class" to be negated. |
Returns the negation of s1.
Referenced by parseClass().
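Negation can be sketched as a membership table over all byte values. The bounds of the range are not legible in the text above, so this sketch assumes the full 8-bit range 0x00 through 0xFF; the function name negateClass is hypothetical.

```cpp
#include <string>

// Returns every 8-bit character that does not occur in s1.
// Assumption: the character range is 0x00..0xFF.
std::string negateClass(const std::string &s1) {
    bool present[256] = {false};
    for (unsigned char c : s1) present[c] = true;   // mark members of s1
    std::string out;
    for (int c = 0; c < 256; ++c) {
        if (!present[c]) out += static_cast<char>(c);
    }
    return out;
}
```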
std::string Pattern::classUnion | ( | std::string | s1, | |
std::string | s2 | |||
) | const [protected] |
Calculates the union of two strings. This function will first sort the strings and then use a simple selection algorithm to find the union.
s1 | The first "class" to union | |
s2 | The second "class" to union |
Returns the union of s1 and s2.
Referenced by parseClass().
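The union can be sketched the same way as the intersection. This version uses std::set_union in place of the hand-written selection step and removes repeated characters afterwards; the function name unionClasses is hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Sorts both character sets, merges them, and drops duplicate characters.
std::string unionClasses(std::string s1, std::string s2) {
    std::sort(s1.begin(), s1.end());
    std::sort(s2.begin(), s2.end());
    std::string out;
    std::set_union(s1.begin(), s1.end(), s2.begin(), s2.end(),
                   std::back_inserter(out));
    out.erase(std::unique(out.begin(), out.end()), out.end());
    return out;
}
```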
void Pattern::clearPatternCache | ( | ) | [static] |
Don't use
References compiledPatterns.
Pattern * Pattern::compile | ( | const std::string & | pattern, | |
const unsigned long | mode = 0 | |||
) | [static] |
Call this function to compile a regular expression into a Pattern object. Special values can be assigned to mode when certain non-standard behaviors are expected from the Pattern object.
pattern | The regular expression to compile | |
mode | A bitwise or of flags signalling what special behaviors are wanted from this Pattern object |
Upon success, compile returns a Pattern pointer. Upon failure, compile returns NULL.
References CASE_INSENSITIVE, flags, head, LITERAL, matcher, parse(), and registerNode().
Referenced by compileAndKeep(), findAll(), findNthMatch(), highlight::LanguageDefinition::load(), matches(), registerPattern(), replace(), highlight::LanguageDefinition::restoreLangEndDelim(), and split().
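Because mode is a bitwise or of flags, several behaviors can be requested at once. The sketch below mirrors the flag values listed under Static Public Attributes in a separate namespace so that it compiles on its own; the namespace name and the hasFlag helper are illustrative and not part of the class.

```cpp
namespace sketch {
// Values copied from the Static Public Attributes listing.
const unsigned long CASE_INSENSITIVE   = 0x01;
const unsigned long LITERAL            = 0x02;
const unsigned long DOT_MATCHES_ALL    = 0x04;
const unsigned long MULTILINE_MATCHING = 0x08;
const unsigned long UNIX_LINE_MODE     = 0x10;

// True if `flag` is set in `mode`.
inline bool hasFlag(unsigned long mode, unsigned long flag) {
    return (mode & flag) != 0;
}
}  // namespace sketch
```

With the real class, the equivalent call would presumably be Pattern::compile(regex, Pattern::CASE_INSENSITIVE | Pattern::MULTILINE_MATCHING).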
Pattern * Pattern::compileAndKeep | ( | const std::string & | pattern, | |
const unsigned long | mode = 0 | |||
) | [static] |
Do not use this function. This function will compile a pattern and cache the result. This will eventually be used as an optimization when callers just want to call static methods using the same pattern over and over instead of first compiling the pattern and then using the compiled instance for matching.
pattern | The regular expression to compile | |
mode | A bitwise or of flags signalling what special behaviors are wanted from this Pattern object |
Upon success, compileAndKeep returns a Pattern pointer. Upon failure, compileAndKeep returns NULL.
References compile(), and compiledPatterns.
Matcher * Pattern::createMatcher | ( | const std::string & | str | ) |
Creates a matcher object using the specified string and this pattern.
str | The string to match against |
std::vector< std::string > Pattern::findAll | ( | const std::string & | pattern, | |
const std::string & | str, | |||
const unsigned long | mode = 0 | |||
) | [static] |
Finds all the instances of the specified pattern within the string. You should be careful to only pass patterns with a minimum length of one. For example, the pattern a* can be matched by an empty string, so instead you should pass a+ since at least one character must be matched. A typical invocation of findAll looks like:
std::vector<std::string> numbers = Pattern::findAll("\\d+", string);
pattern | The pattern for which to search | |
str | The string to search | |
mode | The special mode requested of the Pattern during the find process |
Returns all instances of pattern in str.
References compile(), and findAll().
Referenced by findAll().
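For readers without the library at hand, comparable behavior can be sketched with std::regex from the standard library. This is a stand-in, not the Pattern implementation: findAllSketch is a hypothetical name, and skipping empty matches is a choice made here to sidestep the zero-length case described above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every non-empty match of `pattern` in `str`, left to right.
std::vector<std::string> findAllSketch(const std::string &pattern,
                                       const std::string &str) {
    const std::regex re(pattern);
    std::vector<std::string> out;
    for (std::sregex_iterator it(str.begin(), str.end(), re), end; it != end; ++it) {
        if (it->length() > 0) out.push_back(it->str());   // ignore empty matches
    }
    return out;
}
```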
std::pair< std::string, int > Pattern::findNthMatch | ( | const std::string & | pattern, | |
const std::string & | str, | |||
const int | matchNum, | |||
const unsigned long | mode = 0 | |||
) | [static] |
Searches through a string for the nth match of the given pattern in the string. Match indices start at zero, not one. A typical invocation looks like this:
std::pair<std::string, int> match = Pattern::findNthMatch("\\d{1,3}", "192.168.1.101:22", 1);
printf("%s %i\n", match.first.c_str(), match.second);
Output: 168 4
pattern | The pattern for which to search | |
str | The string to search | |
matchNum | Which match to find | |
mode | Any special flags to use during the matching process |
Returns the matched text together with its starting index in str. You can check for success/failure by making sure that the integer returned is greater than or equal to zero.
References compile(), Matcher::findNextMatch(), Matcher::getGroup(), Matcher::getStartingIndex(), matcher, and Matcher::setString().
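The zero-based indexing and the success check can be illustrated with std::regex as a stand-in for the library. The function name findNthMatchSketch is hypothetical, and returning -1 on failure is an assumption; the text above only promises a non-negative index on success.

```cpp
#include <regex>
#include <string>
#include <utility>

// Returns the matchNum-th match (zero-based) and its starting index,
// or an empty string and -1 if there are not that many matches.
std::pair<std::string, int> findNthMatchSketch(const std::string &pattern,
                                               const std::string &str,
                                               int matchNum) {
    const std::regex re(pattern);
    int n = 0;
    for (std::sregex_iterator it(str.begin(), str.end(), re), end; it != end; ++it, ++n) {
        if (n == matchNum) {
            return std::make_pair(it->str(), static_cast<int>(it->position()));
        }
    }
    return std::make_pair(std::string(), -1);   // assumed failure value
}
```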
unsigned long Pattern::getFlags | ( | ) | const |
Returns the flags used during compilation of this pattern
References flags.
int Pattern::getInt | ( | int | start, | |
int | end | |||
) | [protected] |
std::string Pattern::getPattern | ( | ) | const |
Returns the regular expression this pattern represents
References pattern.
bool Pattern::matches | ( | const std::string & | pattern, | |
const std::string & | str, | |||
const unsigned long | mode = 0 | |||
) | [static] |
Determines if an entire string matches the specified pattern
pattern | The pattern to match against | |
str | The string to match | |
mode | The special mode requested of the Pattern during the matching process |
Returns true if str is recognized by pattern.
References compile(), and matches().
Referenced by matches().
NFANode * Pattern::parse | ( | const bool | inParen = 0, | |
const bool | inOr = 0, | |||
NFANode ** | end = NULL | |||
) | [protected] |
Parses pattern. This function is called recursively when an or (|) or a group is encountered.
inParen | Are we currently parsing inside a group | |
inOr | Are we currently parsing one side of an or (| ) | |
end | The end of the current expression |
References CASE_INSENSITIVE, curInd, DOT_MATCHES_ALL, error, flags, groupCount, MULTILINE_MATCHING, nonCapGroupCount, parseBackref(), parseBehind(), parseClass(), parseEscape(), parseQuote(), parseRegisteredPattern(), pattern, quantify(), quantifyGroup(), raiseError(), registerNode(), and UNIX_LINE_MODE.
Referenced by compile(), and parseRegisteredPattern().
NFANode * Pattern::parseBackref | ( | ) | [protected] |
Returns a new node representing the back reference being parsed
References curInd, groupCount, pattern, raiseError(), and registerNode().
Referenced by parse().
NFANode * Pattern::parseBehind | ( | const bool | pos, | |
NFANode ** | end | |||
) | [protected] |
Parses a lookbehind expression. Appends the necessary nodes to *end.
pos | Positive or negative look behind | |
end | The ending node of the current pattern |
References curInd, pattern, raiseError(), and registerNode().
Referenced by parse().
std::string Pattern::parseClass | ( | ) | [protected] |
Parses the current class being examined in pattern.
References classCreateRange(), classIntersect(), classNegate(), classUnion(), curInd, parseEscape(), pattern, and raiseError().
Referenced by parse().
std::string Pattern::parseEscape | ( | bool & | inv, | |
bool & | quo | |||
) | [protected] |
Parses the escape sequence currently being examined. Determines if the escape sequence is a class, a single character, or the beginning of a quotation sequence.
inv | Output parameter. Whether or not to invert the returned class | |
quo | Output parameter. Whether or not this sequence starts a quotation. |
References curInd, parseHex(), parseOctal(), parsePosix(), pattern, and raiseError().
Referenced by parse(), and parseClass().
std::string Pattern::parseHex | ( | ) | [protected] |
Returns a string containing the hex character being parsed
References curInd, and pattern.
Referenced by parseEscape().
std::string Pattern::parseOctal | ( | ) | [protected] |
Returns a string containing the octal character being parsed
References curInd, pattern, and raiseError().
Referenced by parseEscape().
std::string Pattern::parsePosix | ( | ) | [protected] |
Parses the current POSIX class being examined in pattern.
References curInd, pattern, and raiseError().
Referenced by parseEscape().
NFANode * Pattern::parseQuote | ( | ) | [protected] |
Parses the current expression and tacks on nodes until a \E is found.
References CASE_INSENSITIVE, curInd, flags, pattern, raiseError(), and registerNode().
Referenced by parse().
NFANode * Pattern::parseRegisteredPattern | ( | NFANode ** | end | ) | [protected] |
Parses a supposed registered pattern currently under compilation. If the sequence of characters does point to a registered pattern, then the registered pattern is appended to *end. The registered pattern is parsed with the current compilation flags.
end | The ending node of the thus-far compiled pattern |
References curInd, error, flags, groupCount, parse(), pattern, raiseError(), and registeredPatterns.
Referenced by parse().
NFANode * Pattern::quantify | ( | NFANode * | newNode | ) | [protected] |
Tries to quantify the last parsed expression. If the character was indeed quantified, then the NFA is modified accordingly.
newNode | The recently created expression node |
return value != newNode
References curInd, MAX_QMATCH, MIN_QMATCH, pattern, quantifyCurly(), and registerNode().
Referenced by parse().
bool Pattern::quantifyCurly | ( | int & | sNum, | |
int & | eNum | |||
) | [protected] |
Parses a {n,m} string out of the member variable pattern and stores the result in sNum and eNum.
sNum | Output parameter. The minimum number of matches required by the curly quantifier is stored here. | |
eNum | Output parameter. The maximum number of matches allowed by the curly quantifier is stored here. |
References curInd, getInt(), MAX_QMATCH, MIN_QMATCH, pattern, and raiseError().
Referenced by quantify(), and quantifyGroup().
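Parsing a curly quantifier into a minimum and a maximum can be sketched as below. This is an illustration only: the function name parseCurly, the position parameter, and the choice to reject forms such as {,} or {5,2} are assumptions. The two constants reuse the MIN_QMATCH and MAX_QMATCH values listed above to stand for an omitted bound.

```cpp
#include <cctype>
#include <string>

const int MIN_QMATCH = 0x00000000;
const int MAX_QMATCH = 0x7FFFFFFF;

// True if s holds only decimal digits and is short enough for std::stoi.
static bool allDigits(const std::string &s) {
    for (char c : s) {
        if (!std::isdigit(static_cast<unsigned char>(c))) return false;
    }
    return s.size() <= 9;
}

// Parses "{n}", "{n,}", "{,m}" or "{n,m}" starting at pattern[pos].
// On success stores the bounds in sNum and eNum and returns true.
bool parseCurly(const std::string &pattern, std::size_t pos, int &sNum, int &eNum) {
    if (pos >= pattern.size() || pattern[pos] != '{') return false;
    const std::size_t close = pattern.find('}', pos);
    if (close == std::string::npos) return false;
    const std::string body = pattern.substr(pos + 1, close - pos - 1);
    const std::size_t comma = body.find(',');
    const std::string lo = body.substr(0, comma);
    const std::string hi = (comma == std::string::npos) ? lo : body.substr(comma + 1);
    if (!allDigits(lo) || !allDigits(hi)) return false;
    if (comma == std::string::npos) {            // "{n}": exactly n times
        if (lo.empty()) return false;
        sNum = eNum = std::stoi(lo);
        return true;
    }
    if (lo.empty() && hi.empty()) return false;  // "{,}" is rejected here
    const int s = lo.empty() ? MIN_QMATCH : std::stoi(lo);
    const int e = hi.empty() ? MAX_QMATCH : std::stoi(hi);
    if (s > e) return false;                     // assumed: inverted bounds are an error
    sNum = s;
    eNum = e;
    return true;
}
```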
NFANode * Pattern::quantifyGroup | ( | NFANode * | start, | |
NFANode * | stop, | |||
const int | gn | |||
) | [protected] |
Tries to quantify the currently parsed group. If the group being parsed is indeed quantified in the member variable pattern, then the NFA is modified accordingly.
start | The starting node of the current group being parsed | |
stop | The ending node of the current group being parsed | |
gn | The group number of the current group being parsed |
References curInd, MAX_QMATCH, MIN_QMATCH, pattern, quantifyCurly(), and registerNode().
Referenced by parse().
void Pattern::raiseError | ( | ) | [protected] |
Raises an error during compilation. Compilation will cease at that point and compile will return NULL.
References curInd, error, and pattern.
Referenced by parse(), parseBackref(), parseBehind(), parseClass(), parseEscape(), parseOctal(), parsePosix(), parseQuote(), parseRegisteredPattern(), and quantifyCurly().
NFANode * Pattern::registerNode | ( | NFANode * | node | ) | [protected] |
Convenience function for registering a node in nodes.
node | The node to register |
References nodes.
Referenced by compile(), parse(), parseBackref(), parseBehind(), parseQuote(), quantify(), and quantifyGroup().
bool Pattern::registerPattern | ( | const std::string & | name, | |
const std::string & | pattern, | |||
const unsigned long | mode = 0 | |||
) | [static] |
Registers a pattern under a specific name for use in later compilations. A typical invocation and later use looks like:
Pattern::registerPattern("ip", "(?:\\d{1,3}\\.){3}\\d{1,3}");
Pattern * p1 = Pattern::compile("{ip}:\\d+");
Pattern * p2 = Pattern::compile("Connection from ({ip}) on port \\d+");
Multiple calls to registerPattern with the same name will result in the pattern getting overwritten.
name | The name to give to the pattern | |
pattern | The pattern to register | |
mode | Any special flags to use when compiling pattern |
Fails if pattern has invalid syntax.
References compile(), and registeredPatterns.
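The effect of a {name} reference can be pictured as a textual substitution. The sketch below is only a model: the library parses the registered pattern into NFA nodes during compilation rather than rewriting text, and both the function name expandRegistered and the wrapping in a non-capturing group are assumptions made for illustration.

```cpp
#include <map>
#include <string>

// Replaces each {name} whose name is in `registry` with the registered
// pattern text wrapped in (?:...). Anything else, including quantifiers
// such as {1,3}, is copied through unchanged.
std::string expandRegistered(const std::string &pattern,
                             const std::map<std::string, std::string> &registry) {
    std::string out;
    std::size_t i = 0;
    while (i < pattern.size()) {
        if (pattern[i] == '\\' && i + 1 < pattern.size()) {
            out += pattern[i];          // copy escape sequences verbatim
            out += pattern[i + 1];
            i += 2;
            continue;
        }
        if (pattern[i] == '{') {
            const std::size_t close = pattern.find('}', i);
            if (close != std::string::npos) {
                const std::map<std::string, std::string>::const_iterator it =
                    registry.find(pattern.substr(i + 1, close - i - 1));
                if (it != registry.end()) {
                    out += "(?:" + it->second + ")";
                    i = close + 1;
                    continue;
                }
            }
        }
        out += pattern[i++];
    }
    return out;
}
```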
std::string Pattern::replace | ( | const std::string & | pattern, | |
const std::string & | str, | |||
const std::string & | replacementText, | |||
const unsigned long | mode = 0 | |||
) | [static] |
Searches through str and replaces all substrings matched by pattern with replacementText. replacementText may contain backreferences (e.g. \1) to capture groups. A typical invocation looks like:
Pattern::replace("(a+)b(c+)", "abcccbbabcbabc", "\\2b\\1");
which would replace abcccbbabcbabc with cccbabbcbabcba.
pattern | The regular expression | |
str | The string in which to perform replacements | |
replacementText | The replacement text | |
mode | The special mode requested of the Pattern during the replacement process |
References compile(), and replace().
Referenced by replace().
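The same replacement can be reproduced with std::regex as a stand-in. std::regex_replace expects $1-style references, so this sketch first converts \N into $N; the function name replaceSketch is hypothetical, and the conversion only handles single-digit group numbers that are not followed by another digit.

```cpp
#include <cctype>
#include <regex>
#include <string>

// Replaces every match of `pattern` in `str` with `replacementText`,
// where \1 through \9 in the replacement refer to capture groups.
std::string replaceSketch(const std::string &pattern, const std::string &str,
                          const std::string &replacementText) {
    std::string fmt;   // replacement text in std::regex format syntax
    for (std::size_t i = 0; i < replacementText.size(); ++i) {
        const char c = replacementText[i];
        if (c == '\\' && i + 1 < replacementText.size() &&
            std::isdigit(static_cast<unsigned char>(replacementText[i + 1]))) {
            fmt += '$';
            fmt += replacementText[++i];   // \N becomes $N
        } else if (c == '$') {
            fmt += "$$";                   // keep a literal dollar sign
        } else {
            fmt += c;
        }
    }
    return std::regex_replace(str, std::regex(pattern), fmt);
}
```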
std::vector< std::string > Pattern::split | ( | const std::string & | pattern, | |
const std::string & | str, | |||
const bool | keepEmptys = 0, | |||
const unsigned long | limit = 0, | |||
const unsigned long | mode = 0 | |||
) | [static] |
Splits the specified string over occurrences of the specified pattern. Empty strings can be optionally ignored. The number of strings returned is configurable. A typical invocation looks like:
std::string str(strSize, '\0');
FILE * fp = fopen(fileName, "r");
if (fp) {
    fread(&str[0], 1, strSize, fp);
    fclose(fp);
}
std::vector<std::string> lines = Pattern::split("[\r\n]+", str, true);
pattern | The regular expression | |
str | The string to split | |
keepEmptys | Whether or not to keep empty strings | |
limit | The maximum number of splits to make | |
mode | The special mode requested of the Pattern during the split process |
Returns the pieces of str split across pattern.
References compile(), and split().
Referenced by split().
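Splitting with and without empty strings can be sketched with std::regex as a stand-in for the library. The function name splitSketch is hypothetical, and the limit parameter is left out because its exact semantics are not spelled out above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Splits `str` at every match of `pattern`. Empty pieces are kept only
// when keepEmptys is true.
std::vector<std::string> splitSketch(const std::string &pattern,
                                     const std::string &str, bool keepEmptys) {
    const std::regex re(pattern);
    std::vector<std::string> out;
    // Submatch index -1 selects the text between matches.
    for (std::sregex_token_iterator it(str.begin(), str.end(), re, -1), end;
         it != end; ++it) {
        const std::string piece = it->str();
        if (keepEmptys || !piece.empty()) out.push_back(piece);
    }
    return out;
}
```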
void Pattern::unregisterPatterns | ( | ) | [static] |
Clears the pattern registry
References registeredPatterns.
std::map< std::string, Pattern * > Pattern::compiledPatterns [static, protected] |
This currently is not used, so don't try to do anything with it. Holds all the compiled patterns for quick access.
From the author (Jeff Stuart) " Let me start by saying this file is pretty big. If you feel up to it, you can try making changes yourself, but you would be better off to just email me at stuart@cs.ucdavis.edu if you think there is a bug, or have something useful you would like added. This project is very "near and dear" to me, so I am fairly quick to make bug fixes. The header files for Pattern and Matcher are fairly well documented and the function names are pretty self-explanatory, but if you are having any trouble, feel free to email me at stuart@cs.ucdavis.edu.
If you email me, make sure you put something like C++RE in the subject because I tend to delete email if I don't recognize the name and the subject is something like "I Need Your Help" or "Got A Second" or "I Found It". "
Referenced by clearPatternCache(), and compileAndKeep().
int Pattern::curInd [protected] |
Used during compilation to keep track of the current index into pattern. Once the pattern is successfully compiled, curInd is no longer used.
Referenced by parse(), parseBackref(), parseBehind(), parseClass(), parseEscape(), parseHex(), parseOctal(), parsePosix(), parseQuote(), parseRegisteredPattern(), quantify(), quantifyCurly(), quantifyGroup(), and raiseError().
bool Pattern::error [protected] |
Flag used during compilation. Once the pattern is successfully compiled, error is no longer used.
Referenced by parse(), parseRegisteredPattern(), and raiseError().
unsigned long Pattern::flags [protected] |
The flags specified when this was compiled.
Referenced by compile(), getFlags(), parse(), parseQuote(), and parseRegisteredPattern().
int Pattern::groupCount [protected] |
The number of capture groups this contains.
Referenced by parse(), parseBackref(), and parseRegisteredPattern().
NFANode* Pattern::head [protected] |
The front node of the NFA.
Referenced by compile(), Matcher::findFirstMatch(), Matcher::findNextMatch(), and Matcher::matches().
Matcher* Pattern::matcher [protected] |
Used when methods like split are called. The Matcher class uses a lot of dynamic memory, so keeping an instance around speeds up certain operations.
Referenced by compile(), findNthMatch(), and ~Pattern().
const unsigned long Pattern::MULTILINE_MATCHING = 0x08 [static] |
^ and $ should anchor to the beginning and ending of lines, not all input
Referenced by parse().
std::map<NFANode*, bool> Pattern::nodes [protected] |
Holds all the NFA nodes used. This makes deletion of a pattern, as well as clean-up from an unsuccessful compile much easier and faster.
Referenced by registerNode(), and ~Pattern().
int Pattern::nonCapGroupCount [protected] |
The number of non-capture groups this contains.
Referenced by parse().
std::string Pattern::pattern [protected] |
The actual regular expression we represent
Referenced by getInt(), getPattern(), parse(), parseBackref(), parseBehind(), parseClass(), parseEscape(), parseHex(), parseOctal(), parsePosix(), parseQuote(), parseRegisteredPattern(), quantify(), quantifyCurly(), quantifyGroup(), and raiseError().
std::map< std::string, std::pair< std::string, unsigned long > > Pattern::registeredPatterns [static, protected] |
Holds all of the registered patterns as strings. Due to certain problems with compilation of patterns, especially with capturing groups, this seemed to be the best way to do it.
Referenced by parseRegisteredPattern(), registerPattern(), and unregisterPatterns().
const unsigned long Pattern::UNIX_LINE_MODE = 0x10 [static] |
When enabled, only instances of \n are recognized as line terminators
Referenced by parse().