I'll try to make a more interesting puzzle. This time, we test whether a string matches another string, which we call a pattern.
To make patterns useful, we give some characters a special meaning. The following characters are special:
    ^       beginning of a line
    $       end of a line
    .       any single character (except newline)
    [ ]     range specification (e.g., [a-z] means any letter from a to z)
    \w      letter, digit, or underscore; same as [0-9A-Za-z_]
    \W      any character that is not a letter, digit, or underscore
    \s      blank (whitespace) character, such as space, \t, \n, \r, or \f
    \S      non-blank character
    \d      digit; same as [0-9]
    \D      non-digit character
    \b      word boundary (outside a range specification)
    \B      not a word boundary
    \b      backspace (0x08) (inside a range specification)
    *       zero or more repetitions of the preceding expression
    +       one or more repetitions of the preceding expression
    {m,n}   at least m, but not more than n, repetitions of the preceding expression
    ?       zero or one repetition of the preceding expression (same as {0,1})
    |       either the preceding or the following expression
    ( )     grouping
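To see the special characters above in action, here is a small sketch that tries one pattern per entry against a sample string. It assumes a current Ruby, where `String#[]` with a Regexp returns the first matching substring (or nil); the sample strings are made up for illustration.

```ruby
# Each entry is [pattern, sample string]. String#[] with a Regexp
# returns the first substring that matches, or nil if none does.
examples = [
  [/^f/,        "foo"],          # ^ : f at the start of a line
  [/b.r/,       "foobar"],       # . : any single character between b and r
  [/[a-z]+/,    "ABCdefGHI"],    # range specification
  [/\w+/,       "  foo_42!"],    # letters, digits, underscore
  [/\W/,        "foo_42!"],      # first character that is not \w
  [/\d+/,       "abc123def"],    # one or more digits
  [/\bcat\b/,   "concat cat"],   # "cat" only as a whole word
  [/[\b]/,      "a\bc"],         # \b inside brackets is a backspace
  [/ab*c/,      "ac"],           # * : zero or more b's
  [/ab+c/,      "abbbc"],        # + : one or more b's
  [/ab{2,3}c/,  "abbbc"],        # {m,n} : two or three b's
  [/colou?r/,   "color"],        # ? : optional u
  [/cat|dog/,   "hotdog"],       # | : either alternative
  [/(ab)+/,     "xababy"],       # ( ) : group, then repeat the group
]

examples.each do |re, s|
  puts "#{re.inspect.ljust(12)} #{s.inspect.ljust(13)} => #{s[re].inspect}"
end
```

Note that `/\bcat\b/` skips the `cat' inside `concat', because there is no word boundary between `n' and `c'.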
For example, `^f[a-z]+' means "an `f' at the beginning of a line, followed by one or more letters in the range from `a' to `z'". Special matching patterns like these are called `regular expressions'. Regular expressions are useful for finding strings, so they are used very often in UNIX environments. A typical example is `grep'.
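As a rough illustration of the `grep' analogy, Ruby's `Array#grep` keeps the elements that match a pattern. The input lines below are hypothetical sample data; the sketch also shows that `^' anchors at the start of any line, while `\A' anchors only at the start of the whole string.

```ruby
# Hypothetical sample lines, for illustration only.
lines = ["foo bar", "bar foo", "fizz", "f123", "Foo"]

# ^ : start of a line, f : a literal f, [a-z]+ : one or more lowercase letters
pattern = /^f[a-z]+/

matches = lines.grep(pattern)   # keeps only the matching elements, like grep
p matches                       # ["foo bar", "fizz"]
p "foo bar"[pattern]            # "foo" (the matched part itself)

# ^ also matches right after a newline; \A matches only at the very start.
p("bar\nfoo" =~ /^f[a-z]+/)     # 4
p("bar\nfoo" =~ /\Af[a-z]+/)    # nil
```

`f123' is rejected because `[a-z]+' needs at least one letter after the `f', and `Foo' is rejected because the range is lowercase only.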
To understand regular expressions, let's make a little
program. Store the following program into a file named
`regx.rb' and then execute it.
Note: This program works only on UNIX because it uses
reverse video escape sequences.
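    st = "\033[7m"   # escape sequence: start reverse video
    en = "\033[m"    # escape sequence: back to normal video
    while true
      print "str> "
      STDOUT.flush
      str = gets
      break if not str       # gets returns nil at end of input
      str.chomp!
      print "pat> "
      STDOUT.flush
      re = gets
      break if not re
      re.chomp!
      # A String pattern would be matched literally, so build a Regexp.
      # In the replacement, \& stands for the matched text.
      str.gsub!(Regexp.new(re), "#{st}\\&#{en}")
      print str, "\n"
    end
    print "\n"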
This program asks for two inputs, a string and then a regular expression, and shows the parts of the string that match the regular expression in reverse video. Don't worry about the details now; they will be explained later.
    str> foobar
    pat> ^fo+
    foobar
    ~~~
(`foo' is shown in reverse video; the `~~~' line just marks it for text-based browsers.)
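The heart of regx.rb is a single `gsub'. A minimal non-interactive sketch of the same idea follows; `highlight' is a hypothetical helper name, and it wraps matches in brackets by default so the result is easy to read and check without a terminal.

```ruby
# Hypothetical helper: wrap every match of +pattern+ (given as a String)
# in +left+ and +right+. \& in the replacement stands for the matched text.
def highlight(str, pattern, left = "[", right = "]")
  str.gsub(Regexp.new(pattern), "#{left}\\&#{right}")
end

puts highlight("foobar", "^fo+")          # [foo]bar
puts highlight("abc012dbcd555", "\\d")    # abc[0][1][2]dbcd[5][5][5]
puts highlight("foozboozer", "f.*z")      # [foozbooz]er

# Passing the reverse-video codes instead reproduces what regx.rb prints.
puts highlight("foobar", "^fo+", "\033[7m", "\033[m")
```

With `\d', each digit is a separate match, which is why adjacent matches appear as one continuous reverse-video run in the terminal.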
Let's try several inputs.
    str> abc012dbcd555
    pat> \d
    abc012dbcd555
       ~~~    ~~~
This program detects multiple matches.
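To collect every match rather than just highlight it, `String#scan' returns all non-overlapping matches from left to right. A short sketch using the same input:

```ruby
s = "abc012dbcd555"

digits = s.scan(/\d/)    # each digit is its own match
runs   = s.scan(/\d+/)   # + joins adjacent digits into one match

p digits   # ["0", "1", "2", "5", "5", "5"]
p runs     # ["012", "555"]
```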
    str> foozboozer
    pat> f.*z
    foozboozer
    ~~~~~~~~
`fooz' isn't matched but `foozbooz' is, because repetition with `*' is greedy: `.*' consumes as much as it can, so the match extends to the last `z' in the string.
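The greedy behavior can be checked directly. For contrast, the sketch also uses `*?', the non-greedy form of `*' that Ruby supports; it is not part of the table above and is shown only to make the difference visible.

```ruby
s = "foozboozer"

greedy = s[/f.*z/]    # .* takes as much as possible, then backs off to the last z
lazy   = s[/f.*?z/]   # .*? takes as little as possible, stopping at the first z

puts greedy   # foozbooz
puts lazy     # fooz
```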
The next pattern is hard to understand at a glance.
    str> Wed Feb 7 08:58:04 JST 1996
    pat> [0-9]+:[0-9]+(:[0-9]+)?
    Wed Feb 7 08:58:04 JST 1996
              ~~~~~~~~
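Reading the pattern piece by piece: digits, a colon, digits, and then an optional group `(:[0-9]+)?' for seconds. A small sketch with `Regexp#match' shows what the group captures; the second input string is made up for illustration.

```ruby
time_re = /[0-9]+:[0-9]+(:[0-9]+)?/

m = time_re.match("Wed Feb 7 08:58:04 JST 1996")
puts m[0]    # 08:58:04  (the whole match)
puts m[1]    # :04       (the optional group)

m2 = time_re.match("lunch at 12:30")
puts m2[0]   # 12:30
p m2[1]      # nil, because the optional group did not take part
```

The lone `7' in `Feb 7' is not matched: `[0-9]+' accepts it, but the following `:' is missing, so the search moves on to `08'.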
In Ruby, a regular expression literal is enclosed in slashes (`/'). Some methods, such as `match', also accept a string and convert it into a regular expression automatically.

    ruby> "abcdef" =~ /d/
    3
    ruby> "aaaaaa" =~ /d/
    nil
    ruby> "abcdef".match("d").begin(0)
    3
    ruby> "aaaaaa".match("d")
    nil

`=~' is the matching operator for regular expressions; it returns the position where the match starts, or nil if there is no match.
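Because `=~' returns either a position or nil, and nil counts as false in a condition, it can be used directly in an `if'. A small sketch follows; it assumes a current Ruby, where `=~' with a String on the right raises a TypeError, so a string pattern is converted with `Regexp.new' first. After a successful match, `$~' holds the match data.

```ruby
p("abcdef" =~ /d/)              # 3
p("aaaaaa" =~ /d/)              # nil
p("abcdef" =~ Regexp.new("d"))  # 3 (string converted to a Regexp explicitly)

reports = ["abcdef", "aaaaaa"].map do |s|
  if s =~ /d/
    "#{s}: match at #{$~.begin(0)}"   # $~ is the MatchData of the last match
  else
    "#{s}: no match"
  end
end
puts reports
```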