...
Regular expressions are the Swiss Army knife of text processing. They give the programmer the ability to match and extract patterns from strings. The simplest regular expression is a plain string of letters and numbers, and the simplest expression involving a regular expression uses the ==~ operator. For example, to match Dan Quayle's spelling of 'potato':
```groovy
"potatoe" ==~ /potatoe/
```
If you put that in the groovyConsole and run it, it will evaluate to true. There are a couple of things to notice. First is the ==~ operator, which is similar to the == operator, but matches the whole string against a pattern instead of computing exact equality. Second is that the regular expression is enclosed in /'s. This is Groovy's "slashy" string syntax: it is still a string, but it tells anyone reading your code that it is meant as a regular expression, and it spares you from doubling every backslash.
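One detail is easy to check for yourself: ==~ only succeeds when the pattern covers the entire string, not just part of it. The following is a small sketch for the groovyConsole; the extra test strings ("potatoes", "a potatoe", "Potatoe") are made up here for illustration.

```groovy
// ==~ must match the entire string, not just a piece of it
def exact    = "potatoe" ==~ /potatoe/    // whole string is the pattern
def trailing = "potatoes" ==~ /potatoe/   // extra 's' at the end
def leading  = "a potatoe" ==~ /potatoe/  // extra text at the start
def casing   = "Potatoe" ==~ /potatoe/    // matching is case-sensitive by default

assert exact
assert !trailing
assert !leading
assert !casing
```

If all four assertions pass silently, the operator behaves as described.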
But let's say that we also wanted to match the correct spelling. We could add a '?' after the 'e' to say that the e is optional. The following will still evaluate to true.
```groovy
"potatoe" ==~ /potatoe?/
```
And the correct spelling will also match:
```groovy
"potato" ==~ /potatoe?/
```
But anything else will not match:
```groovy
"motato" ==~ /potatoe?/
```
So this is how you define a simple boolean expression involving a regular expression. Now let's go one step further and define a method that tests a string against a regular expression. For example, let's write some code to match Pete Wisniewski's last name:
```groovy
def checkSpelling(spellingAttempt, spellingRegularExpression)
{
    if (spellingAttempt ==~ spellingRegularExpression)
    {
        println("Congratulations, you spelled it correctly.")
    } else {
        println("Sorry, try again.")
    }
}
theRegularExpression = /Wisniewski/
checkSpelling("Wisniewski", theRegularExpression)
checkSpelling("Wisnewski", theRegularExpression)
```
...
Now let's get a little bit more tricky. Let's say we also want to match the string if the name does not have the 'w' in the middle. We might write:
```groovy
theRegularExpression = /Wisniew?ski/
checkSpelling("Wisniewski", theRegularExpression)
checkSpelling("Wisnieski", theRegularExpression)
checkSpelling("Wisniewewski", theRegularExpression)
```
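To see exactly which spellings the optional 'w' lets through, the same pattern can be checked with assertions instead of printed messages. This is only a sketch; the variable name `optionalW` and the extra misspelling "Wisniski" are assumptions added for illustration.

```groovy
// '?' makes only the character right before it optional
def optionalW = /Wisniew?ski/

assert "Wisniewski" ==~ optionalW        // 'w' present once
assert "Wisnieski" ==~ optionalW         // 'w' left out
assert !("Wisniewewski" ==~ optionalW)   // an extra "ew" is not covered by w?
assert !("Wisniski" ==~ optionalW)       // the 'e' is still required
```

Note that the '?' applies to the single 'w' only, so every other letter in the name must still appear exactly as written.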
...
Now let's also make it accept the spelling if "ie" in the middle is transposed. Consider the following:
```groovy
theRegularExpression = /Wisn(ie|ei)w?ski/
checkSpelling("Wisniewski", theRegularExpression)
checkSpelling("Wisnieski", theRegularExpression)
checkSpelling("Wisniewewski", theRegularExpression)
```
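The calls above do not actually try a transposed spelling, so here is a hedged sketch that does. The test strings and the variable names `grouped` and `ungrouped` are made up for illustration; the second pattern shows why the parentheses matter.

```groovy
// Parentheses limit the alternation to just "ie" or "ei"
def grouped = /Wisn(ie|ei)w?ski/

assert "Wisniewski" ==~ grouped      // original spelling
assert "Wisneiwski" ==~ grouped      // "ie" transposed to "ei"
assert "Wisneiski" ==~ grouped       // transposed and without the 'w'
assert !("Wisniiwski" ==~ grouped)   // "ii" is neither alternative

// Without parentheses, | splits the whole pattern in two:
// either "Wisnie" or "eiw?ski"
def ungrouped = /Wisnie|eiw?ski/

assert "Wisnie" ==~ ungrouped
assert "eiski" ==~ ungrouped
assert !("Wisneiwski" ==~ ungrouped)
```

Grouping is what lets the alternation sit in the middle of a longer pattern.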
...
One last interesting feature is the ability to specify a set of characters, any one of which is acceptable. This is done using square brackets *[ ]*. Try the following regular expressions with various misspellings of Pete's last name:
```groovy
theRegularExpression = /Wis[abcd]niewski/   // requires one of 'a', 'b', 'c' or 'd'
theRegularExpression = /Wis[abcd]?niewski/  // allows one of 'a', 'b', 'c' or 'd', but does not require it
theRegularExpression = /Wis[a-zA-Z]niewski/ // requires exactly one upper- or lower-case letter
theRegularExpression = /Wis[^abcd]niewski/  // requires one character that is not 'a', 'b', 'c' or 'd'
```
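If you would rather not invent misspellings yourself, the following sketch tries a few against each of the four patterns. All of the test strings are hypothetical and chosen only to exercise each character class.

```groovy
// [abcd] requires exactly one of the listed characters
assert "Wisaniewski" ==~ /Wis[abcd]niewski/
assert !("Wisniewski" ==~ /Wis[abcd]niewski/)    // nothing where the class is required

// [abcd]? makes that one character optional
assert "Wisniewski" ==~ /Wis[abcd]?niewski/
assert "Wisbniewski" ==~ /Wis[abcd]?niewski/

// [a-zA-Z] accepts any single ASCII letter, but not a digit
assert "WisQniewski" ==~ /Wis[a-zA-Z]niewski/
assert !("Wis1niewski" ==~ /Wis[a-zA-Z]niewski/)

// [^abcd] accepts any single character except the listed ones
assert "Wisxniewski" ==~ /Wis[^abcd]niewski/
assert "Wis-niewski" ==~ /Wis[^abcd]niewski/
assert !("Wisaniewski" ==~ /Wis[^abcd]niewski/)
assert !("Wisniewski" ==~ /Wis[^abcd]niewski/)   // a character is still required
```

The last assertion is the easy one to get wrong: a negated class still has to match one character, so the correct spelling fails.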
...
| Pattern | Meaning | Example matches |
|---|---|---|
| `a?` | matches 0 or 1 occurrence of *a* | 'a' or empty string |
| `a*` | matches 0 or more occurrences of *a* | empty string or 'a', 'aa', 'aaa', etc |
| `a+` | matches 1 or more occurrences of *a* | 'a', 'aa', 'aaa', etc |
| `a\|b` | match *a* or *b* | 'a' or 'b' |
| `.` | match any single character | 'a', 'q', 'l', '_', '+', etc |
| `[woeirjsd]` | match any of the named characters | 'w', 'o', 'e', 'i', 'r', 'j', 's', 'd' |
| `[1-9]` | match any of the characters in the range | '1', '2', '3', '4', '5', '6', '7', '8', '9' |
| `[^13579]` | match any character not named | even digits, or any other character |
| `(ie)` | group an expression (for use with other operators) | 'ie' |
| `^a` | match an *a* at the beginning of a line | 'a' |
| `a$` | match an *a* at the end of a line | 'a' |
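Most rows of the table can be tried directly in the groovyConsole. The sketch below uses ==~, which already requires the whole string to match, so the `^` and `$` anchors are left out; the test strings are illustrative only.

```groovy
// Quantifiers: ?, * and +
assert "" ==~ /a?/
assert "" ==~ /a*/
assert "aaa" ==~ /a*/
assert !("" ==~ /a+/)        // + needs at least one 'a'
assert "aa" ==~ /a+/

// Alternation and the any-character dot
assert "b" ==~ /a|b/
assert "q" ==~ /./
assert !("qq" ==~ /./)       // a dot matches exactly one character

// Character classes and ranges
assert "7" ==~ /[1-9]/
assert !("0" ==~ /[1-9]/)    // 0 is outside the range
assert "4" ==~ /[^13579]/

// Grouping combined with another operator
assert "ieie" ==~ /(ie)+/
```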
There are a couple of other things you should know. If you want one of the operators above to mean the actual character, for instance to match a literal question mark, you need to put a '\' in front of it. For example:
```groovy
// evaluates to true, as it will for any string that ends in a question mark
// and contains no other question mark
"How tall is Angelina Jolie?" ==~ /[^\?]+\?/
```
This is your first really ugly regular expression. (The frequent use of these in Perl is one of the reasons it is considered a "write only" language.) By the way, Google knows how tall [she is](http://www.google.com/search?hl=en&q=how+tall+is+angelina+jolie&btnG=Google+Search). The only way to understand an expression like this is to pick it apart:
...