...

Regular expressions are the Swiss Army knife of text processing. They give the programmer the ability to match and extract patterns from strings. The simplest example of a regular expression is a string of letters and numbers, and the simplest expression involving a regular expression uses the ==~ operator. For example, to match Dan Quayle's spelling of 'potato':

Code Block

"potatoe" ==~ /potatoe/

If you put that in the groovyConsole and run it, it will evaluate to true. There are a couple of things to notice. First is the ==~ operator, which is similar to the == operator, but matches the whole string against a pattern instead of computing exact equality. Second is that the regular expression is enclosed in /'s. Groovy treats text between slashes as a string in which backslashes do not need to be doubled, which makes it a convenient way to write patterns, and it tells anyone reading your code that this is meant as a regular expression and not just a string.
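The following is a minimal sketch of those two points, not a definitive reference. The variable name `pattern` is only an illustration and does not appear in the original example.

```groovy
// ==~ is true only when the whole string matches the pattern
assert "potatoe" ==~ /potatoe/
assert !("potatoes" ==~ /potatoe/)   // the trailing 's' is not covered by the pattern

// the text between the slashes is held in an ordinary String
def pattern = /potatoe/
assert pattern instanceof String
assert "potatoe" ==~ pattern

// a quoted string on the right-hand side is treated as a pattern too
assert "potatoe" ==~ "potatoe"
```

The slashes become more useful once a pattern contains backslashes, because they do not have to be doubled as they would in a quoted string.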

But let's say that we also wanted to match the correct spelling. We could add a '?' after the 'e' to say that the 'e' is optional. The following will still evaluate to true.

Code Block

"potatoe" ==~ /potatoe?/

And the correct spelling will also match:

Code Block

"potato" ==~ /potatoe?/

But anything else will not match:

Code Block

"motato" ==~ /potatoe?/

So this is how you define a simple boolean expression involving a regular expression. But let's get a little bit more tricky. Let's define a method that tests a regular expression. So for example, let's write some code to match Pete Wisniewski's last name:

Code Block

def checkSpelling(spellingAttempt, spellingRegularExpression) {
    if (spellingAttempt ==~ spellingRegularExpression) {
        println("Congratulations, you spelled it correctly.")
    } else {
        println("Sorry, try again.")
    }
}

theRegularExpression = /Wisniewski/
checkSpelling("Wisniewski", theRegularExpression)
checkSpelling("Wisnewski", theRegularExpression)

...

Now let's get a little bit more tricky. Let's say we also want to match the string if the name does not have the 'w' in the middle. We might write:

Code Block

theRegularExpression = /Wisniew?ski/
checkSpelling("Wisniewski", theRegularExpression)
checkSpelling("Wisnieski", theRegularExpression)
checkSpelling("Wisniewewski", theRegularExpression)

...

Now let's also make it accept the spelling if "ie" in the middle is transposed. Consider the following:

Code Block

theRegularExpression = /Wisn(ie|ei)w?ski/
checkSpelling("Wisniewski", theRegularExpression)
checkSpelling("Wisnieski", theRegularExpression)
checkSpelling("Wisniewewski", theRegularExpression)
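The calls above reuse the earlier test strings, none of which actually transposes the letters. As a small sketch, the assertions below exercise the `(ie|ei)` alternative directly. The misspellings are made up for illustration.

```groovy
def theRegularExpression = /Wisn(ie|ei)w?ski/

// both orderings of the two letters are accepted
assert "Wisniewski" ==~ theRegularExpression
assert "Wisneiwski" ==~ theRegularExpression

// the optional 'w' still works together with the transposed letters
assert "Wisneiski" ==~ theRegularExpression

// any other pair of letters in that position is rejected
assert !("Wisniiwski" ==~ theRegularExpression)
```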

...

One last interesting feature is the ability to specify a set of characters, any one of which is acceptable. This is done using square brackets *[ ]*. Try the following regular expressions with various misspellings of Pete's last name:

Code Block

theRegularExpression = /Wis[abcd]niewski/ // requires one of 'a', 'b', 'c' or 'd'
theRegularExpression = /Wis[abcd]?niewski/ // allows one of 'a', 'b', 'c' or 'd', but does not require it
theRegularExpression = /Wis[a-zA-Z]niewski/ // requires any one upper- or lower-case letter
theRegularExpression = /Wis[^abcd]niewski/ // requires any one character that is not 'a', 'b', 'c' or 'd'
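As a hedged sketch of how each of these four patterns behaves, the assertions below try a few invented misspellings against them. The variable names are illustrative only.

```groovy
def requiresOne  = /Wis[abcd]niewski/
def optionalOne  = /Wis[abcd]?niewski/
def anyLetter    = /Wis[a-zA-Z]niewski/
def notOneOfABCD = /Wis[^abcd]niewski/

// exactly one of a, b, c, d must appear after "Wis"
assert "Wisaniewski" ==~ requiresOne
assert !("Wisniewski" ==~ requiresOne)

// with '?', the extra letter may be left out
assert "Wisniewski" ==~ optionalOne
assert "Wisbniewski" ==~ optionalOne

// any single letter, upper or lower case, is accepted
assert "WisXniewski" ==~ anyLetter

// '^' inside the brackets negates the set
assert "Wisxniewski" ==~ notOneOfABCD
assert !("Wisaniewski" ==~ notOneOfABCD)
```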

...

Expression    Meaning                                              Example matches
a?            matches 0 or 1 occurrence of *a*                     'a' or empty string
a*            matches 0 or more occurrences of *a*                 empty string or 'a', 'aa', 'aaa', etc.
a+            matches 1 or more occurrences of *a*                 'a', 'aa', 'aaa', etc.
a|b           matches *a* or *b*                                   'a' or 'b'
.             matches any single character                         'a', 'q', 'l', '_', '+', etc.
[woeirjsd]    matches any one of the named characters              'w', 'o', 'e', 'i', 'r', 'j', 's', 'd'
[1-9]         matches any one of the characters in the range       '1', '2', '3', '4', '5', '6', '7', '8', '9'
[^13579]      matches any one character not named                  even digits, or any other character
(ie)          groups an expression (for use with other operators)  'ie'
^a            matches an *a* at the beginning of a line            'a'
a$            matches an *a* at the end of a line                  'a'
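The operators listed above can be checked one at a time with ==~. This is only a sketch, and because ==~ tests the whole string, each example string is kept as short as the pattern it is matched against.

```groovy
// ? : zero or one occurrence
assert "" ==~ /a?/
assert "a" ==~ /a?/
assert !("aa" ==~ /a?/)

// * : zero or more occurrences
assert "" ==~ /a*/
assert "aaa" ==~ /a*/

// + : one or more occurrences
assert !("" ==~ /a+/)
assert "aaa" ==~ /a+/

// | : either alternative
assert "b" ==~ /a|b/

// . : exactly one character
assert "q" ==~ /./
assert !("qq" ==~ /./)

// ranges and negated sets
assert "5" ==~ /[1-9]/
assert !("0" ==~ /[1-9]/)
assert "4" ==~ /[^13579]/
assert "x" ==~ /[^13579]/
assert !("3" ==~ /[^13579]/)

// ^ and $ anchor the match to the start and end
assert "a" ==~ /^a$/
```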

There are a couple of other things you should know. If you want one of the operators above to stand for the actual character, for instance to match a literal question mark, you need to put a '\' in front of it. For example:

Code Block

// evaluates to true, as it will for any string that ends in a question mark and contains no other question mark
"How tall is Angelina Jolie?" ==~ /[^\?]+\?/
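A few assertions, offered as a sketch, show where this pattern does and does not match. The variable name and the extra test strings are invented for illustration.

```groovy
def endsWithOnlyQuestionMark = /[^\?]+\?/

// one or more non-question-mark characters followed by a single '?'
assert "How tall is Angelina Jolie?" ==~ endsWithOnlyQuestionMark

// an earlier question mark breaks the match
assert !("How tall? Very?" ==~ endsWithOnlyQuestionMark)

// so does a missing final question mark
assert !("No question mark" ==~ endsWithOnlyQuestionMark)

// '+' requires at least one character before the '?'
assert !("?" ==~ endsWithOnlyQuestionMark)
```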

This is your first really ugly regular expression. (The frequent use of these in Perl is one of the reasons it is considered a "write only" language.) By the way, Google knows how tall [she is|http://www.google.com/search?hl=en&q=how+tall+is+angelina+jolie&btnG=Google+Search]. The only way to understand expressions like this is to pick them apart:

...