Groovy supports regular expressions natively through the ~"pattern" expression, which creates a compiled Java Pattern object from the given pattern string. Groovy also provides the =~ operator, which creates a Matcher, and the ==~ operator, which returns a boolean indicating whether the whole String matches the pattern.

For a Matcher, matcher[index] returns a String (the matched text) when the pattern has no groups, or a List of Strings (the whole match followed by each group) when it does.
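A small sketch of the String-versus-List distinction. The sample string and the variable names plain and grouped are hypothetical:

```groovy
// No groups in the pattern: each indexed match is a plain String
def plain = "cat bat rat" =~ /[cbr]at/
assert plain[0] == "cat"
assert plain[2] == "rat"
assert plain[0] instanceof String

// One group: each indexed match is a List [whole match, group 1]
def grouped = "cat bat rat" =~ /([cbr])at/
assert grouped[1] == ["bat", "b"]
assert grouped[1][1] == "b"
assert grouped[0] instanceof List
```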
import java.util.regex.Matcher
import java.util.regex.Pattern

// ~ creates a Pattern from String
def pattern = ~/foo/
assert pattern instanceof Pattern
assert pattern.matcher("foo").matches()     // true
assert !pattern.matcher("foobar").matches() // false, because matches() must match the whole String

// =~ creates a Matcher, and in a boolean context, it's "true" if it has at least one match, "false" otherwise.
assert "cheesecheese" =~ "cheese"
assert "cheesecheese" =~ /cheese/
assert "cheese" == /cheese/   // "..." and /.../ are both ways to write a String
assert ! ("cheese" =~ /ham/)

// ==~ tests whether the whole String matches the pattern
assert "2009" ==~ /\d+/      // true
assert !("holla" ==~ /\d+/)  // false

// let's create a Matcher
def matcher = "cheesecheese" =~ /cheese/
assert matcher instanceof Matcher

// let's do some replacement
def cheese = ("cheesecheese" =~ /cheese/).replaceFirst("nice")
assert cheese == "nicecheese"
assert "color" == "colour".replaceFirst(/ou/, "o")

cheese = ("cheesecheese" =~ /cheese/).replaceAll("nice")
assert cheese == "nicenice"

// simple group demo
// You can also match a pattern that includes groups.  First create a matcher object,
// either using the Java API, or more simply with the =~ operator.  Then, you can index
// the matcher object to find the matches.  matcher[0] returns a List representing the
// first match of the regular expression in the string.  The first element is the string
// that matches the entire regular expression, and the remaining elements are the strings
// that match each group.
// Here's how it works:
def m = "foobarfoo" =~ /o(b.*r)f/
assert m[0] == ["obarf", "bar"]
assert m[0][1] == "bar"

// Although a Matcher isn't a list, it can be indexed like a list.  Since Groovy 1.6,
// this includes using a collection (several indices or a range) as an index:

matcher = "eat green cheese" =~ "e+"

assert "ee" == matcher[2]
assert ["ee", "e"] == matcher[2..3]
assert ["e", "ee"] == matcher[0, 2]
assert ["e", "ee", "ee"] == matcher[0, 1..2]

matcher = "cheese please" =~ /([^e]+)e+/
assert ["se", "s"] == matcher[1]
assert [["se", "s"], [" ple", " pl"]] == matcher[1, 2]
assert [["se", "s"], [" ple", " pl"]] == matcher[1 .. 2]
assert [["chee", "ch"], [" ple", " pl"], ["ase", "as"]] == matcher[0, 2..3]
// Matcher defines an iterator() method, so it can be used, for example,
// with collect() and each():
matcher = "cheese please" =~ /([^e]+)e+/
matcher.each { println it }
assert matcher.collect { it }  ==
                  [["chee", "ch"], ["se", "s"], [" ple", " pl"], ["ase", "as"]]
// The semantics of the iterator were changed in Groovy 1.6.
// In 1.5, each iteration would always return a string of the entire match, ignoring groups.
// In 1.6, if the regex has any groups, it returns a list of Strings as shown above.

// grep() is also regular-expression aware: given a Pattern, it keeps the elements that match it entirely
assert ["foo", "moo"] == ["foo", "bar", "moo"].grep(~/.*oo$/)
// the same filter can also be written with findAll() and ==~
assert ["foo", "moo"] == ["foo", "bar", "moo"].findAll { it ==~ /.*oo/ }
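Because each grouped match is iterated as a List, a closure with several parameters can receive the whole match and each group as separate arguments (Groovy spreads a single List argument across a multi-parameter closure). A minimal sketch, assuming a hypothetical comma-separated key=value input:

```groovy
// Hypothetical input: "key=value" pairs separated by commas
def pairs = "a=1,b=2,c=3" =~ /(\w+)=(\d+)/

// Each iteration yields [whole match, key, value]; the whole match is ignored
def map = [:]
pairs.each { whole, key, value ->
    map[key] = value.toInteger()
}
assert map == [a: 1, b: 2, c: 3]
```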
Since a Matcher coerces to a boolean by calling its *find* method, the =~ operator is consistent with the simple use of Perl's =~ operator when it appears as a predicate (in 'if', 'while', etc.). The "stricter-looking" ==~ operator requires an exact match of the whole subject string, and it returns a Boolean, not a Matcher.
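A short sketch of the difference, using hypothetical sample strings. The while loop calls Java's Matcher.find() and group() explicitly, which steps through successive matches one at a time:

```groovy
def text = "order 42 shipped"

// =~ in a boolean context calls find(): true if any substring matches
assert text =~ /\d+/

// ==~ requires the whole string to match, and yields a Boolean
assert !(text ==~ /\d+/)
assert (text ==~ /\d+/) instanceof Boolean

// Predicate use, as with Perl's =~
def status = (text =~ /shipped/) ? "done" : "pending"
assert status == "done"

// Collecting every match with an explicit find() loop
def digits = "a1b22c333" =~ /\d+/
def found = []
while (digits.find()) {
    found << digits.group()
}
assert found == ["1", "22", "333"]
```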

Regular expression support is imported from Java. Java's regular expression language and API are documented [here in the Pattern JavaDocs|].
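Because Groovy's patterns are plain java.util.regex objects, Java's API and Groovy's syntax can be mixed freely. A sketch, assuming case-insensitive matching is the goal; Pattern.compile, Pattern.CASE_INSENSITIVE and the inline (?i) flag all come from Java's regex support, and the sample strings are hypothetical:

```groovy
import java.util.regex.Pattern

// The same pattern built two ways: Java's Pattern.compile with a flag,
// and a Groovy pattern literal using the inline (?i) flag
def javaStyle = Pattern.compile("groovy", Pattern.CASE_INSENSITIVE)
def groovyStyle = ~/(?i)groovy/

assert javaStyle.matcher("I like GROOVY").find()
assert groovyStyle.matcher("I like GROOVY").find()

// grep() with a Pattern keeps only the elements that match it entirely
assert ["Groovy", "Java", "GROOVY"].grep(groovyStyle) == ["Groovy", "GROOVY"]
```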

h2. More Examples

*Goal:* Capitalize words at the beginning of each line:
// sample input
def before='''
apple
orange
y
banana
'''

// expected result
def expected='''
Apple
Orange
Y
Banana
'''

assert expected == before.replaceAll(/(?m)^\w+/,
    { it[0].toUpperCase() + ((it.size() > 1) ? it[1..-1] : '') })
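The (?m) prefix turns on Java's MULTILINE mode, which is what lets ^ match at the start of every line rather than only at the start of the whole string. A minimal comparison on a hypothetical two-line string:

```groovy
def text = "apple\nbanana"

// Without (?m), ^ anchors only at the start of the whole string
def single = text.replaceAll(/^\w/, "X")
assert single == "Xpple\nbanana"

// With (?m), ^ also anchors right after each line terminator
def multi = text.replaceAll(/(?m)^\w/, "X")
assert multi == "Xpple\nXanana"
```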
*Goal:* Capitalize every word in a string:
assert "It Is A Beautiful Day!" ==
    ("it is a beautiful day!".replaceAll(/\w+/,
        { it[0].toUpperCase() + ((it.size() > 1) ? it[1..-1] : '') }))
Add .toLowerCase() to make the remaining letters of each word lowercase:
assert "It Is A Very Beautiful Day!" ==
    ("it is a VERY beautiful day!".replaceAll(/\w+/,
        { it[0].toUpperCase() + ((it.size() > 1) ? it[1..-1].toLowerCase() : '') }))

h2. Gotchas

How to use backreferences with String.replaceAll()

A double-quoted GString does *not* work as you might expect, because Groovy tries to interpolate $1 and $3 itself:

def replaced = "abc".replaceAll(/(a)(b)(c)/, "$1$3")

This produces a compilation error like the following:

[] illegal string body character after dollar sign:
solution: either escape a literal dollar sign "\$5" or bracket the value expression "${5}" @ line []


Use ' or / to delimit the replacement string instead, so that $1 and $3 reach the regex engine unchanged:

def replaced = "abc".replaceAll(/(a)(b)(c)/, '$1$3')
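Two other ways around the GString problem, sketched on the same "abc" example: escaping each dollar sign inside a double-quoted string, or passing a closure instead of a replacement string. With a closure no backreference syntax is involved; when the pattern has groups, Groovy calls it with the whole match followed by each group:

```groovy
// Single quotes: no interpolation, so $1 and $3 are backreferences
def quoted = "abc".replaceAll(/(a)(b)(c)/, '$1$3')
assert quoted == "ac"

// Double quotes work too if each dollar sign is escaped
def escaped = "abc".replaceAll(/(a)(b)(c)/, "\$1\$3")
assert escaped == "ac"

// A closure receives the whole match and each group as arguments
def viaClosure = "abc".replaceAll(/(a)(b)(c)/) { all, a, b, c -> a + c }
assert viaClosure == "ac"
```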