<p>Here are some suggestions for using regular expressions in Groovy. This page focuses on documenting regular expressions in Groovy, but the suggestions apply to any programming language that supports regular expressions, such as Perl and Java.</p> <p><em>Note:</em> I couldn't get boldface and colors to work in code listings under this wiki. This page was originally developed in the MediaWiki at <a href="https://www.ngdc.noaa.gov/wiki/index.php?title=Regular_Expressions_in_Groovy">https://www.ngdc.noaa.gov/wiki/index.php?title=Regular_Expressions_in_Groovy</a> and the formatting looks better there.</p> <h2>Reference Links</h2> <p>Here are some useful reference links that you may want to open alongside this page:</p> <ul> <li><a href="http://groovy.codehaus.org/Regular+Expressions">Groovy Regular Expressions</a></li> <li><a href="http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html">java.util.regex.Pattern API</a></li> <li><a href="http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Matcher.html">java.util.regex.Matcher API</a></li> <li><a href="http://pleac.sourceforge.net/pleac_groovy/patternmatching.html">PLEAC Pattern Matching</a> (PLEAC, the Programming Language Examples Alike Cookbook, covers many programming languages)</li> </ul> <h2>Documenting the RegEx</h2> <p>It's important to document any regular expression, or "regex" for short, that is more than a trivial match. 
Documenting regexes is the key to making them understandable so they can be debugged and modified, either by someone else or by you after you've had time to forget the details.</p> <h3>Overview</h3> <ul> <li><strong>Include a sample of text to match</strong> <ul> <li>give a plain English description of your goal</li> <li>omit excess lines of sample if long</li> </ul> </li> <li><strong>Use extended patterns with comments</strong> <ul> <li>mark capturing groups by number</li> <li>include "landmark" keys in the pattern</li> </ul> </li> <li><strong>Include debugging feedback</strong> <ul> <li>use debugging lines</li> </ul> </li> </ul> <h3>Include a sample of text to match</h3> <p>Having a sample of the input on screen, right next to the regular expression that is applied to it, is a great help in deciphering what the pattern is trying to match. This is best done in a block comment just before the regular expression pattern is defined.</p> <ul> <li>give a plain English description of your goal <ul> <li>note "landmark" keys in the pattern that you rely on to reliably parse the data</li> <li>list any sub-parts (captured groups) of the pattern you wish to use after the match</li> </ul> </li> <li>omit excess lines of sample if long</li> </ul> <p>For example, on a system that has remotely mounted disk space with names like "/nfs/data" or "/nfs/DATA", we wish to gather the free space in kilobytes and the name on which the space is mounted. 
The output from "df -k" (disk free space in kilobytes, on Linux/Mac/Unix systems) could be parsed by this pattern:</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>pattern = ~/(?i)(\d+)\s+\d+%\s+(\/nfs\/data.*)/
</pre></td></tr></table> <p>The "(?i)" is a match flag that makes the pattern case insensitive. The digits followed immediately by "%" are the key "landmark" in this regex: in all the output of df, this sequence is always preceded by the available space, regardless of whether the entry is remote and split across two lines or local and contained in one line.</p> <p>To summarize the parts (regular expression constructs) here:</p> <ul> <li><strong>(\d+)</strong> - One or more digits, captured for later use; "+" means 1 or more repetitions, see the <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html">Pattern API</a></li> <li><strong>\s+</strong> - One or more whitespace characters</li> <li><strong>\d+%</strong> - One or more digits followed by "%", the percentage of disk used</li> <li><strong>(\/nfs\/data.*)</strong> - A partition name that starts with "/nfs/data", captured for later use <ul> <li><strong>\/</strong> - A literal "/", escaped by "\" since a slash by itself starts or ends the pattern</li> <li><strong>.*</strong> - Matches 0 or more characters, "." 
is a wildcard, "*" means 0 or more repetitions</li> </ul> </li> </ul> <p>Following the suggestions above, a header comment is added:</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>/*
   The input string to parse is from the output of the 'df -k' command.
   Example input:

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1              4185956   1206996   2762888  31% /
/dev/sda11            30679784  28324040    772140  98% /extra
<..lines omitted..>
fas3050c-1-2.b:/vol/canis
                      10485760   6599936   3885824  63% /nfs/data_d2/dog_data
fas6070-1-1.b:/vol/felis
                     314572800  54889600 259683200  18% /nfs/DATA-1/cat_data

   We want the available disk space in KB, for /nfs/data remote disk
   partitions, which is "3885824" and "259683200" in the sample above.
   Note that partitions that start out with "/nfs/data" may be either
   upper or lower case.

   Capture the partition name for debugging.

   Capture the space available in KB as the number before the
   percentage number (digits followed by '%').

   The pattern match is relying on there being just one instance of a
   numeric percentage (\d+%) occurring in each usable output line.
   The header line contains a "%", but the characters preceding it are
   non-numeric, "Use%", so it is ignored. Also, many of the lines we want
   are split across two lines, but it is only the second line that
   contains the information we want.

   The "(?i)" match flag indicates the pattern is case insensitive.
*/
pattern = ~/(?i)(\d+)\s+\d+%\s+(\/nfs\/data.*)/
</pre></td></tr></table> <h3>Use extended patterns with comments</h3> <p>The <strong>extended match mode</strong> is enabled by a pattern match flag that allows whitespace and comments to be embedded into the pattern. 
You can then describe, piece by piece, the parts of the regular expression without dumping those details into the already large header comment suggested above. <a class="confluence-link" href="#Pattern_Match_Flags" data-anchor="Pattern_Match_Flags" data-linked-resource-default-alias="Pattern_Match_Flags" data-base-url="http://docs.codehaus.org">Pattern match flags</a> are discussed in more detail later in this document.</p> <p>In Groovy, this match flag is "<strong>(?x)</strong>" and can be combined with other flags you wish to turn on, such as "(?ix)" for both extended and case-insensitive modes. This is done in conjunction with Groovy "here" documents (triple quoting), which are handled somewhat differently than the "slashy" quoting used for regular expression patterns. The three examples below are equivalent, but I've highlighted in <span style="color: red;">red</span> what is removed from the first, and colored <span style="color: green;">green</span> the new text in the second and third examples. <em>(Note: the original author didn't figure out how to get the red and green color to work in the code listings for this wiki. If anyone knows how to fix this, please do.)</em></p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>// 1. slashy regex
pattern = ~/(?i)(\d+)\s+\d+%\s+(\/nfs\/data.*)/
</pre></td></tr></table> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>// 2. string converted to a regex
regex = "(?i)(\\d+)\\s+\\d+%\\s+(/nfs/data.*)"
pattern = ~regex
</pre></td></tr></table> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>// 3. here document string converted to a regex
regex = '''(?ix)(\\d+)\\s+\\d+%\\s+(/nfs/data.*)'''
pattern = ~regex
</pre></td></tr></table> <p>Essentially, when converting a slashy regex to a string-based pattern:</p> <ul> <li>Forward slashes don't need to be escaped by backslashes, so "\/" becomes "/"</li> <li>Double the remaining backslashes. Backslashes need to be escaped by backslashes when quoting strings (either normal or here documents)</li> <li>In extended mode, whitespace in the pattern is ignored, so to match whitespace you must use <table class="wysiwyg-macro" data-macro-name="noformat" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e25vZm9ybWF0fQ&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>\\s</pre></td></tr></table></li> <li>In extended mode, you can match a literal "#" with <table class="wysiwyg-macro" data-macro-name="noformat" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e25vZm9ybWF0fQ&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>\\#</pre></td></tr></table> so that it's not interpreted as the beginning of a comment</li> </ul> <p>What does the third example buy you? Now newlines and comments can be included. 
The third example (here document) above can also be written:</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>// 3. here document string converted to a regex
regex = '''(?ix)          # comments are now allowed!
           (\\d+)         # disk space
           \\s+
           \\d+%          # one or more digits followed by "%"
           \\s+
           (/nfs/data.*)  # partition name'''
pattern = ~regex
</pre></td></tr></table> <p>This allows you to</p> <ul> <li><strong>mark capturing groups by number</strong> <ul> <li>I mark these with a numbered comment like "<code># 1: The disk space we want</code>"</li> </ul> </li> <li><strong>explain "landmark" keys in the pattern</strong> <ul> <li>For example <table class="wysiwyg-macro" data-macro-name="noformat" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e25vZm9ybWF0fQ&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>\\d+% # a number followed by %</pre></td></tr></table> Not every line needs a comment, but don't leave out any important key matches.</li> </ul> </li> </ul> <p>Expanding on the example above:</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>/*
   The input string to parse is from the output of the 'df -k' command.
   Example input:

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1              4185956   1206996   2762888  31% /
/dev/sda11            30679784  28324040    772140  98% /extra
<..lines omitted..>
fas3050c-1-2.b:/vol/canis
                      10485760   6599936   3885824  63% /nfs/data_d2/dog_data
fas6070-1-1.b:/vol/felis
                     314572800  54889600 259683200  18% /nfs/DATA-1/cat_data

   We want the available disk space in KB, for /nfs/data remote disk
   partitions, which is "3885824" and "259683200" in the sample above.
   Note that partitions that start out with "/nfs/data" may be either
   upper or lower case.

   (#2:) Capture the partition name for debugging.

   (#1:) Capture the space available in KB as the number before the
   percentage number (digits followed by '%').

   The pattern match is relying on there being just one instance of a
   numeric percentage (\d+%) occurring in each usable output line.
   The header line contains a "%", but the characters preceding it are
   non-numeric, "Use%", so it is ignored. Also, many of the lines we want
   are split across two lines, but it is only the second line that
   contains the information we want.

   The "(?i)" match flag indicates the pattern is case insensitive.
   The extended mode (?x) allows whitespace and comments starting with "#"
   to be embedded in the regular expression.
*/
regex = '''(?ix)          # case insensitive, extended format
           (\\d+)         # 1: The disk space we want
           \\s+           # some whitespace
           \\d+%          # a number followed by %
           \\s+           # some more whitespace
           (/nfs/data.*)  # 2: partition name'''
pattern = ~regex
</pre></td></tr></table> <p>If that's not any easier to understand than what we started out with,</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>pattern = ~/(?i)(\d+)\s+\d+%\s+(\/nfs\/data.*)/
</pre></td></tr></table> <p>I'll just assume you're the sort of person who never reads code comments.</p> <h3>Include debugging feedback</h3> <ul> <li>use debugging lines <ul> <li>they are easy to turn on/off with a flag variable</li> <li>they verify the regular expression is working</li> </ul> </li> </ul> <p>While developing regular expressions, you will probably want to be able to easily test the result. An easy option is to add debugging lines, controlled by a boolean flag that turns them on or off. For little development programs and snippets in the groovyConsole, this is easier than setting up logging. The example above can be expanded with a 'debugging' flag and debugging lines like this:</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>/*
   The input string to parse is from...
   <...the rest of the header comment from above...>
*/

boolean debugging = true

if (debugging) {
    // Test data
    dfkOutput = '''
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1              4185956   1206996   2762888  31% /
/dev/sda11            30679784  28324040    772140  98% /extra
fas3050c-1-2.b:/vol/canis
                      10485760   6599936   3885824  63% /nfs/data_d2/dog_data
fas6070-1-1.b:/vol/felis
                     314572800  54889600 259683200  18% /nfs/DATA-1/cat_data
'''
} else {
    // Real data
    dfkOutput = 'df -k'.execute().text
}

long kbAvail = 0

regex = '''(?ix)          # enable case-insensitive matches, extended patterns
           (\\d+)         # 1: The disk space we want
           \\s+           # some whitespace
           \\d+%          # a number followed by %
           \\s+           # some more whitespace
           (/nfs/data.*)  # 2: partition name'''

pattern = ~regex
matcher = pattern.matcher(dfkOutput)

if (debugging) {
    println """matcher pattern:
/---------------------------------\\
${matcher.pattern()}
\\---------------------------------/"""
    println "match count=${matcher.getCount()}"
}

for (i = 0; i < matcher.getCount(); i++) {
    if (debugging) {
        println "  text matched in matcher[${i}]: '" + matcher[i][0] + "'"
        println "      free space in (group 1): '" + matcher[i][1] + "'"
        println "      partition name (group 2): '" + matcher[i][2] + "'"
    }
    kbAvail += matcher[i][1].toLong()
}

println "KB available=${kbAvail}"
</pre></td></tr></table> <p>With 'debugging = true', this prints some information to show the regular expression is working on the test data:</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>matcher pattern:
/---------------------------------\
(?ix)          # enable case-insensitive matches, extended patterns
           (\d+)         # 1: The disk space we want
           \s+           # some whitespace
           \d+%          # a number followed by %
           \s+           # some more whitespace
           (/nfs/data.*)  # 2: partition name
\---------------------------------/
match count=2
  text matched in matcher[0]: '3885824  63% /nfs/data_d2/dog_data'
      free space in (group 1): '3885824'
      partition name (group 2): '/nfs/data_d2/dog_data'
  text matched in matcher[1]: '259683200  18% /nfs/DATA-1/cat_data'
      free space in (group 1): '259683200'
      partition name (group 2): '/nfs/DATA-1/cat_data'
KB available=263569024
</pre></td></tr></table> <p>You can see by the output above that most of the input is ignored because it doesn't meet the described pattern. For those entries that are split across two lines, it turns out that all the information we want is in the second line, which still meets the pattern criteria, and the first line is ignored for not matching.</p> <p>And if you set 'debugging = false', only the result is printed:</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>KB available=263569024
</pre></td></tr></table> <p><img class="editor-inline-macro" src="/plugins/servlet/confluence/placeholder/macro?definition=e2FuY2hvcjpQYXR0ZXJuX01hdGNoX0ZsYWdzfQ&locale=en_GB&version=2" data-macro-name="anchor" data-macro-default-parameter="Pattern_Match_Flags"></p> <h2>Pattern Match Flags</h2> <p>The Java regular expression support includes many options modeled after Perl, which is one of the strongest regular expression parsing languages. Since Groovy gets its regular expression capability from Java (which copied Perl), what works in Java applies equally well to Groovy. Looking at the Java <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html">java.util.regex.Pattern API</a>, we see that there is support for pattern match flags under the section called "Special constructs (non-capturing)." 
Specifically the line indicating</p> <blockquote><p>(?idmsux-idmsux) Nothing, but turns match flags on - off</p></blockquote> <p>These capture nothing, but activate the match flags "idmsux". These correspond mostly to similarly named flags in Perl:</p> <table class="wysiwyg-macro" data-macro-name="unmigrated-inline-wiki-markup" data-macro-parameters="atlassian-macro-output-type=BLOCK" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e3VubWlncmF0ZWQtaW5saW5lLXdpa2ktbWFya3VwOmF0bGFzc2lhbi1tYWNyby1vdXRwdXQtdHlwZT1CTE9DS30&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>{center}Match Flags{center}</pre></td></tr></table> <table class="confluenceTable"><tbody> <tr> <th class="confluenceTh"><p> Flag </p></th> <th class="confluenceTh"><p> Java/Groovy </p></th> <th class="confluenceTh"><p> Perl </p></th> <th class="confluenceTh"><p> Description </p></th> </tr> <tr> <th class="confluenceTh"><p> i </p></th> <td class="confluenceTd"><p> CASE_INSENSITIVE </p></td> <td class="confluenceTd"><p> ignore case </p></td> <td class="confluenceTd"><p> Do case insensitive pattern matching </p></td> </tr> <tr> <th class="confluenceTh"><p> d </p></th> <td class="confluenceTd"><p> UNIX_LINES </p></td> <td class="confluenceTd"><p> <em>not in Perl</em> </p></td> <td class="confluenceTd"><p> Enables Unix lines mode, only '\n' line terminator affects <strong>.</strong>, <strong>^</strong> and <strong>$</strong> </p></td> </tr> <tr> <th class="confluenceTh"><p> m </p></th> <td class="confluenceTd"><p> MULTILINE </p></td> <td class="confluenceTd"><p> multiline </p></td> <td class="confluenceTd"><p> Enables multiline mode. In multiline mode the expressions ^ and $ match just after or just before, respectively, a line terminator or the end of the input sequence. By default these expressions only match at the beginning and the end of the entire input sequence. 
</p></td> </tr> <tr> <th class="confluenceTh"><p> s </p></th> <td class="confluenceTd"><p> DOTALL </p></td> <td class="confluenceTd"><p> single line </p></td> <td class="confluenceTd"><p> In Perl this is called Single-line mode, treating the input as a single line even if it includes line terminators. Normally the "<strong>.</strong>" wildcard doesn't match line terminators, but in Dotall mode it matches all characters. </p></td> </tr> <tr> <th class="confluenceTh"><p> u </p></th> <td class="confluenceTd"><p> UNICODE_CASE </p></td> <td class="confluenceTd"><p> <em>not in Perl</em> </p></td> <td class="confluenceTd"><p> Enables Unicode-aware case folding. </p></td> </tr> <tr> <th class="confluenceTh"><p> x </p></th> <td class="confluenceTd"><p> COMMENTS </p></td> <td class="confluenceTd"><p> eXtended </p></td> <td class="confluenceTd"><p> Extended mode allows whitespace, including newlines, and comments beginning with "#" and ending with a newline. Since whitespace in the pattern is ignored, use '\s' to indicate whitespace you wish to match. </p></td> </tr> <tr> <th class="confluenceTh"><p> g </p></th> <td class="confluenceTd"><p> <em>automatic in Java/Groovy</em> </p></td> <td class="confluenceTd"><p> global </p></td> <td class="confluenceTd"><p> Global matching keeps track of a current position in the input, so you can step through each place the pattern matches the input. Groovy sets up a result array that is an array of arrays of strings. The first dimension represents the number of matches, the second dimension contains the substrings that represent actual text matched and any captured groups. </p></td> </tr> </tbody></table> <h4>Example usage of Match Flags</h4> <p>This is how the Match Flags may be used in <strong>Groovy</strong>. 
Below is a pattern that will grab the disk space value from the output of the Unix "df" (disk free space) command, but only on lines containing "/nfs/data". The "(?i)" indicates a case-insensitive match:</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>pattern = ~/(?i)(\d+)\s+\d+%\s+(\/nfs\/data.*)/
</pre></td></tr></table> <p>Note that the "(?i)" match flag comes at the beginning and is wrapped in parentheses. The "?" indicates this is a non-capturing group. Normally anything in a regular expression you wish to capture is wrapped in parentheses. Assuming the overall match succeeded, this "captured" group is available for later use as a substring that matched the portion of the regular expression in the parentheses.</p> <p>For comparison, here is the equivalent pattern in <strong>Perl</strong>:</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>$dfkOutput =~ /(\d+)\s+\d+%\s+(\/nfs\/data.*)/i
</pre></td></tr></table> <p>Notice that the "i" indicating a case-insensitive match is appended at the end of the pattern.</p> <h3>The eXtended Pattern Match Flag (x)</h3> <p>The key flag for making regular expressions more readable is the "<strong>x</strong>" flag, which means "eXtended" in Perl; Java/Groovy calls it "Comments," which is less mnemonic.</p> <p>Although Java and Groovy use the <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html">java.util.regex.Pattern API</a>, which was modeled after Perl regular expressions, the extended mode wasn't of much use in Java. 
You <em>could</em> write</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>// Java regex with extended format
String patternStr =
    "(?ix)         # case insensitive, extended format\n" +
    "(\\d+)        # 1: The disk space we want\n" +
    "\\s+          # some whitespace\n" +
    "\\d+%         # a number followed by %\n" +
    "\\s+          # some more whitespace\n" +
    "(/nfs/data.*) # 2: partition name";

Pattern pattern = Pattern.compile(patternStr);
Matcher m = pattern.matcher(dfOutput);
</pre></td></tr></table> <p>But it would be a little less typing to simulate this with ordinary comments:</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>// Java regex with comments simulating extended format
String patternStr =
    "(?i)" +          // case insensitive
    "(\\d+)" +        // 1: The disk space we want
    "\\s+" +          // some whitespace
    "\\d+%" +         // a number followed by %
    "\\s+" +          // some more whitespace
    "(/nfs/data.*)";  // 2: partition name

Pattern pattern = Pattern.compile(patternStr);
Matcher m = pattern.matcher(dfOutput);
</pre></td></tr></table> <p>However, in Groovy, combining a regex with a here document makes this much cleaner (and more Perl-like):</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>// Groovy regex
regex = '''(?ix)          # enable case-insensitive matches, extended patterns
           (\\d+)         # 1: The disk space we want
           \\s+           # some whitespace
           \\d+%          # a number followed by %
           \\s+           # some more whitespace
           (/nfs/data.*)  # 2: partition name'''

pattern = ~regex
matcher = pattern.matcher(dfOutput)
</pre></td></tr></table> <p>For comparison, here's the Perl equivalent:</p> <table class="wysiwyg-macro" data-macro-name="code" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e2NvZGV9&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre># Perl regex
$dfOutput =~ m!            # begin match
        (\d+)              # 1: Disk space we're after
        \s+                # one+ whitespace characters
        \d+%               # digits followed by '%'
        \s+                # one+ whitespace characters
        (/nfs/data.*)      # 2: partition name
        !ix;               # end match with case-(i)nsensitive
                           # and e(x)tended format options
</pre></td></tr></table> <p>In Perl you can use "m<em>&lt;delim&gt;</em>" to start a match instead of "/", so I could pick "!" as the delimiter and avoid having to escape the "/" characters in the partition name. Perl regular expressions also use a single backslash, as do Groovy single-line slashy regex patterns. Unfortunately, <strong>for multi-line Groovy patterns defined as a here document string, you need to double the backslashes</strong>.</p> <h2>Other Notes</h2> <h3>Test code fragments in the groovyConsole</h3> <p>Groovy takes a couple of seconds to compile and run, and during iterative testing of a regular expression, this time becomes noticeable. Paste your code fragment into groovyConsole, then set up some test input and a print statement to show the result. You can then rerun it rapidly there without the compile lag. The print statement can become your debugging line when copying code back to your main project.</p> <h3>Conquer Complex Patterns by Dividing them into Simpler Categories</h3> <p>Beware of trying to deal with complicated patterns with a "one-size-fits-all" monster regular expression. 
Often it's easier to use an initial pattern to decide the category of input, and then select (if-then-else or switch-case) an appropriate simpler regex that deals with just that sub-category of input. You may need to include messages to warn against unexpected input that doesn't match any of the known categories.</p>
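<p>As a rough sketch of this divide-and-conquer approach, the Groovy fragment below first decides which category a line of "df -k" output falls into, and then applies a small regex that only has to handle that category. The method names (<code>categorize</code>, <code>parseDfLine</code>) and the category names are made up for this illustration, and the patterns are only assumed to fit the sample output shown earlier on this page.</p>

```groovy
// Illustrative sketch only: categorize() and parseDfLine() are hypothetical
// helper names, and the categories are assumptions based on the sample
// 'df -k' output used on this page.

// Step 1: a cheap initial pattern per category decides what kind of line this is.
// A Pattern used as a switch case must match the whole line.
String categorize(String line) {
    switch (line) {
        case ~/\s*Filesystem\s.*/: return 'header'       // column header line
        case ~/\S+:\S+\s*/:        return 'remote-name'  // "host:/volume" on its own line
        case ~/.*\s\d+%\s.*/:      return 'data'         // contains the "digits then %" landmark
        default:                   return 'unknown'
    }
}

// Step 2: apply a simpler regex that deals with just that category of input.
Map parseDfLine(String line) {
    String category = categorize(line)
    switch (category) {
        case 'data':
            def m = line =~ /(\d+)\s+\d+%\s+(\S+)/   // 1: KB available, 2: mount point
            if (m.find()) {
                return [category: category, kbAvail: m.group(1).toLong(), mount: m.group(2)]
            }
            return [category: 'unknown']
        case 'remote-name':
            return [category: category, filesystem: line.trim()]
        case 'header':
            return [category: category]
        default:
            // warn about input that matches none of the known categories
            System.err.println "WARNING: unrecognized input: '${line}'"
            return [category: 'unknown']
    }
}

def sample = [
    'Filesystem 1K-blocks Used Available Use% Mounted on',
    'fas3050c-1-2.b:/vol/canis',
    ' 10485760 6599936 3885824 63% /nfs/data_d2/dog_data',
]
sample.each { println parseDfLine(it) }
```

<p>Each category-specific regex stays short enough to document with the techniques above, and the <code>default</code> branch gives you one place to report input you did not anticipate.</p>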