Ungreedy Regular Expressions in Ruby
I was recently working on a script to condense or pretty-print CSS. Condensing is fairly easy, but pretty-printing involves preserving comments while sorting the style directives within each rule. (For those who aren't familiar with CSS, its comments are delimited by /* and */, just like in C.) Matching comments, particularly multiline comments, is straightforward as long as you can make your regular expressions ungreedy. The naïve, greedy regex /\/\*.*\*\//m (note the m option at the end, which sets the Regexp's multiline option so that . also matches newlines) will not stop at the end of the first comment. Instead it matches everything from the beginning of the first comment to the end of the last comment, including all the uncommented code in between. This is clearly wrong, and the problem is that * and + are greedy (i.e. they match as much text as they can).
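To see the greedy behaviour concretely, here is a small sketch. The stylesheet is a made-up sample for illustration, not the CSS from my actual script.

```ruby
# Hypothetical sample stylesheet, invented for this illustration.
css = <<~CSS
  /* header */
  h1 { color: red; }
  /* footer */
  p { margin: 0; }
CSS

# The naive, greedy pattern from the paragraph above.
greedy = /\/\*.*\*\//m

# String#[] with a Regexp returns the matched text (or nil if nothing matches).
greedy_match = css[greedy]
puts greedy_match
# Prints:
# /* header */
# h1 { color: red; }
# /* footer */
```

The single match starts at the first /* and runs all the way to the last */, so the h1 rule gets swallowed along with both comments.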
If greedy matching is the problem, how do we make it ungreedy? It turns out that Ruby takes a page from Perl regular expressions: whereas * and + are the greedy versions, *? and +? are the ungreedy versions, which match as little text as they can. Thus our problem regex becomes /\/\*.*?\*\//m and works as desired.
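Here is a minimal sketch of the ungreedy regex pulling out each comment separately. Again, the stylesheet is a hypothetical sample, and using String#scan is just one way to collect the matches, not necessarily what a full pretty-printer would do.

```ruby
# Hypothetical sample stylesheet with one multiline comment.
css = <<~CSS
  /* header
     spans two lines */
  h1 { color: red; }
  /* footer */
  p { margin: 0; }
CSS

# The ungreedy pattern: .*? stops at the first */ it can reach.
ungreedy = /\/\*.*?\*\//m

# With no capture groups, String#scan returns an array of matched strings.
comments = css.scan(ungreedy)

puts comments.length  # prints 2
comments.each { |comment| p comment }
```

Each comment comes back as its own string, including the multiline one, and the rules between them are left alone.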
This may not be quite as significant as previous posts, but it's really handy to know when you need it.
7 Comments:
At 6/11/2006 05:02:00 PM, Anonymous said…
The ? means "ungreedy." It's not just + and * that it modifies. For example:
/[a]{2,}?/
will match only the minimum number of 'a'.
irb(main):026:0> "aaaaab" =~ /([a]{2,}?)/
=> 0
irb(main):027:0> $1
=> "aa"
At 6/11/2006 05:34:00 PM, Gregory said…
Good to know. Thanks!
At 7/09/2006 04:24:00 AM, Anonymous said…
This was exactly what I was looking for; thanks for posting!
At 3/07/2007 11:49:00 AM, Evgeniy K said…
thanks!
At 10/03/2007 12:29:00 AM, Jeremy Nicoll said…
You can use the anti-character match as well. If you want to match anything up to the next space you can do /[^\s]+/ - I use this one constantly.
At 8/23/2010 03:32:00 AM, john said…
Thanks! You helped me solve a frustrating problem.
At 6/02/2011 10:54:00 PM, Michael Ebens said…
Thanks so much! Fixed a frustrating problem I've been spending a while on.