Thursday, January 27, 2011

RegEx and Loops Containing a Certain Word

I found myself in a situation where I had to search for loops in VB.NET code that contained a collection modification operation, like .Add( ) or .Delete( ). I thought this would be an ideal job for regular expressions, so off I went.

After reading and struggling for a day, this is what I finally came up with, thanks to the help of some co-workers:

(?si)(for\ each\ |while|do\ )(.(?!(next|end while|loop)))+(delete|add|remove|modify|append)+(.(?!(next|end while|loop)))+

So, taking it section by section...

(?si) : This is the options block. We have turned on Case Insensitivity (i) and Single Line (s). Single Line mode means the dot operator will match all characters, including CR/LF.
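To see what the options block changes, here is a minimal sketch using Python's re module rather than the .NET engine I actually ran this on (an assumption for illustration; the inline i and s options are spelled the same way in both). The sample string and variable names are made up.

```python
import re

# Made-up VB.NET fragment: upper-case keyword, body split across lines with CR/LF.
sample = "FOR EACH item In items\r\n    list.Add(item)\r\nNext"

# No options: matching is case-sensitive and the dot stops at the line feed.
plain = re.search(r"for each.+add", sample)

# (?si): case-insensitive, and the dot also matches line breaks.
with_options = re.search(r"(?si)for each.+add", sample)

print(plain)                        # None
print(repr(with_options.group(0)))  # spans the line break, ends at "Add"
```

Without the options the search finds nothing; with them the match runs from FOR EACH across the line break to Add.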

(for\ each\ |while|do\ ) : This block searches for a For Each / While / Do loop keyword.

(.(?!(next|end while|loop)))+ : This block consumes characters one at a time, as long as the character is not immediately followed by one of the loop-ending keywords Next / End While / Loop. This prevents us from running from one loop into the next. There are ambiguity issues, since a loop opening keyword can be improperly matched with the wrong loop ending keyword, but this was close enough for my purposes.
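A small sketch of how this block behaves on its own, again using Python's re module as a stand-in for the .NET engine (an assumption; the sample text and names are invented). Because the lookahead sits after the dot, the character directly in front of the ending keyword is left out of the match as well.

```python
import re

# The "anything but a loop-ending keyword" block, with the same options as the full pattern.
body = re.compile(r"(?si)(.(?!(next|end while|loop)))+")

text = "x = 1\ny = 2\nNext\nz = 3"

m = body.match(text)
consumed = m.group(0)

print(repr(consumed))  # 'x = 1\ny = 2'
```

The match stops before the line break that precedes Next, so nothing after the end of the loop gets pulled in.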

(delete|add|remove|modify|append)+ : This block searches for any of the words Delete / Add / Remove / Modify / Append. The trailing + allows one or more back-to-back occurrences; a single occurrence is enough to match.

(.(?!(next|end while|loop)))+ : This is the same block as before, this time consuming the rest of the loop body and stopping before the words Next / End While / Loop. Again, not perfect, but close enough for my purposes.
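Putting the pieces together, here is a rough sketch of the whole expression run against a made-up VB.NET snippet. It uses Python's re module instead of Expresso and .NET (an assumption; the pattern happens to be valid in both), and the names LOOP_MODIFIES, source and false_positive are hypothetical.

```python
import re

# The full expression from above, split over two string literals for readability.
LOOP_MODIFIES = re.compile(
    r"(?si)(for\ each\ |while|do\ )(.(?!(next|end while|loop)))+"
    r"(delete|add|remove|modify|append)+(.(?!(next|end while|loop)))+"
)

# Invented sample: the first loop modifies a collection, the second does not.
source = """\
For Each item In incoming
    results.Add(item)
Next

For Each item In incoming
    Console.WriteLine(item)
Next
"""

matches = [m.group(0) for m in LOOP_MODIFIES.finditer(source)]
for text in matches:
    print(repr(text))

# The keywords are matched as plain substrings, so "address" counts as "add".
false_positive = LOOP_MODIFIES.search(
    "While reading\n    address = reader.Read()\nEnd While"
)
print(false_positive is not None)  # True
```

Only the first loop is reported. The last check shows one source of false positives: the keyword list matches inside longer identifiers, so a variable named address is enough to flag a loop.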

So that's it, that's how I ran it. I did have some false positive matches, but searching this way was far easier than manually reading through the code base. Plus I learned more about regular expressions, which will definitely come in handy later. The entire operation was done in Expresso, which is very handy for viewing the results of your regex and for validating what you think you are doing versus what you are *really* doing.