The Exorcism of Regular Expressions

In all my years of programming, I have never run into a demon more powerful and scary than the regular expression. I still cringe every time I encounter one of these beasts in code. I am not talking about the fluffy regular expressions you might use to verify that your input is all letters or all numbers. I’m talking about the unwieldy regular expressions that, even to the trained eye, look like a random sequence of characters. The practice of using giant, undocumented regular expressions is all too common, so let us explore some techniques to exorcise these demons by documenting them.

Java Example:

Pattern.compile("(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])");

You may have seen horror movies that suggest the only way to exorcise a demon is to discover its name, and that is the first thing we need to do. While regular expressions look similar across programming languages and libraries, there are subtle differences that can make the same regular expression behave differently depending on the library being used and the options passed to the regular expression engine. To discover the name of a demon, you would start with a book of demon names. To discover the names of the parts of a regular expression, you need to locate and read the documentation for the regular expression library you are using.
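To make the point about options concrete, here is a minimal sketch using only java.util.regex: the same pattern text gives different answers depending on the flags passed to Pattern.compile and on whether you call Matcher.matches() or Matcher.find(). The class name RegexOptionsDemo and its check helper are hypothetical, chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexOptionsDemo {

    // Compiles the pattern with the given flags, then either requires the whole
    // input to match (Matcher.matches) or accepts any matching substring (Matcher.find).
    static boolean check(String regex, int flags, String input, boolean wholeInput) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return wholeInput ? m.matches() : m.find();
    }

    public static void main(String[] args) {
        // '.' does not match a line terminator unless DOTALL is passed.
        System.out.println(check("a.b", 0, "a\nb", true));              // false
        System.out.println(check("a.b", Pattern.DOTALL, "a\nb", true)); // true

        // Without MULTILINE, '^' and '$' anchor to the whole input, not to each line.
        System.out.println(check("^b$", 0, "a\nb", false));                 // false
        System.out.println(check("^b$", Pattern.MULTILINE, "a\nb", false)); // true

        // Same pattern, same flags: matches() must consume the entire input, find() need not.
        System.out.println(check("[0-9]+", 0, "abc123", true));  // false
        System.out.println(check("[0-9]+", 0, "abc123", false)); // true
    }
}
```

None of these results is a bug; each behavior is described in the Pattern and Matcher documentation, which is exactly why that documentation is the book of names to keep at hand.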

The first step in an exorcism of regular expressions is to document the location of your book of names. I don’t think I have ever seen it done, but recording where to find the regular expression documentation you used can be very helpful. If you used the wrong documentation, or the documentation becomes outdated, another developer may notice and be able to fix bugs by comparing the documentation you used against the documentation for the engine actually running the code.

Example:

/* See: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html */

Pattern.compile("(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])");

Now, that wasn’t very hard, was it? You may have just saved future developers headaches and sleepless nights. At this point your demon regular expression should be quivering in fear, since you have successfully contained it in a trap, but it is not yet time to let your guard down. The documentation may be incomplete, and the library may have undocumented features and undocumented bugs. Your work has just begun!
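Since the documentation can be incomplete or wrong, one way to strengthen the trap is to keep the documentation link next to the pattern and pin down the behavior you actually rely on with a few sample inputs. The sketch below assumes the email pattern from the examples above; the class EmailPatternCheck, its SAMPLES map, and the sample addresses are hypothetical. It also prints the Java specification version of the running JVM, so it can be compared against the Java SE 7 documentation the comment points to.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class EmailPatternCheck {

    // Regex syntax reference this pattern was written against:
    // http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
    static final Pattern EMAIL = Pattern.compile("(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])");

    // Hypothetical sample inputs mapped to whether EMAIL should match the whole input.
    // They record observed behavior, not a specification of valid email addresses.
    static final Map<String, Boolean> SAMPLES = new LinkedHashMap<>();
    static {
        SAMPLES.put("john@example.com", true);
        SAMPLES.put("John@Example.com", false); // rejected: the character classes only list a-z
        SAMPLES.put("not an email", false);
    }

    // Returns every sample whose actual result differs from the recorded expectation.
    static List<String> mismatches() {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, Boolean> sample : SAMPLES.entrySet()) {
            boolean actual = EMAIL.matcher(sample.getKey()).matches();
            if (actual != sample.getValue()) {
                changed.add(sample.getKey());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // The Java version actually running the pattern, to compare with the documented one.
        System.out.println("Java specification version: "
                + System.getProperty("java.specification.version"));
        List<String> changed = mismatches();
        System.out.println(changed.isEmpty()
                ? "All samples behave as recorded."
                : "Behavior differs from the record for: " + changed);
    }
}
```

The second sample records something the pattern never announces: as written, its character classes list only lowercase letters, so John@Example.com is rejected. Compiling the same pattern text with Pattern.CASE_INSENSITIVE makes that address match, and the recorded expectation for it would then need to change.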
