## IDEA Tech Tip: Equation Editor and Regular Expressions

**Q:** I’ve heard that you can use regular expressions in IDEA’s equation editor. Can you show me how to validate a U.S./Canada phone number and an email address in the equation editor?

**A:** That is an excellent question! Being able to do regular expressions in the equation editor is one of IDEA’s hidden gems.

Wikipedia says that a regular expression is “a sequence of characters that define a search pattern. Usually this pattern is then used by string searching algorithms for ‘find’ or ‘find and replace’ operations on strings.” In the IDEA equation editor, the regular expression allows you to extract items such as a telephone number or email address that meet a pattern or extract records that don’t meet the pattern.

The regular expression is considered a character field in the Equation Editor and has the following syntax:

@RegExpr(FIELD NAME, expression)

If we had the following database with a phone number and an email field we would be able to use the regular expression to make sure all the phone numbers and emails are in the proper format.

In this instance I would use the following equation to extract all properly formatted phone numbers:

@RegExpr(PHONE, “((\(\d{3}\) ?)|(\d{3}-))?\d{3}-\d{4}”)

If I wanted to extract phone numbers that are not in the proper format—and in this case the format I am testing for is XXX-XXX-XXXX—I would use this equation:

@RegExpr(PHONE, “((\(\d{3}\) ?)|(\d{3}-))?\d{3}-\d{4}”) = “”

You can see here I am testing for items that do not have a match. In my example I would have the following transactions.

As you can see, both these phone numbers are missing digits.

If I wanted to extract the email addresses that aren’t in the proper format I would use the following equation:

@RegExpr(EMAIL, “^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$”) = “”

The above equation will return the following two items:

As we can see, one is missing the “@” symbol and the other is missing the “.com” extension.

By looking at these two examples you’ve probably realized that creating these expressions is not the easiest thing to do. I won’t go into specifics as there are many sites you can find just by Googling “regular expression” that explain in detail how these patterns are created, but there is one excellent site that already has these patterns created and many more. Both of the patterns I used come from this site: http://www.regexlib.com/Default.aspx

You can do a keyword search, such as phone or email, and find patterns that have already been created and can be copied and pasted into your equation.

When you’re using IDEA and find that part of your analysis is to validate that a field is in a certain format, you should definitely consider using the @RegExpr function to perform the validation. It’s a very powerful tool.

*Brian Element is a CaseWare IDEA Lead Solution Architect and former financial advisor to Public Services and Procurement at the Government of Canada. He has worked closely with IDEA software for over three decades.*