Validate Google Form Data with Regex Statements

In an earlier post I gave an overview of how to validate user input on Google Forms. As I said there, you can often avoid regular expressions by using other options such as Contains or Does Not Contain.

Sometimes, though, a regex (or regular expression, to give it its full title) is the only option. Learning how to write a regex can be difficult and time-consuming, so here I hope to give some background on what the various parts of a regex signify, and offer a few examples you can use as they are or modify for your own use. (Please note that I don’t guarantee these to be bulletproof; you may find a few valid data examples that fail the test, or a few erroneous ones that pass.)

.       Matches any single character.
[ ]     Matches a single character contained within the square brackets. 
[^ ]    Matches a single character not contained within the square brackets.
^       Matches the beginning of the string. Referred to as an anchor.
$       Matches the end of the string. Referred to as an anchor.
*       Matches 0 or more of the previous item.
?       Matches 0 or 1 of the previous item.
+       Matches 1 or more of the previous item.
{ }     Matches {this many} of the previous item, e.g. {3} or {2,5}.
|       The OR operator. 
        Matches either the expression before or the expression after the |
\       The escape character. 
        Allows you to use one of these metacharacters for your match.
( )     Groups characters into substrings.
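
To see each of these symbols in action, here is a rough sketch in Python. Python's re module is only a convenient stand-in for trying patterns out; Google Forms may behave slightly differently. The sample strings and the matches helper are made up for illustration.

```python
import re

# Each entry: (pattern, text, whether we expect a match somewhere in text).
# The sample strings are invented for illustration.
CASES = [
    (r"c.t", "cat", True),         # . matches any single character
    (r"c[au]t", "cut", True),      # [ ] matches one of the listed characters
    (r"c[^au]t", "cat", False),    # [^ ] matches one character NOT listed
    (r"^cat", "concat", False),    # ^ anchors to the start of the string
    (r"cat$", "concat", True),     # $ anchors to the end of the string
    (r"ca*t", "ct", True),         # * allows zero or more of the previous item
    (r"ca?t", "caat", False),      # ? allows zero or one of the previous item
    (r"ca+t", "ct", False),        # + requires one or more of the previous item
    (r"ca{2}t", "caat", True),     # {2} requires exactly two of the previous item
    (r"cat|dog", "hotdog", True),  # | matches either alternative
    (r"3\.14", "3x14", False),     # \. matches a literal dot only
    (r"(ab)+c", "ababc", True),    # ( ) groups "ab" so that + repeats the pair
]

def matches(pattern, text):
    """Return True if the pattern is found anywhere in text."""
    return re.search(pattern, text) is not None

for pattern, text, expected in CASES:
    print(f"{pattern!r:12} on {text!r:10} -> {matches(pattern, text)}")
```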

That’s a lot to take in, so let’s try a simple example: a regex that allows only a whole number (a string of digits). Firstly, each character must be a digit, 0 through 9:

[0-9]

And we’ll allow any number of such digits in our string:

[0-9]*

There must be nothing else before these characters, so we use the beginning anchor:

^[0-9]*

And there must be nothing else after these digits, so we add the end anchor:

^[0-9]*$

Hey presto.
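
If you want to watch the pattern tighten up step by step, here is a small sketch in Python (again just a test bench, not what Google Forms runs internally). The sample inputs and the found helper are my own illustrations.

```python
import re

# The four stages from the walkthrough above.
STAGES = [r"[0-9]", r"[0-9]*", r"^[0-9]*", r"^[0-9]*$"]

# Made-up sample inputs, including an empty string.
SAMPLES = ["12345", "12a45", "abc", ""]

def found(pattern, text):
    """True if the pattern matches somewhere in text (partial match allowed)."""
    return re.search(pattern, text) is not None

for pattern in STAGES:
    results = {text: found(pattern, text) for text in SAMPLES}
    print(pattern, results)

# Only the fully anchored pattern rejects both "12a45" and "abc".
# Note that * also accepts the empty string; + would require at least one digit.
```

One thing this shows is that ^[0-9]*$ lets an empty answer through. If the field must contain at least one digit, swapping * for + (giving ^[0-9]+$) is one way to tighten it.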

Here are some other popular examples, as promised:

^[0-9]*$
whole numbers (digits)

^[a-zA-Z0-9\s\-,.+]+$
regular text: letters, digits, spaces, comma, full stop (period), hyphen and plus sign

^\d{5}(?:[-\s]\d{4})?$
US zip code

\.(jpg|gif|png)$
Only file names ending in .jpg, .gif or .png

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Email address

^.{1,140}$
character limit 140, any characters including spaces (such as SMS or tweet)

^(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.]\d\d\d\d$
UK date, dd/mm/yyyy

^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.]\d\d\d\d$
US date, mm/dd/yyyy
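
Before pasting any of these into a form, it's worth trying them against a handful of good and bad inputs. Here is a minimal sketch of how that might look in Python, using the image file pattern and the UK date pattern, the latter written with a leading ^ so that both ends are anchored. The sample inputs and the is_valid helper are hypothetical, and Google Forms may not behave identically to Python's re module.

```python
import re

# Two patterns from the list above. The date pattern is written with a
# leading ^ so that it is anchored at both ends.
PATTERNS = {
    "image file": r"\.(jpg|gif|png)$",
    "UK date": r"^(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.]\d\d\d\d$",
}

# Hypothetical sample inputs, each paired with the result we expect.
SAMPLES = {
    "image file": [("holiday.jpg", True), ("notes.txt", False), ("jpg", False)],
    "UK date": [("25/12/2023", True), ("12/25/2023", False), ("31/02/2023", True)],
}

def is_valid(name, value):
    """Check value against the named pattern (a partial match is enough)."""
    return re.search(PATTERNS[name], value) is not None

for name, cases in SAMPLES.items():
    for value, expected in cases:
        result = is_valid(name, value)
        flag = "ok" if result == expected else "UNEXPECTED"
        print(f"{name:10} {value:12} -> {result} ({flag})")
```

Notice that 31/02/2023 passes: the pattern checks the shape of the date, not whether that day exists in that month. That's the sort of gap I meant when I said these aren't bulletproof.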

If you can improve on these, find any bugs, or have any more to add, please comment here or in Google+!