Regular Expressions (VBA)

“It pays to be obvious, especially if you have a reputation for subtlety.” [Isaac Asimov]

Abstract

A regular expression is a string that describes, or matches, a set of strings according to certain syntax rules. Regular expressions make it easy to describe complex string filters and to extract or replace whole strings or parts of strings.
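The helpers below show how matching looks in VBA. They are written in the same style as RegExpReplace in the appendix and are a minimal sketch, not a finished library. The names RegExpTest and RegExpFirstMatch are my own and purely illustrative. They use the Test and Execute methods of the VBScript.RegExp object via late binding, so no library reference is needed. This assumes Office on Windows, where VBScript.RegExp is available.

```vba
'Hypothetical helpers in the style of RegExpReplace (names are illustrative):
'RegExpTest        Returns True if Pattern matches anywhere in SourceString
'RegExpFirstMatch  Returns the text of the first match, or "" if there is none
'Both use late binding, so no reference to the VBScript library is needed.
Function RegExpTest(ByVal SourceString As String, _
    ByVal Pattern As String, _
    Optional ByVal IgnoreCase As Boolean = False) As Boolean

    Dim objRE As Object

    Set objRE = CreateObject("vbscript.regexp")
    objRE.Pattern = Pattern
    objRE.IgnoreCase = IgnoreCase
    RegExpTest = objRE.Test(SourceString)   'True/False only, no match details
    Set objRE = Nothing
End Function

Function RegExpFirstMatch(ByVal SourceString As String, _
    ByVal Pattern As String, _
    Optional ByVal IgnoreCase As Boolean = False) As String

    Dim objRE As Object
    Dim objMatches As Object

    Set objRE = CreateObject("vbscript.regexp")
    objRE.Pattern = Pattern
    objRE.IgnoreCase = IgnoreCase
    objRE.Global = False                    'stop after the first match
    Set objMatches = objRE.Execute(SourceString)
    If objMatches.Count > 0 Then
        RegExpFirstMatch = objMatches.Item(0).Value   'Matches is zero-based
    End If                                  'otherwise the result stays ""
    Set objRE = Nothing
End Function
```

For example, RegExpTest("Order 42 of 100", "\d+") returns True, and RegExpFirstMatch("Order 42 of 100", "\d+") returns "42".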

A tutorial: (external link!) http://regenechsen.de/wp/regulaere-ausdruecke/02-regular-expressions-in-tb-engl/

A tester: (external link!) http://www.regex-tester.de/regex_en.html

A huge bunch of interesting examples: (external link!) http://regexlib.com

Appendix – RegExpReplace Code

A useful function that I found on the web:

Please read my Disclaimer.

'String replacement with Regular Expressions via vbscript.regexp
'Parameters:
'SourceString   String to look into
'Pattern        Search pattern
'ReplaceString  Replacement string, use $i for submatches, i=1,2,...
'IgnoreCase     Flag whether to ignore capitals
'GlobalReplace  Flag whether to replace all matches or only first one
'MultiLine      Flag whether ^ and $ match at the start and end of each line
'
'Returns:       SourceString with replacement(s) if applicable
'
Function RegExpReplace(ByVal SourceString As String, _
    ByVal Pattern As String, ByVal ReplaceString As String, _
    Optional ByVal IgnoreCase As Boolean = False, _
    Optional ByVal GlobalReplace As Boolean = False, _
    Optional ByVal MultiLine As Boolean = False) As String
   
    Dim objRE As Object
   
    Set objRE = CreateObject("vbscript.regexp")
    objRE.Pattern = Pattern
    objRE.IgnoreCase = IgnoreCase
    objRE.Global = GlobalReplace
    objRE.MultiLine = MultiLine
    RegExpReplace = objRE.Replace(SourceString, ReplaceString)
    Set objRE = Nothing
End Function
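Here is a short usage sketch for RegExpReplace. It assumes the function above sits in a standard module of the same project. The sample strings are made up for illustration, and the comment after each call shows the result it is expected to print.

```vba
Sub RegExpReplaceDemo()
    'Submatches: turn "Last, First" into "First Last"
    Debug.Print RegExpReplace("Doe, John", "(\w+),\s*(\w+)", "$2 $1")
    '-> John Doe

    'GlobalReplace = False (default): only the first run of spaces is replaced
    Debug.Print RegExpReplace("a   b   c", " +", " ")
    '-> a b   c

    'GlobalReplace = True: every run of spaces is replaced
    Debug.Print RegExpReplace("a   b   c", " +", " ", GlobalReplace:=True)
    '-> a b c

    'IgnoreCase = True: "vba" also matches "VBA"
    Debug.Print RegExpReplace("I like VBA", "vba", "Excel VBA", IgnoreCase:=True)
    '-> I like Excel VBA

    'MultiLine = True: ^ matches at the start of every line, not only the first
    Debug.Print RegExpReplace("  one" & vbLf & "  two", "^ +", "", _
        GlobalReplace:=True, MultiLine:=True)
    '-> "one" and "two" on two lines, both without leading spaces
End Sub
```

Named arguments such as GlobalReplace:=True let you skip the optional parameters you do not need.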

Appendix – Syntax Rules

Please compare with this article from Microsoft: (external link!) https://docs.microsoft.com/en-us/previous-versions/windows/internet-explorer/ie-developer/scripting-articles/ms974570(v=msdn.10)?redirectedfrom=MSDN

Expression	Explanation
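\	Marks the next character as either a special character or a literal. For example, “n” matches the character “n”. “\n” matches a newline character. The sequence “\\” matches “\” and “\(” matches “(”.
^	Matches the beginning of input. If MultiLine is True, it also matches the position right after a line break.
$	Matches the end of input. If MultiLine is True, it also matches the position right before a line break.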
*	Matches the preceding character or subexpression zero or more times. For example, “zo*” matches “z”, “zo”, “zoo”, “zooo”, and so on.
+	Matches the preceding character or subexpression one or more times. For example, “zo+” matches “zo” and “zoo” but not “z”.
?	Matches the preceding character or subexpression zero or one time. For example, “a?ve?” matches the “ve” in “never”.
.	Matches any single character except a newline character.
(pattern)	Matches pattern and remembers the match. The matched substring can be retrieved from the SubMatches collection of each Match object in the resulting Matches collection, or referenced as $1..$n in a replacement string. To match the parenthesis characters themselves, use “\(” or “\)”.
x|y	Matches either x or y. For example, “z|wood” matches “z” or “wood”. “(z|w)oo” matches “zoo” or the “woo” in “wood”.
{n}	n is a nonnegative integer. Matches exactly n times. For example, “o{2}” does not match the “o” in “Bob,” but matches the first two o’s in “foooood”.
{n,}	n is a nonnegative integer. Matches at least n times. For example, “o{2,}” does not match the “o” in “Bob” and matches all the o’s in “foooood.” “o{1,}” is equivalent to “o+”. “o{0,}” is equivalent to “o*”.
{n,m}	m and n are nonnegative integers. Matches at least n and at most m times. For example, “o{1,3}” matches the first three o’s in “fooooood.” “o{0,1}” is equivalent to “o?”.
[xyz]	A character set. Matches any one of the enclosed characters. For example, “[abc]” matches the “a” in “plain”.
[^xyz]	A negative character set. Matches any character not enclosed. For example, “[^abc]” matches the “p” in “plain”.
[a-z]	A range of characters. Matches any character in the specified range. For example, “[a-z]” matches any lowercase alphabetic character in the range “a” through “z”.
[^m-z]	A negative range of characters. Matches any character not in the specified range. For example, “[^m-z]” matches any character not in the range “m” through “z”.
\b	Matches a word boundary, that is, the position between a word character and a non-word character, or between a word character and the start or end of the input. For example, “er\b” matches the “er” in “never” but not the “er” in “verb”.
\B	Matches a non-word boundary. “ea*r\B” matches the “ear” in “never early”.
\d	Matches a digit character. Equivalent to [0-9].
\D	Matches a non-digit character. Equivalent to [^0-9].
\f	Matches a form-feed character.
\n	Matches a newline character.
\r	Matches a carriage return character.
\s	Matches any white space including space, tab, form-feed, etc. Equivalent to “[ \f\n\r\t\v]”.
\S	Matches any nonwhite space character. Equivalent to “[^ \f\n\r\t\v]”.
\t	Matches a tab character.
\v	Matches a vertical tab character.
\w	Matches any word character including underscore. Equivalent to “[A-Za-z0-9_]”.
\W	Matches any non-word character. Equivalent to “[^A-Za-z0-9_]”.
\num	Matches num, where num is a positive integer. A reference back to remembered matches. For example, “(.)\1” matches two consecutive identical characters.
\n	Matches n, where n is an octal escape value. Octal escape values must be 1, 2, or 3 digits long. For example, “\11” and “\011” both match a tab character. “\0011” is the equivalent of “\001” & “1”. Octal escape values must not exceed 256. If they do, only the first two digits comprise the expression. Allows ASCII codes to be used in regular expressions.
\xn	Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, “\x41” matches “A”. “\x041” is equivalent to “\x04” & “1”. Allows ASCII codes to be used in regular expressions.
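Several rows of this table can be tried out with a small helper that reads one capture group of the first match. These rows are capturing groups, alternation, {n} and the \num back reference. RegExpSubMatch is a hypothetical name of my own. This is a sketch built on the SubMatches collection of the VBScript.RegExp Match object, using late binding like RegExpReplace.

```vba
'Hypothetical helper (the name is illustrative): returns capture group
'GroupNo (1, 2, ...) of the first match, or "" if the pattern does not match.
'GroupNo must not exceed the number of groups in Pattern.
Function RegExpSubMatch(ByVal SourceString As String, _
    ByVal Pattern As String, ByVal GroupNo As Long) As String

    Dim objRE As Object
    Dim objMatches As Object

    Set objRE = CreateObject("vbscript.regexp")
    objRE.Pattern = Pattern
    Set objMatches = objRE.Execute(SourceString)
    If objMatches.Count > 0 Then
        'SubMatches is zero-based: group 1 is SubMatches(0)
        RegExpSubMatch = objMatches.Item(0).SubMatches(GroupNo - 1)
    End If
    Set objRE = Nothing
End Function

Sub RegExpSubMatchDemo()
    '(pattern) and {n}: split an ISO-style date into its parts
    Debug.Print RegExpSubMatch("Due: 2024-03-15", "(\d{4})-(\d{2})-(\d{2})", 1)   '-> 2024
    Debug.Print RegExpSubMatch("Due: 2024-03-15", "(\d{4})-(\d{2})-(\d{2})", 3)   '-> 15

    'x|y: "(z|w)oo" matches "woo" in "wood"; group 1 holds "w"
    Debug.Print RegExpSubMatch("wood", "(z|w)oo", 1)                              '-> w

    '\num: "(.)\1" finds the first doubled character
    Debug.Print RegExpSubMatch("bookkeeper", "(.)\1", 1)                          '-> o
End Sub
```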