The flag /g of JavaScript’s regular expressions

This blog post describes when and how to use regular expressions whose flag /g is set and what can go wrong.

(If you want to read a more general introduction to regular expressions, consult [1].)



The flag /g of regular expressions



Sometimes, a regular expression should match the same string multiple times.
Then the regular expression object must be created with the flag /g set, either via a regular expression literal or via the constructor RegExp. That makes the property global of the regular expression object true and changes how several operations behave.

> var regex = /x/g;
> regex.global
true

The property lastIndex is used to keep track of where in the string matching should continue, as we shall see in a moment.
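Both ways of setting the flag produce the same kind of object. A minimal sketch (the variable names are only illustrative):

```javascript
// Flag /g via a regular expression literal
var fromLiteral = /x/g;
// Flag /g via the constructor RegExp: the flags are the second argument
var fromConstructor = new RegExp('x', 'g');

console.log(fromLiteral.global, fromConstructor.global);       // true true
// Matching starts at the beginning of the string: lastIndex is initially 0
console.log(fromLiteral.lastIndex, fromConstructor.lastIndex); // 0 0
// Without the flag, global is false
console.log(new RegExp('x').global);                           // false
```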

RegExp.prototype.test(): determining whether there is a match



Regular expressions have the method

RegExp.prototype.test(str)


Without the flag /g, the method test() of regular expressions simply checks whether there is a match somewhere in str:

> var str = '_x_x';

> /x/.test(str)
true


With the flag /g set, test() returns true once for each match in the string and then false. After each successful call, lastIndex contains the index right after that match.

> var regex = /x/g;
> regex.lastIndex
0
> regex.test(str)
true
> regex.lastIndex
2
> regex.test(str)
true
> regex.lastIndex
4
> regex.test(str)
false
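When test() finally returns false, it also resets lastIndex to 0, so the next call starts over at the beginning of the string. A small sketch that records the result and lastIndex after each of five calls:

```javascript
var str = '_x_x';
var regex = /x/g;
var log = [];
// Call test() five times, recording its result together with lastIndex
for (var i = 0; i < 5; i++) {
    var result = regex.test(str);
    log.push(result + ' ' + regex.lastIndex);
}
console.log(log);
// [ 'true 2', 'true 4', 'false 0', 'true 2', 'true 4' ]
```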


String.prototype.search(): finding the index of a match



Strings have the method

String.prototype.search(regex)

This method ignores the properties global and lastIndex of regex. It returns the index of the first match, or -1 if there is no match.

> '_x_x'.search(/x/)
1
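A quick check that search() neither starts at lastIndex nor changes it (a sketch; the value 3 is arbitrary):

```javascript
var regex = /x/g;
regex.lastIndex = 3;             // search() ignores this value...
console.log('_x_x'.search(regex)); // 1
console.log(regex.lastIndex);      // 3 (...and leaves it unchanged)
// No match: -1
console.log('abc'.search(/x/));    // -1
```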


RegExp.prototype.exec(): capturing groups, optionally repeatedly



Regular expressions have the method

RegExp.prototype.exec(str)

If the flag /g is not set, then this method always returns the match object [1] for the first match (or null if there is no match):

> var str = '_x_x';
> var regex1 = /x/;

> regex1.exec(str)
[ 'x', index: 1, input: '_x_x' ]
> regex1.exec(str)
[ 'x', index: 1, input: '_x_x' ]

If the flag /g is set, then all matches are returned, one per invocation: the first match on the first invocation, the second match on the second invocation, and so on. Once there are no more matches, exec() returns null.

> var regex2 = /x/g;

> regex2.exec(str)
[ 'x', index: 1, input: '_x_x' ]
> regex2.exec(str)
[ 'x', index: 3, input: '_x_x' ]
> regex2.exec(str)
null
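The usual way to visit all matches, including their capturing groups, is to call exec() in a loop until it returns null. A minimal sketch; the function name collectPairs and its key=value pattern are hypothetical:

```javascript
// Hypothetical helper: collects every "key=value" pair in str.
// The regex is created inside the function, so its lastIndex always starts at 0.
function collectPairs(str) {
    var regex = /(\w+)=(\w+)/g;
    var pairs = [];
    var match;
    // exec() returns the next match object, or null when there are no more matches
    while ((match = regex.exec(str)) !== null) {
        // match[1] and match[2] are the capturing groups, match.index the position
        pairs.push([match[1], match[2], match.index]);
    }
    return pairs;
}

console.log(collectPairs('a=1, b=2'));
// [ [ 'a', '1', 0 ], [ 'b', '2', 5 ] ]
```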


String.prototype.match(): finding matches



Strings have the method

String.prototype.match(regex)


If the flag /g of regex is not set, then this method behaves like RegExp.prototype.exec(). If the flag /g is set, then this method returns all matching substrings of the string (group 0 of every match). If there is no match, then null is returned.

> var regex = /x/g;

> '_x_x'.match(regex)
[ 'x', 'x' ]
> 'abc'.match(regex)
null
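One consequence of returning only group 0: with /g set, match() drops the capturing groups. A small sketch that contrasts both modes:

```javascript
var str = 'a1b2';
// Without /g: like exec(), the result contains the capturing groups of the first match
var first = str.match(/([a-z])(\d)/);
console.log(first[0], first[1], first[2], first.index); // a1 a 1 0
// With /g: only the complete matches, no capturing groups
var all = str.match(/([a-z])(\d)/g);
console.log(all); // [ 'a1', 'b2' ]
```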


String.prototype.replace(): search and replace



Strings have the method

String.prototype.replace(search, replacement)

If search is either a string or a regular expression whose flag /g is not set, then only the first match is replaced.
If the flag /g is set, then all matches are replaced.

> '_x_x'.replace(/x/, 'y')
'_y_x'
> '_x_x'.replace(/x/g, 'y')
'_y_y'
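Since a string as search only replaces the first occurrence, one way of replacing all occurrences of a literal string is to turn it into a regular expression with /g set, escaping the characters that are special in regular expressions. A sketch; the helpers escapeRegExp and replaceAllLiteral are hypothetical:

```javascript
// Hypothetical helper: backslash-escapes all characters that are special in regexes
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: replaces every occurrence of the literal string search
function replaceAllLiteral(str, search, replacement) {
    var regex = new RegExp(escapeRegExp(search), 'g');
    // A function as the replacement keeps patterns such as '$&' in replacement literal
    return str.replace(regex, function () { return replacement; });
}

console.log('1.2.3'.replace('.', '-'));            // 1-2.3
console.log(replaceAllLiteral('1.2.3', '.', '-')); // 1-2-3
```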


The problem with the /g flag



Regular expressions whose /g flag is set are problematic if a method working with them must be invoked multiple times to return all results. That’s the case for two methods:

  • RegExp.prototype.test()

  • RegExp.prototype.exec()


In that case, JavaScript abuses the regular expression as an iterator, as a pointer into the sequence of results. That causes problems:

  • You can’t inline the regular expression when you call those methods. For example:

    // Don’t do that:
    var count = 0;
    while (/a/g.test('babaa')) count++;

    The above loop is infinite, because the literal creates a new regular expression object (whose lastIndex is 0) in each loop iteration, which restarts the iteration over the results. Therefore, the above code must be rewritten:

    var count = 0;
    var regex = /a/g;
    while (regex.test('babaa')) count++;

    Note: it’s best practice not to inline regular expressions anyway, but you have to be aware that here you can’t, not even in quick hacks.

  • Code that invokes test() or exec() multiple times must be careful with regular expressions handed to it as parameters: their flag /g must be set, and the code must reset their lastIndex.
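The lastIndex problem also shows up when one regular expression with /g set is reused for different strings: the position left over from the previous call carries over. A small sketch:

```javascript
var regex = /a/g;
console.log(regex.test('xa')); // true, lastIndex is now 2
console.log(regex.test('a'));  // false: matching starts at index 2, past the only 'a'
// The failed match reset lastIndex to 0, so the same call now succeeds
console.log(regex.test('a'));  // true

// Resetting lastIndex before each use avoids the problem
regex.lastIndex = 0;
console.log(regex.test('a'));  // true
```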



The following example illustrates the latter problem.

Example: counting occurrences



The following is a naive implementation of a function that counts how many matches there are for the regular expression regex in the string str.

// Naive implementation
function countOccurrences(regex, str) {
    var count = 0;
    while (regex.test(str)) count++;
    return count;
}

An example of using this function:

> countOccurrences(/x/g, '_x_x')
2

The first problem is that this function goes into an infinite loop if the regular expression’s /g flag is not set, e.g.:

countOccurrences(/x/, '_x_x')

The second problem is that the function doesn’t work correctly if regex.lastIndex isn’t 0. For example:

> var regex = /x/g;
> regex.lastIndex = 2;
2
> countOccurrences(regex, '_x_x')
1

The following implementation fixes the two problems:

function countOccurrences(regex, str) {
    if (! regex.global) {
        throw new Error('Please set flag /g of regex');
    }
    var origLastIndex = regex.lastIndex; // store
    regex.lastIndex = 0;

    var count = 0;
    while (regex.test(str)) count++;

    regex.lastIndex = origLastIndex; // restore
    return count;
}
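One case the implementation above still does not handle: a regular expression that can match the empty string, such as /x*/g. An empty match leaves lastIndex where it was, so test() keeps returning true and the loop never ends. A sketch of a variant that uses exec() and steps over empty matches by hand (countMatches is a hypothetical name; stepping by one code unit is a simplification that ignores the /u flag):

```javascript
// Hypothetical variant of countOccurrences that also terminates for empty matches
function countMatches(regex, str) {
    if (! regex.global) {
        throw new Error('Please set flag /g of regex');
    }
    var origLastIndex = regex.lastIndex; // store
    regex.lastIndex = 0;

    var count = 0;
    var match;
    while ((match = regex.exec(str)) !== null) {
        count++;
        // An empty match does not advance lastIndex: step past it manually
        // (simplification: one UTF-16 code unit, not one code point)
        if (match[0] === '') regex.lastIndex++;
    }

    regex.lastIndex = origLastIndex; // restore
    return count;
}

console.log(countMatches(/x/g, '_x_x'));  // 2
console.log(countMatches(/x*/g, '_x_x')); // 5 (matches: '', 'x', '', 'x', '')
```

The second result agrees with '_x_x'.match(/x*/g).length, because match() performs the same stepping internally.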


Using match() to count occurrences



A simpler alternative is to use match():

function countOccurrences(regex, str) {
    if (! regex.global) {
        throw new Error('Please set flag /g of regex');
    }
    return (str.match(regex) || []).length;
}

One possible pitfall: str.match() returns null, not an empty array, if there are no matches. The code above handles that by using [] (whose length is 0) whenever the result of match() is null.
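Unlike the test()-based version, the match()-based version does not preserve lastIndex: with /g set, match() starts at index 0 regardless of lastIndex and leaves lastIndex at 0 afterwards. A sketch that shows both effects, repeating the match()-based function so that the block is self-contained:

```javascript
// The match()-based implementation from above
function countOccurrences(regex, str) {
    if (! regex.global) {
        throw new Error('Please set flag /g of regex');
    }
    return (str.match(regex) || []).length;
}

var regex = /x/g;
regex.lastIndex = 2;
console.log(countOccurrences(regex, '_x_x')); // 2: the initial lastIndex is ignored
console.log(regex.lastIndex);                 // 0: the original value 2 is not restored
console.log(countOccurrences(/x/g, 'abc'));   // 0: null from match() becomes []
```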

Performance considerations



Juan Ignacio Dopazo compared the performance of the two implementations of counting occurrences and found that the test()-based one is faster, presumably because it doesn’t collect the results in an array.
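If you want to compare the two approaches on your own engine, a rough micro-benchmark along the following lines is one option. It is only a sketch: the helper names countWithTest, countWithMatch and timeIt are hypothetical, the input size is arbitrary, and the timings it prints depend on the engine and machine.

```javascript
// test()-based counting, with the same lastIndex handling as the fixed version
function countWithTest(regex, str) {
    var origLastIndex = regex.lastIndex;
    regex.lastIndex = 0;
    var count = 0;
    while (regex.test(str)) count++;
    regex.lastIndex = origLastIndex;
    return count;
}

// match()-based counting
function countWithMatch(regex, str) {
    return (str.match(regex) || []).length;
}

// Hypothetical timing helper: runs fn `iterations` times, returns elapsed milliseconds
function timeIt(fn, iterations) {
    var start = Date.now();
    for (var i = 0; i < iterations; i++) fn();
    return Date.now() - start;
}

var text = '_x'.repeat(1000); // 1000 matches for /x/g
var regex = /x/g;
console.log('test():  ' + timeIt(function () { return countWithTest(regex, text); }, 1000) + ' ms');
console.log('match(): ' + timeIt(function () { return countWithMatch(regex, text); }, 1000) + ' ms');
```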

Acknowledgements



Mathias Bynens and Juan Ignacio Dopazo pointed me to match() and test(); Šime Vidas warned me about being careful with match() if there are no matches.

Reference




  1. JavaScript: an overview of the regular expression API

