The flag /g of JavaScript’s regular expressions
This blog post describes when and how to use regular expressions whose flag /g is set and what can go wrong.
(If you want to read a more general introduction to regular expressions, consult [1].)
The flag /g of regular expressions
Sometimes, a regular expression should match the same string multiple times.
Then the regular expression object needs to be created with the flag /g set (either via a regular expression literal or via the constructor RegExp). As a result, the property global of the regular expression object is true and several operations behave differently.
> var regex = /x/g;
> regex.global
true
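The constructor route mentioned above can be sketched as follows. The variable names are made up for illustration; the second argument of RegExp is the string of flags.

```javascript
// Create a global regular expression via the constructor.
// The second argument is the string of flags.
var viaConstructor = new RegExp('x', 'g');
// The equivalent regular expression literal.
var viaLiteral = /x/g;

console.log(viaConstructor.global); // true
console.log(viaLiteral.global);     // true
console.log(/x/.global);            // false (flag /g not set)
```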
The property lastIndex is used to keep track of where in the string matching should continue, as we shall see in a moment.
RegExp.prototype.test(): determining whether there is a match
Regular expressions have the method
RegExp.prototype.test(str)
Without the flag /g, the method test() of regular expressions simply checks whether there is a match somewhere in str:
> var str = '_x_x';
> /x/.test(str)
true
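With the flag /g set, test() returns true as many times as there are matches in the string. After each successful call, lastIndex contains the index after the most recent match. The call that finds no further match returns false and resets lastIndex to 0.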
> var regex = /x/g;
> regex.lastIndex
0
> regex.test(str)
true
> regex.lastIndex
2
> regex.test(str)
true
> regex.lastIndex
4
> regex.test(str)
false
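What happens after the final false is worth seeing once: the failed match sets lastIndex back to 0, so the next call starts over. A small sketch, with variable names chosen only for this illustration:

```javascript
var str = '_x_x';
var regex = /x/g;

regex.test(str); // true, lastIndex is now 2
regex.test(str); // true, lastIndex is now 4

// Third call: no further match.
var exhausted = regex.test(str);
// The failed match has reset lastIndex.
var indexAfterFailure = regex.lastIndex;
// Fourth call: matching restarts at the beginning of the string.
var restarted = regex.test(str);

console.log(exhausted, indexAfterFailure, restarted); // false 0 true
```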
String.prototype.search(): finding the index of a match
Strings have the method
String.prototype.search(regex)
This method ignores the properties global and lastIndex of regex. It returns the index where regex matches (the first time).
> '_x_x'.search(/x/)
1
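To see that search() really ignores global and lastIndex, one can hand it a regular expression whose lastIndex has been moved past the first match. This is only a sketch; the variable names are illustrative.

```javascript
var regex = /x/g;
regex.lastIndex = 2; // test() or exec() would now skip the first 'x'

var index = '_x_x'.search(regex);
console.log(index);           // 1: the first match, despite lastIndex being 2
console.log(regex.lastIndex); // 2: search() leaves lastIndex unchanged

// If there is no match, search() returns -1.
var notFound = 'abc'.search(regex);
console.log(notFound); // -1
```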
RegExp.prototype.exec(): capturing groups, optionally repeatedly
Regular expressions have the method
RegExp.prototype.exec(str)
If the flag /g is not set then this method always returns the match object [1] for the first match:
> var str = '_x_x';
> var regex1 = /x/;
> regex1.exec(str)
[ 'x', index: 1, input: '_x_x' ]
> regex1.exec(str)
[ 'x', index: 1, input: '_x_x' ]
If the flag /g is set, then all matches are returned – the first one on the first invocation, the second one on the second invocation, etc.
> var regex2 = /x/g;
> regex2.exec(str)
[ 'x', index: 1, input: '_x_x' ]
> regex2.exec(str)
[ 'x', index: 3, input: '_x_x' ]
> regex2.exec(str)
null
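In practice, the repeated invocations are usually wrapped in a loop that runs until exec() returns null. The following sketch collects the index of each match; the names indices and match are arbitrary.

```javascript
var str = '_x_x';
var regex = /x/g; // must be stored in a variable, see the pitfalls below
var indices = [];
var match;

// exec() returns one match object per call and null when it is done.
while ((match = regex.exec(str)) !== null) {
    indices.push(match.index);
}

console.log(indices);         // [ 1, 3 ]
console.log(regex.lastIndex); // 0: reset by the final, failed exec()
```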
String.prototype.match(): capturing groups or returning all matching substrings
Strings have the method
String.prototype.match(regex)
If the flag /g of regex is not set, then this method behaves like RegExp.prototype.exec(). If the flag /g is set, then this method returns an array with all matching substrings of the string (that is, group 0 of every match); captured groups are not included. If there is no match, then null is returned.
> var regex = /x/g;
> '_x_x'.match(regex)
[ 'x', 'x' ]
> 'abc'.match(regex)
null
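The difference between the two modes becomes more visible with capturing groups. The pattern and input below are made up for this sketch.

```javascript
var str = 'a1_b2';

// Without /g: a match object for the first match, including groups.
var single = str.match(/([a-z])([0-9])/);
console.log(single[0]);    // 'a1' (group 0, the complete match)
console.log(single[1]);    // 'a'
console.log(single[2]);    // '1'
console.log(single.index); // 0

// With /g: only group 0 of every match, no groups, no index.
var all = str.match(/([a-z])([0-9])/g);
console.log(all); // [ 'a1', 'b2' ]
```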
String.prototype.replace(): search and replace
Strings have the method
String.prototype.replace(search, replacement)
If search is either a string or a regular expression whose flag /g is not set, then only the first match is replaced.
If the flag /g is set, then all matches are replaced.
> '_x_x'.replace(/x/, 'y')
'_y_x'
> '_x_x'.replace(/x/g, 'y')
'_y_y'
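Two details can be checked with a short sketch: a string as search value replaces only the first occurrence, and a regular expression with /g replaces all matches regardless of its current lastIndex. Variable names are illustrative.

```javascript
var str = '_x_x';

// A string as search value: only the first occurrence is replaced.
var withString = str.replace('x', 'y');
console.log(withString); // '_y_x'

// A global regular expression: all matches are replaced,
// even if lastIndex points past the first match.
var regex = /x/g;
regex.lastIndex = 2;
var withGlobal = str.replace(regex, 'y');
console.log(withGlobal);      // '_y_y'
console.log(regex.lastIndex); // 0 after the replacement
```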
The problem with the /g flag
Regular expressions whose /g flag is set are problematic if a method working with them must be invoked multiple times to return all results. That’s the case for two methods:
- RegExp.prototype.test()
- RegExp.prototype.exec()
Then JavaScript abuses the regular expression as an iterator, as a pointer into the sequence of results. That causes problems:
- You can’t inline the regular expression when you call those methods. For example:
// Don’t do that:
var count = 0;
while (/a/g.test('babaa')) count++;
The above loop is infinite, because a new regular expression is created for each loop iteration, which restarts the iteration over the results. Therefore, the above code must be rewritten:
var count = 0;
var regex = /a/g;
while (regex.test('babaa')) count++;
Note: it’s a best practice not to inline, anyway, but you have to be aware that you can’t do it, not even in quick hacks.
- Code that wants to invoke test() and exec() multiple times must be careful with regular expressions handed to it as a parameter. It must check that their flag /g is set and it must reset their lastIndex to 0 before it starts matching.
The following example illustrates the latter problem.
Example: counting occurrences
The following is a naive implementation of a function that counts how many matches there are for the regular expression regex in the string str.
// Naive implementation
function countOccurrences(regex, str) {
    var count = 0;
    while (regex.test(str)) count++;
    return count;
}
An example of using this function:
> countOccurrences(/x/g, '_x_x')
2
The first problem is that this function goes into an infinite loop if the regular expression’s /g flag is not set, e.g.:
countOccurrences(/x/, '_x_x')
The second problem is that the function doesn’t work correctly if regex.lastIndex isn’t 0. For example:
> var regex = /x/g;
> regex.lastIndex = 2;
2
> countOccurrences(regex, '_x_x')
1
The following implementation fixes the two problems:
function countOccurrences(regex, str) {
    if (! regex.global) {
        throw new Error('Please set flag /g of regex');
    }
    var origLastIndex = regex.lastIndex; // store
    regex.lastIndex = 0;
    var count = 0;
    while (regex.test(str)) count++;
    regex.lastIndex = origLastIndex; // restore
    return count;
}
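A quick check of the corrected function against both problems might look like this. The function is repeated so that the snippet runs on its own; the variables regex, count and threw are only for illustration.

```javascript
function countOccurrences(regex, str) {
    if (!regex.global) {
        throw new Error('Please set flag /g of regex');
    }
    var origLastIndex = regex.lastIndex; // store
    regex.lastIndex = 0;
    var count = 0;
    while (regex.test(str)) count++;
    regex.lastIndex = origLastIndex; // restore
    return count;
}

// Problem 2: a lastIndex other than 0 no longer distorts the count.
var regex = /x/g;
regex.lastIndex = 2;
var count = countOccurrences(regex, '_x_x');
console.log(count);           // 2
console.log(regex.lastIndex); // 2: restored for the caller

// Problem 1: a missing flag /g leads to an exception, not an infinite loop.
var threw = false;
try {
    countOccurrences(/x/, '_x_x');
} catch (e) {
    threw = true;
}
console.log(threw); // true
```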
Using match() to count occurrences
A simpler alternative is to use match():
function countOccurrences(regex, str) {
    if (! regex.global) {
        throw new Error('Please set flag /g of regex');
    }
    return (str.match(regex) || []).length;
}
One possible pitfall: str.match() returns null if the /g flag is set and there are no matches (solved above by accessing length of [] if the result of match() isn’t truthy).
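The following sketch exercises the match()-based version, including the no-match case. The function is repeated so that the snippet is self-contained. One difference from the test()-based version, as far as this sketch shows: lastIndex is not restored, because match() with /g leaves it at 0.

```javascript
function countOccurrences(regex, str) {
    if (!regex.global) {
        throw new Error('Please set flag /g of regex');
    }
    // match() returns null if there are no matches, hence the fallback [].
    return (str.match(regex) || []).length;
}

var twoMatches = countOccurrences(/x/g, '_x_x');
var noMatches = countOccurrences(/x/g, 'abc');
console.log(twoMatches); // 2
console.log(noMatches);  // 0

// A lastIndex other than 0 does not affect the count...
var regex = /x/g;
regex.lastIndex = 2;
var countWithOffset = countOccurrences(regex, '_x_x');
console.log(countWithOffset); // 2
// ...but match() leaves lastIndex at 0 instead of restoring it.
console.log(regex.lastIndex); // 0
```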
Performance considerations
Juan Ignacio Dopazo compared the performance of the two implementations of counting occurrences and found out that using test() is faster, presumably because it doesn’t collect the results in an array.
Acknowledgements
Mathias Bynens and Juan Ignacio Dopazo pointed me to match() and test(); Šime Vidas warned me to be careful with match() if there are no matches.