`\d` vs. `[0-9]` – A Surprising Comparison

A recent discussion sparked debate about the relative efficiency of `\d` and `[0-9]` in regular expressions. Initial testing suggested `\d` was faster, but further investigation revealed a more nuanced reality: `\d` can be less efficient in specific scenarios. This article explores the reasons behind this discrepancy.
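The key difference lies in the character sets each expression matches. `[0-9]` strictly matches only the ten ASCII digits 0 through 9. In .NET, however, `\d` is broader: it matches any character in the Unicode decimal-digit category (Nd), including digits from non-Latin scripts such as Persian (Extended Arabic-Indic) and Devanagari. This behavior is engine-specific; in JavaScript, and in Java by default, `\d` is equivalent to `[0-9]`.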
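To see the difference concretely, here is a minimal C# sketch, assuming a .NET 5 or later console app using top-level statements. It tests three single-character strings against both patterns: an ASCII `4`, EXTENDED ARABIC-INDIC DIGIT FOUR (U+06F4, used in Persian), and DEVANAGARI DIGIT FOUR (U+096A). The sample characters are illustrative choices, not taken from the original discussion.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative inputs: ASCII '4', Persian/Extended Arabic-Indic four (U+06F4), Devanagari four (U+096A).
string[] samples = { "4", "\u06F4", "\u096A" };

foreach (var s in samples)
{
    // \A and \z anchor the whole string, so each pattern must match exactly one character.
    bool unicodeDigit = Regex.IsMatch(s, @"\A\d\z");
    bool asciiDigit = Regex.IsMatch(s, @"\A[0-9]\z");
    Console.WriteLine($"U+{(int)s[0]:X4}: \\d={unicodeDigit}, [0-9]={asciiDigit}");
}
```

Only the ASCII `4` is matched by both patterns; the other two are matched by `\d` alone.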
This expanded matching range for `\d` can affect performance. Instead of a single range check, the engine must test whether a character belongs to a much larger Unicode category, which can add processing time. The difference is negligible in many cases, but it can become noticeable when scanning large inputs or running complex patterns many times.
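One rough way to check this on your own data is a `Stopwatch` micro-benchmark such as the hypothetical sketch below (again assuming .NET 5 or later with top-level statements). The generated input, its size, and the iteration count are assumptions chosen for illustration. Timings depend heavily on the runtime version and hardware, so treat the numbers as indicative only; a dedicated tool such as BenchmarkDotNet gives more reliable measurements.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical test input: ASCII-only text with 100,000 short digit runs ("id=0;id=1;...").
var sb = new StringBuilder();
for (int i = 0; i < 100_000; i++)
{
    sb.Append("id=").Append(i).Append(';');
}
string input = sb.ToString();

var unicodeDigits = new Regex(@"\d+", RegexOptions.Compiled);
var asciiDigits = new Regex(@"[0-9]+", RegexOptions.Compiled);

// Runs `iterations` full scans of the input; returns elapsed time and the last match count.
(long ms, int count) Measure(Regex regex, int iterations)
{
    int count = 0;
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
    {
        count = regex.Matches(input).Count;
    }
    sw.Stop();
    return (sw.ElapsedMilliseconds, count);
}

// Warm up once so JIT and regex compilation costs are excluded from the timed runs.
Measure(unicodeDigits, 1);
Measure(asciiDigits, 1);

var (dMs, dCount) = Measure(unicodeDigits, 20);
var (rMs, rCount) = Measure(asciiDigits, 20);
Console.WriteLine($@"\d+    : {dMs} ms, {dCount} matches");
Console.WriteLine($@"[0-9]+ : {rMs} ms, {rCount} matches");
```

On ASCII-only input both patterns find exactly the same matches, so any gap in the timings reflects only how each character class is evaluated.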
The following C# snippet prints the full set of characters matched by `\d`:
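```csharp
using System;
using System.Text;

var sb = new StringBuilder();
// Visit every UTF-16 code unit (0x0000-0xFFFF); an int counter cannot wrap around like a UInt16 would.
for (int i = 0; i <= 0xFFFF; i++)
{
    // char.IsDigit is true for Unicode category Nd (decimal digits), the same set .NET's \d matches.
    if (char.IsDigit((char)i))
    {
        sb.Append((char)i);
    }
}
Console.WriteLine(sb.ToString());
```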
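This code iterates through every UTF-16 code unit (U+0000 to U+FFFF) and appends only those that `char.IsDigit()` classifies as digits, that is, members of the Unicode DecimalDigitNumber category, which is the same test `\d` applies. Because the .NET regex engine matches one UTF-16 `char` at a time, this range covers every character `\d` can match on its own; digits outside the Basic Multilingual Plane are stored as surrogate pairs and are not matched by `\d`. The output is a long list of digits from many scripts, far more than the ten matched by `[0-9]`.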
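The claim that `char.IsDigit()` mirrors `\d` can also be checked directly. The illustrative sketch below (assuming .NET 5 or later with top-level statements) runs every UTF-16 code unit through both tests, counts disagreements, and reports how many characters `\d` accepts. No exact count is asserted here because it depends on the Unicode version bundled with the runtime.

```csharp
using System;
using System.Text.RegularExpressions;

// \A and \z anchor the whole one-character string.
var digitRegex = new Regex(@"\A\d\z");

int digitCount = 0;
int mismatches = 0;
for (int i = 0; i <= 0xFFFF; i++)
{
    char c = (char)i;
    bool byRegex = digitRegex.IsMatch(c.ToString());
    bool byCharApi = char.IsDigit(c);
    if (byRegex) digitCount++;
    if (byRegex != byCharApi) mismatches++;
}

Console.WriteLine($"Characters matched by \\d: {digitCount}");
Console.WriteLine($"Disagreements with char.IsDigit: {mismatches}");
```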
Therefore, while `\d` offers broader Unicode coverage, `[0-9]` can be faster when you only need ASCII digits, and it states that intent precisely. Let the specific needs of your application and the nature of your data guide the choice. If you are certain your input contains only ASCII digits, or you need to reject digits from other scripts, `[0-9]` is likely the better option.
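In .NET there are two ways to restrict matching to 0 through 9: write `[0-9]` explicitly, or pass `RegexOptions.ECMAScript`, under which `\d` is defined as `[0-9]`. The sketch below (the input string is hypothetical; .NET 5 or later with top-level statements is assumed) also shows a practical reason to care beyond speed: `int.TryParse` accepts only ASCII digits, so a value captured by `\d+` may fail to parse.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: an order number written with Persian digits (U+06F1 U+06F2 U+06F3, i.e. 123).
string input = "order \u06F1\u06F2\u06F3";

Match unicodeMatch = Regex.Match(input, @"\d+");
Match asciiMatch = Regex.Match(input, @"[0-9]+");
Match ecmaMatch = Regex.Match(input, @"\d+", RegexOptions.ECMAScript);

Console.WriteLine($@"\d+ matched: {unicodeMatch.Success}");
Console.WriteLine($"[0-9]+ matched: {asciiMatch.Success}");
Console.WriteLine($@"\d+ (ECMAScript) matched: {ecmaMatch.Success}");

// int.TryParse only understands ASCII 0-9, so the \d+ capture does not parse.
bool parsed = int.TryParse(unicodeMatch.Value, out _);
Console.WriteLine($"int.TryParse succeeded: {parsed}");
```

Here `\d+` happily captures the Persian digits, while both ASCII-restricted variants find no match, which is usually the safer outcome when the captured text is later parsed as a number.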