Word Boundary Semantics in PHP Regular Expressions
In PHP, word boundaries are matched with the \b assertion, which matches at a position between a word character (\w) and a non-word character (\W), in either order. It also matches at the start or end of the string when the adjacent character is a word character. Its behavior can be surprising, as your test cases show.
Unexpected Word Boundaries
In your test cases, you expected the following results:
preg_match("/(^|\b)@nimal/i", "something@nimal", $match); // false
preg_match("/(^|\b)@nimal/i", "something!@nimal", $match); // true
But the actual results were reversed:
preg_match("/(^|\b)@nimal/i", "something@nimal", $match); // true
preg_match("/(^|\b)@nimal/i", "something!@nimal", $match); // false
This happens because \b matches only at a position that has a word character (\w) on one side and a non-word character (\W), or the start or end of the string, on the other. In the first case, "something@nimal", there is a word boundary between "g" (a word character) and "@" (a non-word character), so the \b alternative succeeds and the pattern matches. In the second case, "something!@nimal", there is no word boundary between "!" and "@" because both are non-word characters, and the ^ alternative cannot help because "@" is not at the start of the string, so the pattern fails.
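To see where the boundaries fall, the following sketch lists every offset at which \b matches in the two test strings. The helper name boundaryOffsets() is made up for this illustration and is not part of PHP; it assumes single-byte (ASCII) input, so byte offsets and character positions coincide.

```php
<?php
// boundaryOffsets() is a hypothetical helper used only for this illustration.
// It returns the byte offsets at which the zero-width assertion \b matches.
function boundaryOffsets(string $subject): array
{
    // PREG_OFFSET_CAPTURE makes each match an array of [matched text, offset].
    preg_match_all('/\b/', $subject, $matches, PREG_OFFSET_CAPTURE);

    // \b consumes no characters, so only the offsets are of interest.
    return array_column($matches[0], 1);
}

// "something@nimal": boundaries at 0, 9 (g|@), 10 (@|n), and 15 (end).
echo implode(', ', boundaryOffsets('something@nimal')), "\n";

// "something!@nimal": boundaries at 0, 9 (g|!), 11 (@|n), and 16 (end).
// Offset 10, between "!" and "@", is missing because both are non-word characters.
echo implode(', ', boundaryOffsets('something!@nimal')), "\n";
```

In the first string there is a boundary at offset 9, directly before "@", which is why (^|\b)@nimal matches. In the second string the "@" sits at offset 10, where no boundary exists.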
Matching Word Beginnings
To match words that start with a specific sequence, place a word boundary (\b) before the target sequence. This works as intended only when the target itself begins with a word character. Because "@" is a non-word character, a \b placed before "@nimal" requires a word character immediately before the "@":
preg_match("/(\b)@nimal/i", "something@nimal", $match); // true
preg_match("/(\b)@nimal/i", "something!@nimal", $match); // false
In this regex, \b is satisfied only by a transition from a word character to the "@". The pattern therefore finds "@nimal" when it directly follows a word character, and fails when "@nimal" follows punctuation, whitespace, or the start of the string. By contrast, for a target that begins with a word character, such as "cat", the pattern /\bcat/ matches at the start of "catalog" but not inside "ducat".
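As a rough illustration of matching at word starts, the sketch below contrasts a target that begins with a word character ("cat") with one that begins with a non-word character ("@nimal"). The negative lookbehind (?<!\w) is one possible way to require that "@nimal" is not preceded by a word character; it is an assumption about what the test cases were meant to express, not something the original text prescribes. The helper name startsToken() is likewise made up for this example.

```php
<?php
// startsToken() is a hypothetical helper used only for this illustration.
// It reports whether "@nimal" occurs at a position that is NOT preceded
// by a word character (start of string, punctuation, whitespace, ...).
function startsToken(string $subject): bool
{
    // (?<!\w) is a negative lookbehind: it fails if the previous character is \w.
    return preg_match('/(?<!\w)@nimal/i', $subject) === 1;
}

// Target beginning with a word character: \b anchors it to a word start.
var_dump(preg_match('/\bcat/i', 'catalog') === 1); // bool(true)
var_dump(preg_match('/\bcat/i', 'ducat') === 1);   // bool(false)

// Target beginning with a non-word character: use the lookbehind instead of \b.
var_dump(startsToken('@nimal'));           // bool(true)
var_dump(startsToken('something!@nimal')); // bool(true)
var_dump(startsToken('something@nimal'));  // bool(false)
```

With the lookbehind, the two results from the original test cases come out the way they were first expected: no match after "g", and a match after "!".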