Why Does Go's Regex \b Boundary Fail with Non-ASCII Characters?

DDD
Release: 2024-10-29 00:26:02

Golang Regex Boundary Issue with Non-ASCII Characters

In Go's regexp package, \b is an ASCII-only word boundary: it matches between a word character (\w, that is, [0-9A-Za-z_]) and anything else. Accented letters such as é are not word characters, so \b can match in the middle of a word that contains them. This leads to unexpected results when working with strings containing non-ASCII characters. For instance, consider the following code:

<code class="go">package main

import (
    "fmt"
    "regexp"
)

func main() {
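    // MustCompile panics on an invalid pattern instead of silently discarding the error.
    r := regexp.MustCompile(`\b(vis)\b`)
    fmt.Println(r.MatchString("re vis e")) // true
    fmt.Println(r.MatchString("revise"))   // false
    fmt.Println(r.MatchString("révisé"))   // true: é is not an ASCII word character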
}</code>
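
The ASCII-only behavior can be checked directly. The following is a minimal sketch, not part of the original example; the variable names wordChar, unicodeLetter and asciiBoundary are chosen here for illustration. It compares \w with the Unicode letter class \p{L} and prints the byte offsets where \b(vis)\b matches inside "révisé".

```go
package main

import (
	"fmt"
	"regexp"
)

// wordChar uses \w, which in Go's regexp syntax is the ASCII class [0-9A-Za-z_].
var wordChar = regexp.MustCompile(`^\w$`)

// unicodeLetter uses \p{L}, which matches any Unicode letter.
var unicodeLetter = regexp.MustCompile(`^\p{L}$`)

// asciiBoundary is the pattern from the example above.
var asciiBoundary = regexp.MustCompile(`\b(vis)\b`)

func main() {
	for _, s := range []string{"e", "é"} {
		fmt.Printf("%q: \\w=%v \\p{L}=%v\n", s, wordChar.MatchString(s), unicodeLetter.MatchString(s))
	}
	// "e": \w=true \p{L}=true
	// "é": \w=false \p{L}=true

	// Byte offsets of the match: é occupies two bytes in UTF-8, so "vis" spans bytes 3 to 6.
	fmt.Println(asciiBoundary.FindStringIndex("révisé")) // [3 6]
}
```

Because é fails \w, the positions on either side of "vis" in "révisé" sit between a word character and a non-word character, which is exactly what \b looks for.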

The pattern \b(vis)\b is meant to match "vis" only as a whole word. Applied to "révisé", however, it returns true: é is not an ASCII word character, so Go sees a word boundary on each side of "vis" even though it sits inside a longer word. To address this issue, you can employ an alternative approach:

<code class="go">// Require whitespace or a string edge on both sides of "vis".
r := regexp.MustCompile(`(?:\A|\s)(vis)(?:\s|\z)`)
fmt.Println(r.MatchString("vis"))      // true
fmt.Println(r.MatchString("re vis e")) // true
fmt.Println(r.MatchString("revise"))   // false
fmt.Println(r.MatchString("révisé"))   // false</code>

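This solution wraps (vis) in two non-capturing groups. The first, (?:\A|\s), requires one of the following immediately before "vis":

  • Start of string (\A)
  • Whitespace (\s)

The second, (?:\s|\z), requires whitespace (\s) or the end of the string (\z) immediately after it.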

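This approximates \b, except that only whitespace and the ends of the string count as boundaries, so an accented letter such as é no longer produces one. The pattern therefore matches "vis" only when it stands alone as a whitespace-delimited word. Two side effects are worth noting: punctuation is not treated as a boundary either, so "vis," does not match, and any surrounding whitespace becomes part of the overall match (the word itself is in capture group 1).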
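
If punctuation should also count as a boundary, one possible variation is to replace \s with a negated class of Unicode letters, digits and underscore. The sketch below is an assumption about what a Unicode-aware boundary might look like, not the original author's solution; wholeWord is a hypothetical helper, not part of the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// wholeWord is a hypothetical helper: it builds a pattern that matches word
// only when it is not adjacent to a Unicode letter, digit, or underscore.
// Go's regexp has no lookaround, so the neighbouring character is consumed.
func wholeWord(word string) *regexp.Regexp {
	const nonWord = `[^\p{L}\p{N}_]`
	return regexp.MustCompile(
		`(?:\A|` + nonWord + `)(` + regexp.QuoteMeta(word) + `)(?:` + nonWord + `|\z)`)
}

func main() {
	r := wholeWord("vis")
	for _, s := range []string{"vis", "re vis e", "revise", "révisé", "(vis),"} {
		fmt.Printf("%q -> %v\n", s, r.MatchString(s))
	}
	// "vis" -> true
	// "re vis e" -> true
	// "revise" -> false
	// "révisé" -> false
	// "(vis)," -> true

	// The word itself is in capture group 1; the full match may include neighbours.
	fmt.Println(r.FindStringSubmatch("(vis),")[1]) // vis
}
```

Because the character on each side is consumed by the match, two occurrences separated by a single character (as in "vis vis") share that separator, and a FindAll-style call may report only the first one.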

source:php.cn