How to Handle Non-ASCII Characters in Go's Regular Expression Boundaries?

Oct 30, 2024 am 02:24 AM

Golang Regular Expression Boundary and Non-ASCII Characters

Go's regular expression word boundary (\b) is ASCII-only: it matches at a position between an ASCII word character ([0-9A-Za-z_]) and anything else. As a result, it may not behave as expected when accented Latin letters such as "é" are involved.

The Problem

In Go, \b treats only ASCII letters, digits, and the underscore as word characters. For instance, the regex \b(vis)\b is intended to match "vis" only when it stands alone as a word. However, in "révisé" the "é" on either side of "vis" is not an ASCII word character, so \b reports a word boundary there and the pattern matches inside the word.

Consider the following Go code:

<code class="go">package main

import (
    "fmt"
    "regexp"
)

func main() {
    r, _ := regexp.Compile(`\b(vis)\b`)
    fmt.Println(r.MatchString("re vis e")) // true
    fmt.Println(r.MatchString("revise"))   // false: "vis" is surrounded by ASCII letters
    fmt.Println(r.MatchString("révisé"))   // expected false, but prints true
}</code>

Running this code produces:

true
false
true

Notice that the last line incorrectly matches "révisé".
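
The cause can be checked directly on single characters. The following is a minimal sketch, not part of the original example; the helper names isASCIIWordChar and isUnicodeLetter are hypothetical and exist only for illustration. It relies on the documented behaviour that Go's \w is limited to [0-9A-Za-z_], while \p{L} matches any Unicode letter.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	asciiWord     = regexp.MustCompile(`^\w$`)     // Go's \w: ASCII letters, digits, underscore
	unicodeLetter = regexp.MustCompile(`^\p{L}$`) // any Unicode letter
)

// isASCIIWordChar reports whether s is a single character accepted by \w.
func isASCIIWordChar(s string) bool { return asciiWord.MatchString(s) }

// isUnicodeLetter reports whether s is a single Unicode letter.
func isUnicodeLetter(s string) bool { return unicodeLetter.MatchString(s) }

func main() {
	// "\u00e9" is the precomposed letter "é".
	for _, s := range []string{"e", "\u00e9"} {
		fmt.Printf("%q: \\w=%v \\p{L}=%v\n", s, isASCIIWordChar(s), isUnicodeLetter(s))
	}
}
```

Because "é" is rejected by \w, \b sees it as a non-word character, which is why a boundary is reported on both sides of "vis" in "révisé".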

The Solution

To handle cases with non-ASCII characters, you can define your own custom boundary pattern. One approach is to replace each \b with an explicit alternative, giving the following regex:

(?:\A|\s)(vis)(?:\s|\z)

This pattern means:

  • (?:\A|\s): Matches the start of the string or a whitespace character.
  • (vis): Captures the word "vis".
  • (?:\s|\z): Matches a whitespace character or the end of the string.

This custom boundary treats whitespace and the ends of the string as the only word delimiters, so it gives the expected result for ASCII text and also for non-ASCII letters such as "é". Unlike \b, it does not treat punctuation as a boundary.
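
One consequence of this pattern is that the whitespace delimiters become part of the overall match, so the word itself has to be read from the capture group. A minimal sketch of that, assuming a hypothetical helper named findVis that is not part of the original code:

```go
package main

import (
	"fmt"
	"regexp"
)

var wordVis = regexp.MustCompile(`(?:\A|\s)(vis)(?:\s|\z)`)

// findVis returns the captured word and true when "vis" appears as a
// whitespace-delimited word in s.
func findVis(s string) (string, bool) {
	m := wordVis.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	// m[0] is the whole match including delimiters; m[1] is the group.
	return m[1], true
}

func main() {
	m := wordVis.FindStringSubmatch("re vis e")
	fmt.Printf("%q %q\n", m[0], m[1]) // prints " vis " "vis"
}
```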

By incorporating this custom pattern into the regex, you can obtain the desired result:

<code class="go">package main

import (
    "fmt"
    "regexp"
)

func main() {
    r, _ := regexp.Compile(`(?:\A|\s)(vis)(?:\s|\z)`)
    fmt.Println(r.MatchString("vis")) // Added this case
    fmt.Println(r.MatchString("re vis e"))
    fmt.Println(r.MatchString("revise"))
    fmt.Println(r.MatchString("révisé"))
}</code>

Running this code now gives:

true
true
false
false

As you can see, "révisé" is correctly excluded as a match.
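
If punctuation should also count as a delimiter, as in "(vis)." or "vis,", the same idea can be extended with Unicode character classes. The following is a sketch under a stated assumption, not the method from the original answer: a word character is taken to be any Unicode letter (\p{L}), any Unicode digit (\p{N}), or an underscore. The helper name matchesWordVis is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumption: a word character is a Unicode letter, a Unicode digit, or "_".
// Anything else, or either end of the string, counts as a delimiter.
var uniVis = regexp.MustCompile(`(?:\A|[^\p{L}\p{N}_])(vis)(?:[^\p{L}\p{N}_]|\z)`)

// matchesWordVis reports whether "vis" occurs in s as a standalone word
// under the assumption above.
func matchesWordVis(s string) bool { return uniVis.MatchString(s) }

func main() {
	// "r\u00e9vis\u00e9" is "révisé" written with explicit escapes.
	inputs := []string{"vis", "re vis e", "revise", "r\u00e9vis\u00e9", "(vis)."}
	for _, s := range inputs {
		fmt.Println(s, matchesWordVis(s))
	}
}
```

As with the whitespace version, the delimiters are consumed by the match, so two occurrences that share a single delimiter (for example "vis vis") may not both be found when searching for all matches.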
