Rune vs. Byte: Ranging over a String in Go Explained
When you loop over a string in Go, you may notice that the element type you get depends on how you access the string. Ranging over it with "for i, c := range s" gives you c as a rune (an alias for int32), while indexing directly with "s[i]" gives you a byte (an alias for uint8). In the range form, i is the byte offset at which the rune c begins, not a count of runes.
This difference comes from how Go defines a string: a read-only sequence of bytes, which by convention usually holds UTF-8 text. Indexing with "s[i]" retrieves the raw byte at index i, which may be only one piece of a multi-byte character. A "for range" loop, by contrast, decodes the UTF-8 encoding one Unicode code point at a time and hands you each code point as a rune. If it encounters an invalid UTF-8 sequence, it yields the replacement character U+FFFD and advances by a single byte.
Range works over runes so that there is an idiomatic way to iterate over a string's code points without decoding UTF-8 by hand. If range only produced bytes, you would have to write your own loop, for example with the unicode/utf8 package, to step from one rune to the next.
You can still access the individual bytes of a string with either of these forms:
for i := 0; i < len(s); i++ { ... }
or
for i, b := range []byte(s) { ... }
These forms let you work with the raw bytes explicitly when necessary.
In summary, ranging over a string yields runes so that you can iterate over its Unicode code points without decoding UTF-8 yourself, while indexing yields bytes because a string is stored as bytes. Use range when you care about code points, and indexing or a []byte conversion when you need the raw bytes.