When you need to count or truncate strings that mix Chinese and English, the first approaches that come to mind are checking ASCII codes, hexadecimal ranges, regular expressions, and counting in a loop.
Today I will share PHP's mb (mbstring) extension, which makes this kind of string handling easy.
First, the functions we will use:
mb_strwidth($str, $encoding) returns the display width of the string
$str The string to measure
$encoding The encoding to use, such as utf8 or gbk
mb_strimwidth($str, $start, $width, $tail, $encoding) truncates the string by width
$str The string to truncate
$start The position to start from, usually 0 (this parameter is required)
$width The width to keep
$tail The string appended after the truncated string, commonly '...'
$encoding The encoding to use
Here is an example:
- /**
- * utf8 encoding format
- * 1 Chinese character occupies 3 bytes
- * What we hope is that 1 Chinese character occupies 2 bytes,
- * Because from the width point of view, the position occupied by 2 English letters is equivalent to 1 Chinese character
- */
- // Test string
- $str = 'aaaaahahaaaaahahahaaa';
- echo strlen($str); // only Use strlen to output 25 bytes
- // You must specify the encoding, otherwise PHP's internal code mb_internal_encoding() will be used to view the internal code
- // Use mb_strwidth to output a string with a width of 20 and use utf8 encoding
- echo mb_strwidth($ str, 'utf8');
- // Only intercept if the width is greater than 10
- if(mb_strwidth($str, 'utf8')>10){
- // Set here to intercept from 0, take 10 appends ..., use utf8 encoding
- // Note that the appended... will also be calculated into the length
- $str = mb_strimwidth($str, 0, 10, '...', 'utf8');
- }
-
- // Finally output aaaa... 4 a's are counted as 4 1's, 2 are counted as 3 points, and 3 are counted as 4+2+3=9
- // Isn't it very simple? Some people have said Why is it 9 and not 10?
- // Because "Ah" happens to be followed by "Ah", Chinese counts 2, 9+2=11 exceeds the setting, so removing 1 is 9
- echo $str;
Here are two more functions:
mb_strlen($str, $encoding) returns the length of the string in characters
$str The string to measure
$encoding The encoding to use
mb_substr($str, $start, $length, $encoding) extracts part of the string
$str The string to extract from
$start The character position to start from
$length The number of characters to take
$encoding The encoding to use
These two functions are very similar to strlen() and substr(). The difference is that you can specify the encoding, so they count in characters rather than bytes.
Example below:
-
- /**
- * utf8 encoding format
- * 1 Chinese occupies 3 bytes
- */
- $str = 'aa12ahaa';
- echo strlen($str); // Direct output length is 9
-
- // Output length is 7 , why is it 7?
- // Note that after setting the encoding here, whether it is Chinese or English, the length of each is 1
- // a a 1 2 ah a a
- // 1+1+1+1+1+1+1 = 7
- // Is it exactly 7 characters?
- echo mb_strlen($str, 'utf8');
-
- // The same is true for mb_substr
- // I only want 5 characters now
- echo mb_substr($str, 0, 5, 'utf8'); // Output aa12
The mb extension has many other useful functions, so I won't list them all here.
Interested readers can consult the official manual:
http://www.php.net/manual/zh/ref.mbstring.php
Okay, that’s all for today.