Counting and truncating strings that mix Chinese and English needs no custom functions. PHP's mb (mbstring) extension provides built-in functions that handle string truncation easily.
First, here are the functions commonly used to truncate strings by width.
mb_strwidth($str, $encoding) returns the display width of a string
$str The string to measure
$encoding The encoding to use, such as utf8 or gbk
mb_strimwidth($str, $start, $width, $tail, $encoding) truncates a string to a given width
$str The string to truncate
$start The position to start from, counted in characters; usually 0
$width The width to truncate to
$tail The string appended to the truncated result; '...' is a common choice
$encoding The encoding to use
Example:
-
- /**
- * utf8 encoding format
- * 1 Chinese character occupies 3 bytes
- * What we hope is that 1 Chinese character occupies 2 bytes,
- * Because from the width point of view, the position occupied by 2 English letters is equivalent to 1 Chinese character
- */
- // Test string
- $str = 'aaaaahahaaaaahahahaaa';
- echo strlen($str); // Only strlen is used to output 25 bytes
- // The encoding must be specified, otherwise PHP's internal code mb_internal_encoding() will be used to view the internal code
- // Use mb_strwidth to output a string with a width of 20 and use utf8 encoding
- echo mb_strwidth($ str, 'utf8');
- // Only intercept if the width is greater than 10
- if(mb_strwidth($str, 'utf8')>10){
- // Set to intercept from 0 here, take 10 appends. .., use utf8 encoding
- // Note that the appended... will also be calculated into the length
- $str = mb_strimwidth($str, 0, 10, '...', 'utf8');
- }
- //The final output is aaaa... 4 a's are counted as 4 1's, 2 are counted as 3 points, and 3 are counted as 4+2+3=9
- // Isn't it very simple? Some people have said why. Isn’t 9 10?
- // Because "Ah" happens to be followed by "Ah", Chinese counts 2, 9+2=11 exceeds the setting, so removing 1 is 9
- echo $str;
Other string truncation functions:
mb_strlen($str, $encoding) returns the length of a string in characters
$str The string to measure
$encoding The encoding to use
mb_substr($str, $start, $length, $encoding) extracts part of a string
$str The string to extract from
$start The position to start from
$length The number of characters to extract
$encoding The encoding to use
These two functions closely resemble strlen() and substr(). The difference is that they accept an encoding and count characters in that encoding rather than bytes.
An example using these two functions:
-
-
- /**
- * utf8 encoding format
- * 1 Chinese occupies 3 bytes
- */
- $str = 'aa12ahaa';
- echo strlen($str); // Direct output length is 9
- // Output length is 7. Why 7?
- // Note that after setting the encoding here, whether it is Chinese or English, the length of each is 1
- // a a 1 2 ah a a
- // 1+1+1+1+1+1+1 = 7
- // Is it exactly 7 characters?
- echo mb_strlen($str, 'utf8');
- // The same is true for mb_substr
- // I only want 5 characters now
- echo mb_substr($str, 0, 5, ' utf8'); // Output aa12
The mb extension includes many other practical functions that are not covered here. If you are interested, see the relevant section of the PHP manual.