PHP can easily truncate mixed Chinese and English strings with just 2 lines of code!

WBOY

Released: 2016-07-25 09:12:08

When it comes to measuring and truncating strings that mix Chinese and English, the first things that come to mind are ASCII codes, hexadecimal byte values, regular-expression matching, and loop counting.

Today I will show you how PHP's mb (mbstring) extension makes this kind of string processing easy.

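First, here are the functions used:

mb_strwidth($str, $encoding) returns the display width of the string: a half-width character such as an English letter counts as 1, and a full-width character such as a Chinese character counts as 2

$str The string to measure

$encoding The encoding to use, such as utf8 or gbk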
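To make the difference between bytes and width concrete, here is a small sketch. The sample strings are my own, and it assumes the source file is saved as UTF-8.

```php
<?php
// Sketch: compare byte length with display width for a few sample strings.
// Assumes this file is saved as UTF-8; the samples are illustrative only.
$samples = ['a', '啊', 'a啊'];

$report = [];
foreach ($samples as $s) {
    // strlen() counts bytes; mb_strwidth() counts 1 per half-width
    // character and 2 per full-width character.
    $report[$s] = [strlen($s), mb_strwidth($s, 'UTF-8')];
    printf("%s: %d bytes, width %d\n", $s, $report[$s][0], $report[$s][1]);
}
```

The Chinese character takes 3 bytes in UTF-8 but only 2 columns of width, which is the number we care about when laying out text.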

mb_strimwidth($str, $start, $width, $tail, $encoding) truncates the string to a given width

$str The string to truncate

$start The position (character offset) at which to start; this parameter is required and is usually 0

$width The maximum width of the result

$tail The string appended after the truncated string; ... is a common choice

$encoding The encoding to use
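Before the full example, here is a short sketch of how these parameters interact. The sample text is my own choice, not from the original article.

```php
<?php
// Sketch of how the mb_strimwidth() arguments interact.
// The sample text is illustrative only.
$text = 'Hello World';

// The tail counts toward the width: 5 letters + 3 dots = 8.
$withTail = mb_strimwidth($text, 0, 8, '...', 'UTF-8');

// With an empty tail, all 8 columns go to the text itself.
$noTail = mb_strimwidth($text, 0, 8, '', 'UTF-8');

// Starting at offset 6, the remainder "World" already fits in 8 columns,
// so nothing is cut and no tail is appended.
$fromOffset = mb_strimwidth($text, 6, 8, '...', 'UTF-8');

echo $withTail, "\n";   // Hello...
echo $noTail, "\n";     // Hello Wo
echo $fromOffset, "\n"; // World
```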

Here is a complete example:

/**
* utf8 encoding format
* 1 Chinese character occupies 3 bytes
* What we hope is that 1 Chinese character occupies 2 bytes,
* Because from the width point of view, the position occupied by 2 English letters is equivalent to 1 Chinese character
*/
// Test string
$str = 'aaaaahahaaaaahahahaaa';
echo strlen($str); // only Use strlen to output 25 bytes
// You must specify the encoding, otherwise PHP's internal code mb_internal_encoding() will be used to view the internal code
// Use mb_strwidth to output a string with a width of 20 and use utf8 encoding
echo mb_strwidth($ str, 'utf8');
// Only intercept if the width is greater than 10
if(mb_strwidth($str, 'utf8')>10){
// Set here to intercept from 0, take 10 appends ..., use utf8 encoding
// Note that the appended... will also be calculated into the length
$str = mb_strimwidth($str, 0, 10, '...', 'utf8');
}
// Finally output aaaa... 4 a's are counted as 4 1's, 2 are counted as 3 points, and 3 are counted as 4+2+3=9
// Isn't it very simple? Some people have said Why is it 9 and not 10?
// Because "Ah" happens to be followed by "Ah", Chinese counts 2, 9+2=11 exceeds the setting, so removing 1 is 9
echo $str;

Here are a couple of other functions:

mb_strlen($str, $encoding) returns the length of the string in characters

$str The string to measure

$encoding The encoding to use

mb_substr($str, $start, $length, $encoding) extracts part of the string

$str The string to extract from

$start The character offset at which to start

$length The number of characters to extract

$encoding The encoding to use

In fact, these two functions are very similar to strlen() and substr(). The difference is that you can set the encoding, so they count characters in that encoding rather than bytes.

Here is an example:

/**
* utf8 encoding format
* 1 Chinese occupies 3 bytes
*/
$str = 'aa12ahaa';
echo strlen($str); // Direct output length is 9
// Output length is 7 , why is it 7?
// Note that after setting the encoding here, whether it is Chinese or English, the length of each is 1
// a a 1 2 ah a a
// 1+1+1+1+1+1+1 = 7
// Is it exactly 7 characters?
echo mb_strlen($str, 'utf8');
// The same is true for mb_substr
// I only want 5 characters now
echo mb_substr($str, 0, 5, 'utf8'); // Output aa12