Truncating strings in PHP (mixed Chinese and English strings)


WBOY

Release： 2016-07-13 16:56:34


This article covers string truncation in PHP, starting with the built-in substr() function and moving on to functions that correctly handle Chinese, English, and mixed Chinese and English strings.

Get part of the string.

Syntax: string substr(string $string, int $start [, int $length]);

Return value: String

Function type: Data processing

Content Description

This function returns $length bytes of $string starting at offset $start, where offset 0 is the first byte. If $start is negative, the offset is counted from the end of the string. If $length is omitted, everything up to the end of the string is returned. If $length is negative, that many bytes are dropped from the end of the string.

Usage Example

echo substr("abcdef", 1, 3);  // returns "bcd"
echo substr("abcdef", -2);    // returns "ef"
echo substr("abcdef", -3, 1); // returns "d"
echo substr("abcdef", 1, -1); // returns "bcde"

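substr() counts bytes, not characters, so it is only reliable for single-byte (English) text. A Chinese character occupies several bytes, and a cut that lands in the middle of one produces garbled output.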
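To make the byte-versus-character problem concrete, here is a minimal sketch comparing substr() with mb_substr(). It assumes the file is saved as UTF-8 and that the mbstring extension is loaded; the sample string is made up for illustration.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is available.
$str = 'ab中文cd';                        // 中 and 文 take 3 bytes each in UTF-8

$bytes = substr($str, 0, 3);              // "ab" plus only the first byte of 中
$chars = mb_substr($str, 0, 3, 'UTF-8');  // "ab中", counted in characters

var_dump(mb_check_encoding($bytes, 'UTF-8'));     // bool(false): a character was split
var_dump($chars === 'ab中');                       // bool(true)
var_dump(strlen($str), mb_strlen($str, 'UTF-8')); // int(10) and int(6)
```

Where mbstring is not available, the hand-written functions in the rest of this article are one way to get similar behavior.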

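Truncating a GB2312 Chinese string

<?php
// Truncate a GB2312 string without splitting double-byte characters.
// $start and $len are byte offsets; a character is kept whole when its
// first byte falls inside the requested range.
function mysubstr($str, $start, $len) {
    $tmpstr = "";
    $end = min($start + $len, strlen($str));
    for ($i = 0; $i < $end; $i++) {
        // A lead byte above 0xA0 starts a two-byte GB2312 character.
        $width = ord($str[$i]) > 0xa0 ? 2 : 1;
        // Walk from byte 0 to stay aligned, but only collect from $start on.
        if ($i >= $start) {
            $tmpstr .= substr($str, $i, $width);
        }
        $i += $width - 1;
    }
    return $tmpstr;
}
?>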
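For comparison, the same kind of cut can be sketched with the mbstring extension instead of a hand-written loop. This is only an illustration under stated assumptions: mbstring is loaded, the file is saved as UTF-8, and GB2312 text is handled under mbstring's EUC-CN encoding name. Unlike mysubstr(), the offsets here are counted in characters, not bytes.

```php
<?php
// Assumption: mbstring is loaded and this file is saved as UTF-8.
$utf8 = '中文abc';

// Convert to GB2312 (EUC-CN): each Chinese character becomes 2 bytes.
$gb = mb_convert_encoding($utf8, 'EUC-CN', 'UTF-8');
var_dump(strlen($gb));                  // int(7): 2 + 2 + 3 bytes

// Take the first three characters without splitting a double-byte pair.
$cut = mb_substr($gb, 0, 3, 'EUC-CN');

// Convert back to UTF-8 only so the result can be displayed here.
echo mb_convert_encoding($cut, 'UTF-8', 'EUC-CN'); // 中文a
```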

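Truncating a UTF-8 encoded multibyte string

<?php
// Truncate a UTF-8 string; $from and $len are counted in characters.
function utf8Substr($str, $from, $len)
{
    // Skip up to $from characters, capture up to $len characters, drop the rest.
    // A character is either one ASCII byte or a lead byte plus continuation bytes.
    return preg_replace(
        '#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){0,' . $from . '}' .
        '((?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){0,' . $len . '}).*#s',
        '$1', $str);
}
?>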
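As an alternative illustration of character-based cutting, the sketch below splits the string into characters with PCRE's u modifier and slices the resulting array. The helper name utf8_char_substr is hypothetical and not part of the original article; the example assumes the file is saved as UTF-8.

```php
<?php
// Hypothetical helper (not from the article): character-based substring.
function utf8_char_substr(string $str, int $from, int $len): string {
    // With the "u" modifier PCRE treats $str as UTF-8, so the empty
    // pattern splits between characters rather than between bytes.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return ''; // the input was not valid UTF-8
    }
    return implode('', array_slice($chars, $from, $len));
}

echo utf8_char_substr('PHP截取中文字符串', 3, 4); // 截取中文
```

This trades memory for simplicity: the whole string is expanded into an array, which is acceptable for short strings such as titles and summaries.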

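/*
 * Function: works like substr(), but never leaves a partial UTF-8 character, so the output is not garbled
 * Parameters: $str is the UTF-8 string; $start and $length are byte offsets, as in substr()
 * Return: the substring, extended to whole characters at both ends
 */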

function utf8_substr($str, $start, $length = null) {
    $strlen = strlen($str);

    // Resolve $start and the end offset using the same rules as substr().
    if ($start < 0) {
        $start = max($strlen + $start, 0);
    }
    if ($length === null) {
        $end = $strlen;
    } elseif ($length < 0) {
        $end = $strlen + $length;
    } else {
        $end = min($start + $length, $strlen);
    }
    if ($start >= $strlen || $end <= $start) {
        return '';
    }

    // Cut on byte offsets first.
    $res = substr($str, $start, $end - $start);

    // Look at up to 6 bytes after the cut. Leading continuation bytes
    // (0x80-0xBF) belong to the last character of $res, so append them.
    $next_segm = substr($str, $end, 6);
    if (preg_match('@^([\x80-\xBF]{0,5})[\xC0-\xFD]?@', $next_segm, $bytes)) {
        if (!empty($bytes[1])) {
            $res .= $bytes[1];
        }
    }

    // If $res starts with a continuation byte, look at up to 6 bytes
    // before the cut and prepend the missing head of that character.
    $ord0 = ord($res[0]);
    if (128 <= $ord0 && 191 >= $ord0) {
        $prev_start = $start - 6 > 0 ? $start - 6 : 0;
        $prev_segm = substr($str, $prev_start, $start - $prev_start);
        if (preg_match('@[\xC0-\xFD][\x80-\xBF]{0,5}$@', $prev_segm, $bytes)) {
            $res = $bytes[0] . $res;
        }
    }

    return $res;
}
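The core idea in utf8_substr() is that a UTF-8 continuation byte always has the bit pattern 10xxxxxx, so a byte offset can be moved to the nearest character boundary by skipping such bytes. The sketch below isolates that idea; the helper names are hypothetical and the example assumes the file is saved as UTF-8.

```php
<?php
// Hypothetical helper: true when $byte is a UTF-8 continuation byte (10xxxxxx).
function is_continuation_byte(string $byte): bool {
    return (ord($byte) & 0xC0) === 0x80;
}

// Hypothetical helper: move $offset back to the first byte of its character.
function snap_backward(string $str, int $offset): int {
    while ($offset > 0 && $offset < strlen($str) && is_continuation_byte($str[$offset])) {
        $offset--;
    }
    return $offset;
}

// Hypothetical helper: move $offset forward past any continuation bytes.
function snap_forward(string $str, int $offset): int {
    $len = strlen($str);
    while ($offset < $len && is_continuation_byte($str[$offset])) {
        $offset++;
    }
    return $offset;
}

$str = 'ab测试cd';                 // bytes: a(0) b(1) 测(2-4) 试(5-7) c(8) d(9)
$start = snap_backward($str, 3);   // 3 is inside 测, so it becomes 2
$end = snap_forward($str, 6);      // 6 is inside 试, so it becomes 8
echo substr($str, $start, $end - $start); // 测试
```

Widening the range on both sides mirrors what utf8_substr() does; a caller that must not exceed a byte budget would snap the end backward instead.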

Test data:

$str = 'dfjdjf测13f试65&2数据ｆｄｊ（1就mfe&……就';
var_dump( utf8_substr( $str , 22 , 12 ) ); echo "\n";
var_dump( utf8_substr( $str , 22 , -6 ) ); echo "\n";
var_dump( utf8_substr( $str , 9 , 12 ) ); echo "\n";
var_dump( utf8_substr( $str , 19 , 12 ) ); echo "\n";
var_dump( utf8_substr( $str , 28 , -6 ) ); echo "\n";

Output (no garbled characters after truncation; you are welcome to test it and report bugs):
string(12) "据ｆｄｊ"
string(26) "据ｆｄｊ（1就mfe&…"
string(13) "13f试65&2数"
string(12) "数据ｆｄ"
string(20) "ｄｊ（1就mfe&…"

Here is one that I use often.

Next, let's look at another Chinese string truncation function.

function MooCutstr($string, $length, $dot = ' ...') {
    global $charset;

    // Nothing to cut if the string already fits.
    if (strlen($string) <= $length) {
        return $string;
    }
    // Decode common HTML entities so that each one counts as a single character.
    $string = str_replace(array('&amp;', '&quot;', '&lt;', '&gt;'), array('&', '"', '<', '>'), $string);
    $strcut = '';
    if (strtolower($charset) == 'utf-8') {
        // $n: byte offset, $tn: byte width of the last character read,
        // $noc: width counted so far (ASCII counts 1, multibyte counts 2).
        $n = $tn = $noc = 0;
        while ($n < strlen($string)) {
            $t = ord($string[$n]);
            if ($t == 9 || $t == 10 || (32 <= $t && $t <= 126)) {
                $tn = 1; $n++; $noc++;
            } elseif (194 <= $t && $t <= 223) {
                $tn = 2; $n += 2; $noc += 2;
            } elseif (224 <= $t && $t <= 239) {
                // 239 (0xEF) is included: it leads 3-byte characters such as full-width letters.
                $tn = 3; $n += 3; $noc += 2;
            } elseif (240 <= $t && $t <= 247) {
                $tn = 4; $n += 4; $noc += 2;
            } elseif (248 <= $t && $t <= 251) {
                $tn = 5; $n += 5; $noc += 2;
            } elseif ($t == 252 || $t == 253) {
                $tn = 6; $n += 6; $noc += 2;
            } else {
                $n++;
            }
            if ($noc >= $length) {
                break;
            }
        }
        // Step back one character if the last one overshot the limit.
        if ($noc > $length) {
            $n -= $tn;
        }
        $strcut = substr($string, 0, $n);
    } else {
        // Double-byte encodings: keep both bytes of any character above 127.
        for ($i = 0; $i < $length; $i++) {
            $strcut .= ord($string[$i]) > 127 ? $string[$i].$string[++$i] : $string[$i];
        }
    }
    //$strcut = str_replace(array('&', '"', '<', '>'), array('&amp;', '&quot;', '&lt;', '&gt;'), $strcut);