The simplest way to cut a substring in PHP is the substr() function. However, substr() works on bytes, so it is only safe for single-byte (English) text; applied to Chinese it can split a multi-byte character and produce garbled output. mb_substr() handles Chinese and mixed Chinese/English text correctly when it is given the right encoding, but it requires the mbstring extension. The hand-written functions below are alternatives that do not depend on it.
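To make the byte-versus-character problem concrete, here is a minimal sketch. The helper name isValidUtf8 is hypothetical (not from the original article), and the check assumes PCRE was built with UTF-8 support, which is the usual case.

```php
<?php
// "中文" as UTF-8 bytes, written with escapes so the file encoding does not matter.
$text = "\xE4\xB8\xAD\xE6\x96\x87";

// substr() counts bytes: 4 bytes is one whole character plus a dangling lead byte.
$cut = substr($text, 0, 4);

// Hypothetical helper: preg_match() with the /u modifier fails on malformed UTF-8.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

var_dump(strlen($text));       // int(6): two characters, three bytes each
var_dump(isValidUtf8($text));  // bool(true)
var_dump(isValidUtf8($cut));   // bool(false): the second character was split
```

The truncated string ends in a lone lead byte, which is what shows up as a garbled character on the page.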
The following function cuts a GB2312-encoded Chinese string:
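<?php
// Description: cut a GB2312 string without splitting a double-byte character.
// $start and $len are byte offsets; a character that straddles $start is skipped.
function mysubstr($str, $start, $len) {
    $tmpstr = "";
    $end = min($start + $len, strlen($str));
    for ($i = 0; $i < $end; $i++) {
        if (ord(substr($str, $i, 1)) > 0xa0) {
            // Lead byte of a double-byte character: take both bytes together.
            if ($i >= $start) {
                $tmpstr .= substr($str, $i, 2);
            }
            $i++;
        } elseif ($i >= $start) {
            $tmpstr .= substr($str, $i, 1);
        }
    }
    return $tmpstr;
}
?>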
A Chinese-string cutting function that supports both UTF-8 and GB2312
Cutting UTF-8 strings
To support multiple languages, strings in the database are often stored as UTF-8. During website development you may need PHP to cut out part of such a string. To avoid garbled characters, you can write the following UTF-8 cutting function.
For the principles of UTF-8, see the UTF-8 FAQ.
A UTF-8 character consists of 1 to 4 bytes, and the length can be determined from the first byte. (The simplified rule here assumes no more than 3 bytes, which covers the common Chinese characters.)
If the first byte is 224 (0xE0) or greater, it and the following 2 bytes form one UTF-8 character.
If the first byte is at least 192 (0xC0) and less than 224 (0xE0), it and the following byte form one UTF-8 character.
Otherwise the first byte is itself a single-byte character (English letters, digits and basic punctuation).
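The first-byte rule above can be sketched as follows. This is an illustration only: the names utf8CharLength and utf8Chars are hypothetical, the sketch also covers the four-byte case, and it does not validate the continuation bytes.

```php
<?php
// Hypothetical helper: byte length of the UTF-8 character whose first byte is $byte.
// Returns 0 for a continuation byte (0x80-0xBF) or a byte that cannot start a character.
function utf8CharLength(string $byte): int
{
    $b = ord($byte[0]);
    if ($b < 0x80) return 1;  // single-byte (ASCII)
    if ($b < 0xC0) return 0;  // continuation byte, not a first byte
    if ($b < 0xE0) return 2;
    if ($b < 0xF0) return 3;
    if ($b < 0xF8) return 4;
    return 0;
}

// Hypothetical helper: split a string into characters using the rule above.
function utf8Chars(string $s): array
{
    $chars = [];
    $n = strlen($s);
    for ($i = 0; $i < $n; ) {
        // Step over stray bytes one at a time instead of looping forever.
        $len = max(1, utf8CharLength($s[$i]));
        $chars[] = substr($s, $i, $len);
        $i += $len;
    }
    return $chars;
}

$chars = utf8Chars("ab\xE4\xB8\xAD\xE6\x96\x87");   // "ab中文"
echo count($chars), "\n";                           // 4
echo implode('', array_slice($chars, 1, 2)), "\n";  // "b中"
```

Once the string is split into whole characters, cutting is just array_slice() followed by implode(), so no character can be broken in the middle.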
<?php
// Description: a Chinese-string cutting function that supports both UTF-8 and GB2312
/*
 cut_str(string, length to cut, start position, encoding);
 Length and start are counted in characters for UTF-8
 and in double-byte units (2 bytes each) for GB2312.
 The encoding defaults to UTF-8.
 The start position defaults to 0.
*/
function cut_str($string, $sublen, $start = 0, $code = 'UTF-8')
{
    if ($code == 'UTF-8') {
        // One alternative per valid UTF-8 sequence length (1 to 4 bytes).
        $pa = "/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|\xe0[\xa0-\xbf][\x80-\xbf]|[\xe1-\xef][\x80-\xbf][\x80-\xbf]|\xf0[\x90-\xbf][\x80-\xbf][\x80-\xbf]|[\xf1-\xf7][\x80-\xbf][\x80-\xbf][\x80-\xbf]/";
        preg_match_all($pa, $string, $t_string);

        // Append "..." only when something is left over after the cut.
        if (count($t_string[0]) - $start > $sublen) {
            return join('', array_slice($t_string[0], $start, $sublen)) . "...";
        }
        return join('', array_slice($t_string[0], $start, $sublen));
    } else {
        $start = $start * 2;
        $sublen = $sublen * 2;
        $strlen = strlen($string);
        $tmpstr = '';
        for ($i = 0; $i < $strlen; $i++) {
            if ($i >= $start && $i < ($start + $sublen)) {
                if (ord(substr($string, $i, 1)) > 129) {
                    // Lead byte: copy the whole double-byte character.
                    $tmpstr .= substr($string, $i, 2);
                } else {
                    $tmpstr .= substr($string, $i, 1);
                }
            }
            // Skip the trail byte of a double-byte character.
            if (ord(substr($string, $i, 1)) > 129) $i++;
        }
        if (strlen($tmpstr) < $strlen) $tmpstr .= "...";
        return $tmpstr;
    }
}

// This file must itself be saved as GB2312 for the call below to work as intended.
$str = "abcd需要截取的字符串";
echo cut_str($str, 8, 0, 'gb2312');
?>
Note:
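// $from and $len are counted in characters, not bytes.
function utf8Substr($str, $from, $len)
{
    // Skip $from characters, capture up to $len characters, discard the rest.
    return preg_replace('#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){0,'.$from.'}'.
        '((?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){0,'.$len.'}).*#s',
        '$1', $str);
}
With this function, UTF-8 strings can be cut on their own, without the GB2312 branch.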
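For comparison, here is a different technique that gives the same kind of character-based cut: split the string into code points with preg_split() and the /u modifier, then slice the array. This is a sketch, not the article's method; the name utf8SubstrSplit is hypothetical, and it assumes PCRE has UTF-8 support.

```php
<?php
// Hypothetical helper, not part of the original article.
function utf8SubstrSplit(string $str, int $from, int $len): string
{
    // With the /u modifier, an empty pattern splits between code points.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return ''; // the input was not valid UTF-8
    }
    return implode('', array_slice($chars, $from, $len));
}

// "abcd中文", written with byte escapes so the file encoding does not matter.
echo utf8SubstrSplit("abcd\xE4\xB8\xAD\xE6\x96\x87", 2, 4), "\n"; // "cd中文"
```

Unlike the hand-written byte patterns, this version rejects malformed input outright, which may or may not be what a given page needs.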
Program description:
1. The $len parameter is measured in Chinese characters: one unit of $len equals two English characters, so that truncated titles line up visually.
2. If the $magic parameter is set to false, Chinese and English characters are treated alike and $len is the absolute number of characters to take.
3. Especially suitable for strings that have been encoded with htmlspecialchars().
4. Correctly handles numeric character entities (of the form &#NNNN;) in GB2312 text.
Program code:
function FSubstr($title, $start, $len = "", $magic = true)
{
    /**
     * powered by Smartpig
     * mailto:d.einstein@263.net
     */
    $length = 0;
    if ($len == "") $len = strlen($title);

    // If $start falls inside a double-byte character, step back to its first byte.
    if ($start > 0) {
        $cnum = 0;
        for ($i = 0; $i < $start; $i++) {
            if (ord(substr($title, $i, 1)) >= 128) $cnum++;
        }
        if ($cnum % 2 != 0) $start--;
    }

    if (strlen($title) <= $len) return substr($title, $start, $len);

    // Entities written by htmlspecialchars(); each counts as one single-width character.
    $entities = array("&lt;", "&gt;", "&amp;", "&quot;", "&#039;");
    $alen = 0;    // single-width characters taken so far
    $blen = 0;    // double-width characters taken so far
    $realnum = 0; // characters taken so far, regardless of width
    $total = strlen($title);

    for ($i = $start; $i < $total; $i += $cstep) {
        $ctype = 0; // 1 if the current character is double-width
        $cstep = 1; // bytes occupied by the current character
        $cur = substr($title, $i, 1);
        if ($cur == "&") {
            foreach ($entities as $entity) {
                if (substr($title, $i, strlen($entity)) == $entity) {
                    $cstep = strlen($entity);
                    break;
                }
            }
            // Other numeric entities (&#NNNN;) count as one double-width character.
            if ($cstep == 1 && preg_match('/^&#(\d+);/', substr($title, $i, 8), $match)) {
                $cstep = strlen($match[0]);
                $ctype = 1;
            }
            // A lone "&" falls through and counts as one single-width character.
        } elseif (ord($cur) >= 128) {
            $cstep = 2;
            $ctype = 1;
        }
        $length += $cstep;
        $realnum++;

        if ($magic) {
            if ($ctype == 1) $blen++; else $alen++;
            $width = $blen * 2 + $alen;
            if ($width == $len * 2) break;
            if ($width == $len * 2 + 1) {
                // The last double-width character overshot the limit: drop it.
                if ($ctype == 1) $length -= $cstep;
                break;
            }
        } else {
            if ($realnum == $len) break;
        }
    }
    return substr($title, $start, $length);
}