Almost every web page needs to truncate strings for display. Smarty's truncate modifier is handy for this, but it is only suitable for English text. With Chinese text, truncate produces garbled characters. With mixed Chinese and English strings, cutting the same number of characters yields different display widths, so the results look uneven and unattractive. This is because one Chinese character is roughly as wide as two English characters. In addition, truncate cannot handle GB2312, UTF-8 and other encodings at the same time.
The improved modifier, smartTruncate, goes in a file named modifier.smartTruncate.php.
The code is as follows:
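<?php
// Returns true when $string is valid UTF-8. Results are cached per string.
function smartDetectUTF8($string)
{
    static $result = array();
    if (!array_key_exists($key = md5($string), $result))
    {
        // Byte-level UTF-8 pattern, following the widely used W3C validation regex.
        $utf8 = '/^(?:
              [\x09\x0A\x0D\x20-\x7E]            # ASCII
            | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
            | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
            | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
            | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
            | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
            | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
            | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
        )*$/xs';
        $result[$key] = (bool) preg_match($utf8, $string);
    }
    return $result[$key];
}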
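// Display length: a multibyte character counts as 1.0, a single-byte one as 0.5.
// Assumes 3 bytes per multibyte character in UTF-8 and 2 bytes otherwise (GB2312).
function smartStrlen($string)
{
    $result = 0;
    $number = smartDetectUTF8($string) ? 3 : 2;
    for ($i = 0; $i < strlen($string); $i += $bytes)
    {
        $bytes = ord(substr($string, $i, 1)) > 127 ? $number : 1;
        $result += $bytes > 1 ? 1.0 : 0.5;
    }
    return $result;
}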
// Width-aware substr: $start and $length use the same units as smartStrlen().
function smartSubstr($string, $start, $length = null)
{
    $result = '';
    $number = smartDetectUTF8($string) ? 3 : 2;
    if ($start < 0)
    {
        $start = max(smartStrlen($string) + $start, 0);
    }
    // Skip characters until $start units have been consumed.
    for ($i = 0; $i < strlen($string); $i += $bytes)
    {
        if ($start <= 0)
        {
            break;
        }
        $bytes = ord(substr($string, $i, 1)) > 127 ? $number : 1;
        $start -= $bytes > 1 ? 1.0 : 0.5;
    }
    if (is_null($length))
    {
        $result = substr($string, $i);
    }
    else
    {
        // Copy characters while the remaining $length can still hold them.
        for ($j = $i; $j < strlen($string); $j += $bytes)
        {
            if ($length <= 0)
            {
                break;
            }
            if (($bytes = ord(substr($string, $j, 1)) > 127 ? $number : 1) > 1)
            {
                if ($length < 1.0)
                {
                    break;
                }
                $result .= substr($string, $j, $bytes);
                $length -= 1.0;
            }
            else
            {
                $result .= substr($string, $j, 1);
                $length -= 0.5;
            }
        }
    }
    return $result;
}
function smarty_modifier_smartTruncate($string, $length = 80, $etc = '...',
                                       $break_words = false, $middle = false)
{
    if ($length == 0)
        return '';
    if (smartStrlen($string) > $length) {
        $length -= smartStrlen($etc);
        if (!$break_words && !$middle) {
            // Drop a trailing partial word so the cut falls on whitespace.
            $string = preg_replace('/\s+?(\S+)?$/', '', smartSubstr($string, 0, $length + 1));
        }
        if (!$middle) {
            return smartSubstr($string, 0, $length) . $etc;
        } else {
            return smartSubstr($string, 0, $length / 2) . $etc . smartSubstr($string, -$length / 2);
        }
    } else {
        return $string;
    }
}
?>
The code above reproduces all of the original truncate functionality and works with both GB2312 and UTF-8 encodings. When measuring length, a Chinese character counts as 1.0 and an English character as 0.5, so truncated strings no longer come out uneven.
There is nothing special about how to use the plug-in. Here is a simple test:
{$content|smartTruncate:5: ".."} ($content is equal to "A China B China C People D People E Communist Party F and G Country H")
Display: A China B China C.. (The length of Chinese symbols is counted as 1.0, and the length of English symbols is counted as 0.5, And consider the length of the omitted symbols)
Whether you use GB2312 encoding or UTF-8 encoding, you will find that the results are correct, which is one of the reasons why I added the word smart in the plug-in name.
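The constants 3 and 2 used in smartStrlen and smartSubstr correspond to the number of bytes a common Chinese character occupies in each encoding. The sketch below writes the bytes of 中 out by hand and uses preg_match with the /u flag as a stand-in for smartDetectUTF8; both the stand-in and the name looksLikeUtf8 are assumptions made to keep the example self-contained.

```php
<?php
$utf8   = "\xE4\xB8\xAD";  // the character 中 encoded as UTF-8: 3 bytes
$gb2312 = "\xD6\xD0";      // the same character encoded as GB2312: 2 bytes

// Stand-in for smartDetectUTF8(): with /u, preg_match rejects invalid UTF-8.
function looksLikeUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

foreach (array($utf8, $gb2312) as $s) {
    $bytesPerChar = looksLikeUtf8($s) ? 3 : 2;
    var_dump(strlen($s) === $bytesPerChar);   // bool(true) for both encodings
}
```

Characters encoded as 2-byte or 4-byte UTF-8 sequences would not fit the fixed 3-byte assumption, so the approach is best suited to text that mixes ASCII with common Chinese characters.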