This function combines several methods to truncate Chinese strings without splitting multi-byte characters. It supports UTF-8, GBK, GB2312, and BIG5 even when neither the mbstring nor the iconv extension is installed; with either extension installed, many more encodings are supported. See the function's doc comment for details.
It uses three methods:
1. mb_substr(), which requires the mbstring extension
2. iconv_substr(), which requires the iconv extension
3. Regular-expression matching, which works without any extension
The methods are tried in that order: if one is unavailable, the function automatically falls back to the next.
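To see why method 3 works, here is a minimal sketch that applies the UTF-8 pattern from the function on its own. `utf8_chars` is a hypothetical helper name, and the file is assumed to be saved as UTF-8. Each regex match is one complete character (1 to 4 bytes), so slicing the array of matches never cuts a character in half, whereas byte-based `substr()` can.

```php
<?php
// Split a UTF-8 string into whole characters using a byte-level regex
// (no /u modifier, so the pattern matches raw bytes).
function utf8_chars(string $str): array
{
    $pattern = '/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xff][\x80-\xbf]{3}/';
    preg_match_all($pattern, $str, $match);
    return $match[0];
}

$text  = "PHP截取中文";   // 3 ASCII bytes + 4 Chinese characters of 3 bytes each
$chars = utf8_chars($text);

echo count($chars), "\n";                           // 7 characters
echo strlen($text), "\n";                           // 15 bytes
echo implode("", array_slice($chars, 0, 5)), "\n";  // PHP截取

// Byte-based cut: the first 5 bytes end in the middle of 截
$byteCut = substr($text, 0, 5);
echo preg_match('//u', $byteCut) === 1 ? "valid" : "invalid UTF-8", "\n";
```

Slicing by characters keeps the output valid, while the 5-byte `substr()` leaves a dangling partial character.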
This code is an optimized version of the "String Interception, Support Common Encoding" code released by Midnight, with these changes:
1. Fixed the original code, which called mb_substr and iconv_substr without returning their results, so those calls had no effect.
2. Improved the suffix appended to the truncated string: it can be customized and is empty by default.
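<?php
/**
 * Truncate a string; supports Chinese and other multi-byte encodings.
 *
 * @param string $str     The string to truncate
 * @param int    $start   Start position, in characters
 * @param int    $length  Number of characters to keep
 * @param string $charset Encoding (utf-8, gb2312, gbk or big5 when no extension is available)
 * @param string $suffix  Suffix appended to the truncated string
 * @return string
 */
function substr_ext($str, $start, $length, $charset = "utf-8", $suffix = "")
{
    // 1. Prefer mbstring if the extension is installed
    if (function_exists("mb_substr")) {
        return mb_substr($str, $start, $length, $charset) . $suffix;
    }
    // 2. Otherwise use iconv if it is installed
    if (function_exists("iconv_substr")) {
        return iconv_substr($str, $start, $length, $charset) . $suffix;
    }
    // 3. Fall back to regular expressions: each match is one whole character
    $re = [
        'utf-8'  => '/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xff][\x80-\xbf]{3}/',
        'gb2312' => '/[\x01-\x7f]|[\xb0-\xf7][\xa0-\xfe]/',
        'gbk'    => '/[\x01-\x7f]|[\x81-\xfe][\x40-\xfe]/',
        // Fixed: the second trail-byte range was missing its opening "["
        'big5'   => '/[\x01-\x7f]|[\x81-\xfe](?:[\x40-\x7e]|[\xa1-\xfe])/',
    ];
    // Array keys are lowercase, so accept "UTF-8", "GBK", etc.
    $charset = strtolower($charset);
    if (!isset($re[$charset])) {
        throw new InvalidArgumentException("Unsupported charset without mbstring/iconv: $charset");
    }
    preg_match_all($re[$charset], $str, $match);
    $slice = implode("", array_slice($match[0], $start, $length));
    return $slice . $suffix;
}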
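As first published, the Big5 pattern read `[\x81-\xfe]([\x40-\x7e]|\xa1-\xfe])`, with the `[` before `\xa1` missing. The pattern still compiles, but its second alternative then matches the literal bytes `\xa1`, `-`, `\xfe`, `]` instead of a byte range. The sketch below compares the two patterns on "abc" followed by two double-byte Big5 characters (bytes A4 A4 A4 E5, chosen only because their trail bytes fall in 0xA1 to 0xFE).

```php
<?php
// Original (buggy) pattern, kept in double quotes exactly as published
$original  = "/[\x01-\x7f]|[\x81-\xfe]([\x40-\x7e]|\xa1-\xfe])/";
// Corrected pattern with the missing "[" restored
$corrected = '/[\x01-\x7f]|[\x81-\xfe](?:[\x40-\x7e]|[\xa1-\xfe])/';

$str = "abc\xa4\xa4\xa4\xe5";

preg_match_all($original, $str, $m1);
preg_match_all($corrected, $str, $m2);

echo count($m1[0]), "\n"; // 3: only "a", "b", "c"; the Big5 characters are silently dropped
echo count($m2[0]), "\n"; // 5: "a", "b", "c" plus two 2-byte characters
echo bin2hex(implode("", array_slice($m2[0], 0, 4))), "\n"; // 616263a4a4
```

With the buggy pattern, the regex fallback would return truncated Big5 text with every character whose trail byte is 0xA1 or higher missing.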