
Two examples of how to get the first letters of Chinese characters in PHP

WBOY
Release: 2016-07-13 10:48:13

There are many ways to get the first letter of a Chinese character's pinyin. The usual approach is to split the string into individual Chinese characters, convert each one to pinyin, and then use substr to take the first letter. Below are two examples I found online, each with its own merits. Let's take a look.

Example 1

The main features are: clearly separated functions that are easy to modify, maintain, and extend. An English string is returned unchanged (including digits); a Chinese string returns the first letter of each character's pinyin; a mixed Chinese-English string returns the pinyin initials together with the English letters. This version uses binary search to fix an earlier error in which the letter Z was read as Y. Good code is worth keeping, so I am recording it here for others to verify.

The code is as follows:

<?php
/**
 * Chinese pinyin initials tool (with the binary search fix)
 * Note: English string: returned unchanged (including digits), e.g. abc123 => abc123
 * Chinese string: returns the pinyin initials, e.g. 测试字符串 => CSZFC
 * Mixed Chinese and English string: returns the pinyin initials and the English letters, e.g. 我i我j => WIWJ
 * Usage:
 * $py = new str2PY();
 * echo $py->getInitials('测试字符串');
 */
class str2PY
{
    // Smallest GB2312 code (lead byte * 1000 + trail byte) for each pinyin initial.
    private $_pinyins = array(
        176161 => 'A',
        176197 => 'B',
        178193 => 'C',
        180238 => 'D',
        182234 => 'E',
        183162 => 'F',
        184193 => 'G',
        185254 => 'H',
        187247 => 'J',
        191166 => 'K',
        192172 => 'L',
        194232 => 'M',
        196195 => 'N',
        197182 => 'O',
        197190 => 'P',
        198218 => 'Q',
        200187 => 'R',
        200246 => 'S',
        203250 => 'T',
        205218 => 'W',
        206244 => 'X',
        209185 => 'Y',
        212209 => 'Z',
    );
    private $_charset = null;

    /**
     * Constructor; sets the input encoding (default: utf-8).
     * Supports utf-8 and gb2312.
     *
     * @param string $charset
     */
    public function __construct( $charset = 'utf-8' )
    {
        $this->_charset = $charset;
    }
    /**
     * Chinese string substr
     *
     * @param string $str
     * @param int $start
     * @param int $len
     * @return string
     */
    private function _msubstr( $str, $start, $len )
    {
        $start = $start * 2;
        $len = $len * 2;
        $strlen = strlen($str);
        $result = '';
        for ( $i = 0; $i < $strlen; $i++ ) {
            if ( $i >= $start && $i < ($start + $len) ) {
                if ( ord(substr($str, $i, 1)) > 129 ) $result .= substr($str, $i, 2);
                else $result .= substr($str, $i, 1);
            }
            if ( ord(substr($str, $i, 1)) > 129 ) $i++;
        }
        return $result;
    }
    /**
     * Split the string into an array (one Chinese character or one single-byte character per element)
     *
     * @param string $str
     * @return array
     */
    private function _cutWord( $str )
    {
        $words = array();
        while ( $str != "" )
        {
            if ( $this->_isAscii($str) ) { /* not Chinese */
                $words[] = $str[0];
                $str = substr( $str, strlen($str[0]) );
            } else {
                $word = $this->_msubstr( $str, 0, 1 );
                $words[] = $word;
                $str = substr( $str, strlen($word) );
            }
        }
        return $words;
    }

    /**
     * Determine whether the first byte of the string is an ASCII character
     *
     * @param string $char
     * @return bool
     */
    private function _isAscii( $char )
    {
        return ( ord( substr($char,0,1) ) < 160 );
    }
    /**
     * Determine whether every byte of the string is an ASCII character.
     * The earlier version inspected only the second byte, so a mixed string
     * such as "ab" followed by Chinese text was returned unconverted.
     *
     * @param string $str
     * @return bool
     */
    private function _isAsciis( $str )
    {
        $len = strlen($str);
        for ( $i = 0; $i < $len; $i++ ) {
            if ( !$this->_isAscii( $str[$i] ) ) {
                return false;
            }
        }
        return true;
    }
    /**
     * Get the pinyin initials of a Chinese string
     *
     * @param string $str
     * @return string
     */
    public function getInitials( $str )
    {
        // Compare against '' instead of using empty(), which would also reject the string "0".
        if ( $str === null || $str === '' ) return '';
        if ( $this->_isAscii($str[0]) && $this->_isAsciis( $str )){
            return $str;
        }
        $result = array();
        if ( $this->_charset == 'utf-8' ){
            $str = iconv( 'utf-8', 'gb2312', $str );
        }
        $words = $this->_cutWord( $str );
        foreach ( $words as $word )
        {
            if ( $this->_isAscii($word) ) { /* not Chinese */
                $result[] = $word;
                continue;
            }
            $code = ord( substr($word,0,1) ) * 1000 + ord( substr($word,1,1) );
            /* look up the pinyin initial, A to Z */
            if ( ($i = $this->_search($code)) != -1 ){
                $result[] = $this->_pinyins[$i];
            }
        }
        return strtoupper(implode('',$result));
    }

    private function _getChar( $ascii )
    {
        if ( $ascii >= 48 && $ascii <= 57){
            return chr($ascii);    /* digits */
        }elseif ( $ascii>=65 && $ascii<=90 ){
            return chr($ascii);    /* A to Z */
        }elseif ($ascii>=97 && $ascii<=122){
            return chr($ascii-32); /* a to z, converted to upper case */
        }else{
            return '-';            /* anything else */
        }
    }

    /**
     * Binary search for the table key that covers the given Chinese character internal code (gb2312)
     *
     * @param int $code
     * @return int the matching key of $_pinyins, or -1 if $code is below the first key
     */
    private function _search( $code )
    {
        $data = array_keys($this->_pinyins);
        $lower = 0;
        $upper = sizeof($data)-1;
        $middle = (int) round(($lower + $upper) / 2);
        if ( $code < $data[0] ) return -1;
        for (;;) {
            if ( $lower > $upper ){
                return $data[$lower-1];
            }
            $tmp = (int) round(($lower + $upper) / 2);
            if ( !isset($data[$tmp]) ){
                return $data[$middle];
            }else{
                $middle = $tmp;
            }
            if ( $data[$middle] < $code ){
                $lower = (int)$middle + 1;
            }else if ( $data[$middle] == $code ) {
                return $data[$middle];
            }else{
                $upper = (int)$middle - 1;
            }
        }
    }
}
?>
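
The lookup performed by _search can be sketched in isolation. The following is a minimal sketch, not the author's code: floorLookup is a hypothetical helper name, and the table is a shortened subset of $_pinyins used only for illustration. It returns the letter attached to the largest boundary that does not exceed the given code, which is the behavior needed to map a GB2312 code to a pinyin initial.

```php
<?php
// Sketch only: a floor-style binary search over sorted boundary codes.
// The table is a shortened, illustrative subset of the one in str2PY.
$boundaries = array(
    176161 => 'A',
    176197 => 'B',
    178193 => 'C',
    180238 => 'D',
);

/**
 * Return the letter whose boundary is the largest key <= $code,
 * or null when $code sorts before the first boundary.
 * floorLookup() is a hypothetical name, not part of the class above.
 */
function floorLookup(array $table, $code)
{
    $keys  = array_keys($table);   // keys must already be in ascending order
    $lower = 0;
    $upper = count($keys) - 1;
    $found = null;

    while ($lower <= $upper) {
        $middle = intdiv($lower + $upper, 2);
        if ($keys[$middle] <= $code) {
            $found = $keys[$middle];   // candidate; keep looking for a larger one
            $lower = $middle + 1;
        } else {
            $upper = $middle - 1;
        }
    }

    return $found === null ? null : $table[$found];
}

echo floorLookup($boundaries, 176161), "\n"; // the first boundary itself
echo floorLookup($boundaries, 177000), "\n"; // falls between the B and C boundaries
var_dump(floorLookup($boundaries, 100));     // sorts before the first boundary
```

With the full table, every code from one boundary up to the next maps to the same letter, so no separate range check per letter is needed.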

Example 2

This example computes each Chinese character's GB2312 code as a signed 16-bit value, finds the range it falls in, and returns the corresponding first letter.
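
As a worked example of that arithmetic, the sketch below computes the value for a single character. It assumes the iconv extension is available and that the source file is saved as UTF-8; gbOffset is a hypothetical helper name, not part of the original code.

```php
<?php
// Sketch only: compute the signed 16-bit value that Example 2 compares against.
// gbOffset() is a hypothetical helper; it expects one UTF-8 character.
function gbOffset($char)
{
    $gb = iconv('UTF-8', 'GB2312', $char);
    if ($gb === false || strlen($gb) !== 2) {
        return null; // not a two-byte GB2312 character
    }
    // Lead byte * 256 + trail byte is the unsigned code (0xB0A1 for 啊);
    // subtracting 65536 moves it into the negative range used by the table.
    return ord($gb[0]) * 256 + ord($gb[1]) - 65536;
}

var_dump(gbOffset('啊')); // int(-20319)
var_dump(gbOffset('a'));  // NULL, because ASCII converts to a single byte
```

啊 is encoded as 0xB0A1 in GB2312, which is 45217 as an unsigned number; subtracting 65536 gives -20319, the lower bound of the range that returns "A".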

The code is as follows:

<?php
function getfirstchar($s0){
    // Square brackets replace the {0} string offset syntax, which PHP 8 no longer accepts.
    $fchar = ord($s0[0]);
    if($fchar >= ord("A") and $fchar <= ord("z") )return strtoupper($s0[0]);
    // Use the GB2312 bytes only if the input round-trips, i.e. it really was UTF-8.
    $s1 = iconv("UTF-8","gb2312", $s0);
    $s2 = iconv("gb2312","UTF-8", $s1);
    if($s2 == $s0){$s = $s1;}else{$s = $s0;}
    $asc = ord($s[0]) * 256 + ord($s[1]) - 65536;
    if($asc >= -20319 and $asc <= -20284) return "A";
    if($asc >= -20283 and $asc <= -19776) return "B";
    if($asc >= -19775 and $asc <= -19219) return "C";
    if($asc >= -19218 and $asc <= -18711) return "D";
    if($asc >= -18710 and $asc <= -18527) return "E";
    if($asc >= -18526 and $asc <= -18240) return "F";
    if($asc >= -18239 and $asc <= -17923) return "G";
    if($asc >= -17922 and $asc <= -17418) return "H"; // was "I"; no pinyin syllable starts with I
    if($asc >= -17417 and $asc <= -16475) return "J";
    if($asc >= -16474 and $asc <= -16213) return "K";
    if($asc >= -16212 and $asc <= -15641) return "L";
    if($asc >= -15640 and $asc <= -15166) return "M";
    if($asc >= -15165 and $asc <= -14923) return "N";
    if($asc >= -14922 and $asc <= -14915) return "O";
    if($asc >= -14914 and $asc <= -14631) return "P";
    if($asc >= -14630 and $asc <= -14150) return "Q";
    if($asc >= -14149 and $asc <= -14091) return "R";
    if($asc >= -14090 and $asc <= -13319) return "S";
    if($asc >= -13318 and $asc <= -12839) return "T";
    if($asc >= -12838 and $asc <= -12557) return "W";
    if($asc >= -12556 and $asc <= -11848) return "X";
    if($asc >= -11847 and $asc <= -11056) return "Y";
    if($asc >= -11055 and $asc <= -10247) return "Z";
    return null;
}


function pinyin1($zh){
    $ret = "";
    $s1 = iconv("UTF-8","gb2312", $zh);
    $s2 = iconv("gb2312","UTF-8", $s1);
    if($s2 == $zh){$zh = $s1;}
    for($i = 0; $i < strlen($zh); $i++){
        $s1 = substr($zh,$i,1);
        $p = ord($s1);
        if($p > 160){
            // Take both bytes of the character; the post-increment skips the trail byte.
            $s2 = substr($zh,$i++,2);
            $ret .= getfirstchar($s2);
        }else{
            $ret .= $s1;
        }
    }
    return $ret;
}
echo "这是中文字符串\n";
echo pinyin1('这是中文字符串');
?>
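
Both examples convert the whole string to GB2312 first and then walk it byte by byte. As an alternative sketch, assuming the input is valid UTF-8 and the PCRE extension is available, the string can be split into whole characters up front. splitChars is a hypothetical helper name, not part of either example.

```php
<?php
// Sketch only: split a UTF-8 string into characters before any conversion.
// splitChars() is a hypothetical helper, not part of either example.
function splitChars($str)
{
    // With the /u modifier the empty pattern matches between code points,
    // so multi-byte characters stay whole.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    return $chars === false ? array() : $chars; // false signals invalid UTF-8
}

foreach (splitChars('a中b文') as $ch) {
    // One byte means ASCII; longer sequences would go through the pinyin lookup.
    echo $ch, strlen($ch) === 1 ? " (ASCII)\n" : " (multi-byte)\n";
}
```

Each element can then be converted and looked up on its own, which avoids tracking lead and trail bytes by hand.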

source:php.cn