
PHP function to get the pinyin initials of Chinese characters

WBOY
Release: 2016-07-13 10:25:28

There are many methods for this on the Internet, all based on the same principle. To suit my own needs, I packaged one as a class file. Its main features are: clearly separated functions that are easy to modify, maintain and extend; English strings are returned unchanged (including digits); Chinese strings return the first pinyin letter of each character; mixed Chinese-English strings return the pinyin initials together with the English letters. This version uses binary search and fixes an earlier error in which the letter Z was read as Y. It is worth keeping, so I am recording it here for others to verify.
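The shared principle, as far as it can be read from the class, is that first-level GB2312 Chinese characters are stored in pinyin order, so each initial letter owns a contiguous range of byte codes. The following is a minimal sketch of how one character is turned into the numeric code the lookup table uses (lead byte * 1000 + trail byte). The helper name `gb2312Code()` is an assumption made for illustration and does not appear in the class; the sketch also assumes the `iconv` extension is available.

```php
<?php
// Sketch only: gb2312Code() is a hypothetical helper, not part of the class.

/**
 * Return lead byte * 1000 + trail byte for one UTF-8 character, or null
 * when the character is not a double-byte GB2312 character.
 */
function gb2312Code(string $char): ?int
{
    // iconv() returns false when the character cannot be represented in GB2312
    $gb = @iconv('UTF-8', 'GB2312', $char);
    if ($gb === false || strlen($gb) !== 2) {
        return null;
    }
    return ord($gb[0]) * 1000 + ord($gb[1]);
}

// In the table used by the class, 'C' starts at 178193 and 'D' at 180238,
// so any code in [178193, 180238) has the initial C.
$code = gb2312Code('测');
var_dump($code);
var_dump($code !== null && $code >= 178193 && $code < 180238);
```

Multiplying the lead byte by 1000 keeps the codes in the same order as the raw byte pairs, because the trail byte never exceeds 255.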

The class code is as follows:

<?php
/**
 * Modified by http://iulog.com @ 2013-05-07
 * Fixed the binary search method
 * Tool class for getting the pinyin initials of Chinese characters
 * Note: English string: returned unchanged (including digits), e.g. abc123 => abc123
 * Chinese string: returns the pinyin initial of each character, e.g. 测试字符串 => CSZFC
 * Chinese-English mixed string: returns the pinyin initials and the English letters, e.g. 我i我j => WIWJ
 * e.g.
 * $py = new str2PY();
 * $result = $py->getInitials('测试字符串');
 */
class str2PY
{
    private $_pinyins = array(
        176161 => 'A',
        176197 => 'B',
        178193 => 'C',
        180238 => 'D',
        182234 => 'E',
        183162 => 'F',
        184193 => 'G',
        185254 => 'H',
        187247 => 'J',
        191166 => 'K',
        192172 => 'L',
        194232 => 'M',
        196195 => 'N',
        197182 => 'O',
        197190 => 'P',
        198218 => 'Q',
        200187 => 'R',
        200246 => 'S',
        203250 => 'T',
        205218 => 'W',
        206244 => 'X',
        209185 => 'Y',
        212209 => 'Z',
    );
    private $_charset = null;
    /**
     * Constructor; sets the encoding of the input strings (default: utf-8)
     * Supports utf-8 and gb2312
     *
     * @param string $charset
     */
    public function __construct( $charset = 'utf-8' )
    {
        $this->_charset    = $charset;
    }
    /**
     * Substring for GB2312 strings, with offsets counted in double-byte characters
     *
     * @param string $str
     * @param int $start
     * @param int $len
     * @return string
     */
    private function _msubstr ($str, $start, $len)
    {
        $start  = $start * 2;
        $len    = $len * 2;
        $strlen = strlen($str);
        $result = '';
        for ( $i = 0; $i < $strlen; $i++ ) {
            if ( $i >= $start && $i < ($start + $len) ) {
                if ( ord(substr($str, $i, 1)) > 129 ) $result .= substr($str, $i, 2);
                else $result .= substr($str, $i, 1);
            }
            if ( ord(substr($str, $i, 1)) > 129 ) $i++;
        }
        return $result;
    }
    /**
     * Split the string into an array, one element per character
     * (a Chinese character or a single-byte character)
     *
     * @param string $str
     * @return array
     */
    private function _cutWord( $str )
    {
        $words = array();
        while ( $str != "" )
        {
            if ( $this->_isAscii($str) ) {/* not Chinese */
                $words[] = $str[0];
                $str = substr( $str, strlen($str[0]) );
            }else{
                $word = $this->_msubstr( $str, 0, 1 );
                $words[] = $word;
                $str = substr( $str, strlen($word) );
            }
        }
        return $words;
    }
    /**
     * Determine whether the first byte of the string is a single-byte
     * (ASCII) character, i.e. its value is below 160
     *
     * @param string $char
     * @return bool
     */
    private function _isAscii( $char )
    {
        return ( ord( substr($char,0,1) ) < 160 );
    }
    /**
     * Determine whether every byte of the string is an ASCII character,
     * i.e. the string contains no Chinese characters.
     * (The earlier loop inspected only the second byte, so a string such as
     * "ab" followed by Chinese characters was returned unchanged.)
     *
     * @param string $str
     * @return bool
     */
    private function _isAsciis( $str )
    {
        $len = strlen($str);
        for( $i = 0; $i < $len; $i++ ){
            if ( !$this->_isAscii( $str[$i] ) ){
                return false;
            }
        }
        return true;
    }
    /**
     * Get the pinyin initials of a string; a string without Chinese
     * characters is returned unchanged
     *
     * @param string $str
     * @return string
     */
    public function getInitials( $str )
    {
        /* empty() would also reject the string "0", so compare with '' instead */
        $str = (string) $str;
        if ( $str === '' ) return '';
        if ( $this->_isAscii($str[0]) && $this->_isAsciis( $str )){
            return $str;
        }
        $result = array();
        if ( $this->_charset == 'utf-8' ){
            $str = iconv( 'utf-8', 'gb2312', $str );
        }
        $words = $this->_cutWord( $str );
        foreach ( $words as $word )
        {
            if ( $this->_isAscii($word) ) {/* not Chinese */
                $result[] = $word;
                continue;
            }
            $code = ord( substr($word,0,1) ) * 1000 + ord( substr($word,1,1) );
            /* Look up the pinyin initial, A to Z */
            if ( ($i = $this->_search($code)) != -1 ){
                $result[] = $this->_pinyins[$i];
            }
        }
        return strtoupper(implode('',$result));
    }
    /* Note: this helper is not called anywhere else in the class */
    private function _getChar( $ascii )
    {
        if ( $ascii >= 48 && $ascii <= 57){
            return chr($ascii);  /* digits */
        }elseif ( $ascii>=65 && $ascii<=90 ){
            return chr($ascii);   /* A to Z */
        }elseif ($ascii>=97 && $ascii<=122){
            return chr($ascii-32); /* a to z, converted to upper case */
        }else{
            return '-'; /* anything else */
        }
    }

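    /**
     * Binary search for the table key that covers the given Chinese character
     * internal code (gb2312): the largest key that is not greater than $code
     *
     * @param int $code
     * @return int the matching key of $_pinyins, or -1 if $code is below the first key
     */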
    private function _search( $code )
    {
        $data = array_keys($this->_pinyins);
        $lower = 0;
        $upper = sizeof($data)-1;
        $middle = (int) round(($lower + $upper) / 2);
        /* Codes below the first key ('A') have no pinyin initial */
        if ( $code < $data[0] ) return -1;
        for (;;) {
            /* Range exhausted: the key just below $lower is the largest key under $code */
            if ( $lower > $upper ){
                return $data[$lower-1];
            }
            $tmp = (int) round(($lower + $upper) / 2);
            if ( !isset($data[$tmp]) ){
                return $data[$middle];
            }else{
                $middle = $tmp;
            }
            if ( $data[$middle] < $code ){
                $lower = (int)$middle + 1;
            }else if ( $data[$middle] == $code ) {
                return $data[$middle];
            }else{
                $upper = (int)$middle - 1;
            }
        }
    }
}
?>
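
The `_search` method above is a "floor" binary search: it looks for the largest table key that does not exceed the character code, rather than for an exact match. The following stand-alone sketch shows the same idea in a shorter form. The function name `floorKey()` is hypothetical, and the sketch does not reproduce the earlier Z/Y bug, which the source does not show; it only illustrates how codes at and beyond the last boundary should resolve to Z.

```php
<?php
// Sketch only: floorKey() is a hypothetical stand-alone version of the idea.

/**
 * Return the largest value in the ascending list $keys that is <= $code,
 * or -1 if $code is below the first key.
 */
function floorKey(array $keys, int $code): int
{
    $lower = 0;
    $upper = count($keys) - 1;
    $found = -1;
    while ($lower <= $upper) {
        $middle = intdiv($lower + $upper, 2);
        if ($keys[$middle] <= $code) {
            $found = $keys[$middle];   // candidate; keep looking for a larger one
            $lower = $middle + 1;
        } else {
            $upper = $middle - 1;
        }
    }
    return $found;
}

// Last three boundaries of the table in the class: X, Y, Z.
$keys = [206244, 209185, 212209];
var_dump(floorKey($keys, 212208)); // int(209185): still in the Y range
var_dump(floorKey($keys, 212209)); // int(212209): first code of the Z range
var_dump(floorKey($keys, 215214)); // int(212209): codes past the last key stay in Z
var_dump(floorKey($keys, 100000)); // int(-1): below the table
```

Tracking the best candidate in `$found` avoids having to reason about `$lower - 1` after the loop, which is where an off-by-one at the last boundary could otherwise creep in.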

source:php.cn