Home Backend Development PHP Tutorial PHP Chinese character to Pinyin source code (GB2312 class library, supports about 6000 Chinese characters)

PHP Chinese character to Pinyin source code (GB2312 class library, supports about 6000 Chinese characters)

Jul 23, 2016 am 08:55 AM

In a recent project, PHP was used as the development language to provide data for the front end. In order to be able to sort by letters, it was necessary to extract the Chinese Pinyin. I used this project to write a script to convert Chinese characters to Pinyin. The script is relatively simple and the comments are relatively simple. I won’t go into details here, let’s go directly to the code.

Usage:

  1. $py = new PinYin();
  2. $all_py = $py->get_all_py("武国伟"); //Output ['wu','guo' ,'wei'], output the string and call the join method, join('',$all_py)
  3. $first_py = $py->get_first_py($all_py);//Output wgw
  4. $first_letter = $py->get_first_letter ($all_py);//Output w
Copy code


Source code:

  1. /**
  2. * +-------------------------------------------------- -------
  3. * PHP convert Chinese characters to Pinyin
  4. * +---------------------------------- --------------------
  5. * Usage:
  6. * $py = new PinYin();
  7. * $all_py = $py->get_all_py("武国伟" ); //Output ['wu','guo','wei'], output the string and call the join method, join('',$all_py)
  8. * $first_py = $py->get_first_py($all_py); //Output wgw
  9. * $first_letter = $py->get_first_letter($all_py);//Output w
  10. *
  11. * +--------------------- ----------------------------------
  12. */
  13. class PinYin
  14. {
  15. private $dict_list = array(
  16. 'a' => -20319, 'ai' => -20317, 'an' => -20304, 'ang' => -20295, 'ao' => -20292,
  17. 'ba' => -20283, 'bai' => -20265, 'ban' => -20257, 'bang' => -20242, 'bao' => -20230, 'bei' => -20051, 'ben' => -20036, 'beng' => -20032, 'bi' => -20026, 'bian' => -20002, 'biao' => -19990, 'bie' => -19986, 'bin' => -19982, 'bing' => -19976, 'bo' => -19805, 'bu' => -19784,
  18. 'ca' => -19775, 'cai' => -19774, 'can' => -19763, 'cang' => -19756, 'cao' => -19751, 'ce' => -19746, 'ceng' => -19741, 'cha' => -19739, 'chai' => -19728, 'chan' => -19725, 'chang' => -19715, 'chao' => -19540, 'che' => -19531, 'chen' => -19525, 'cheng' => -19515, 'chi' => -19500, 'chong' => -19484, 'chou' => -19479, 'chu' => -19467, 'chuai' => -19289, 'chuan' => -19288, 'chuang' => -19281, 'chui' => -19275, 'chun' => -19270, 'chuo' => -19263, 'ci' => -19261, 'cong' => -19249, 'cou' => -19243, 'cu' => -19242, 'cuan' => -19238, 'cui' => -19235, 'cun' => -19227, 'cuo' => -19224,
  19. 'da' => -19218, 'dai' => -19212, 'dan' => -19038, 'dang' => -19023, 'dao' => -19018, 'de' => -19006, 'deng' => -19003, 'di' => -18996, 'dian' => -18977, 'diao' => -18961, 'die' => -18952, 'ding' => -18783, 'diu' => -18774, 'dong' => -18773, 'dou' => -18763, 'du' => -18756, 'duan' => -18741, 'dui' => -18735, 'dun' => -18731, 'duo' => -18722,
  20. 'e' => -18710, 'en' => -18697, 'er' => -18696,
  21. 'fa' => -18526, 'fan' => -18518, 'fang' => -18501, 'fei' => -18490, 'fen' => -18478, 'feng' => -18463, 'fo' => -18448, 'fou' => -18447, 'fu' => -18446,
  22. 'ga' => -18239, 'gai' => -18237, 'gan' => -18231, 'gang' => -18220, 'gao' => -18211, 'ge' => -18201, 'gei' => -18184, 'gen' => -18183, 'geng' => -18181, 'gong' => -18012, 'gou' => -17997, 'gu' => -17988, 'gua' => -17970, 'guai' => -17964, 'guan' => -17961, 'guang' => -17950, 'gui' => -17947,
  23. 'gun' => -17931, 'guo' => -17928,
  24. 'ha' => -17922, 'hai' => -17759, 'han' => -17752, 'hang' => -17733, 'hao' => -17730, 'he' => -17721, 'hei' => -17703, 'hen' => -17701, 'heng' => -17697, 'hong' => -17692, 'hou' => -17683, 'hu' => -17676, 'hua' => -17496, 'huai' => -17487, 'huan' => -17482, 'huang' => -17468, 'hui' => -17454,
  25. 'hun' => -17433, 'huo' => -17427,
  26. 'ji' => -17417, 'jia' => -17202, 'jian' => -17185, 'jiang' => -16983, 'jiao' => -16970, 'jie' => -16942, 'jin' => -16915, 'jing' => -16733, 'jiong' => -16708, 'jiu' => -16706, 'ju' => -16689, 'juan' => -16664, 'jue' => -16657, 'jun' => -16647,
  27. 'ka' => -16474, 'kai' => -16470, 'kan' => -16465, 'kang' => -16459, 'kao' => -16452, 'ke' => -16448, 'ken' => -16433, 'keng' => -16429, 'kong' => -16427, 'kou' => -16423, 'ku' => -16419, 'kua' => -16412, 'kuai' => -16407, 'kuan' => -16403, 'kuang' => -16401, 'kui' => -16393, 'kun' => -16220, 'kuo' => -16216,
  28. 'la' => -16212, 'lai' => -16205, 'lan' => -16202, 'lang' => -16187, 'lao' => -16180, 'le' => -16171, 'lei' => -16169, 'leng' => -16158, 'li' => -16155, 'lia' => -15959, 'lian' => -15958, 'liang' => -15944, 'liao' => -15933, 'lie' => -15920, 'lin' => -15915, 'ling' => -15903, 'liu' => -15889,
  29. 'long' => -15878, 'lou' => -15707, 'lu' => -15701, 'lv' => -15681, 'luan' => -15667, 'lue' => -15661, 'lun' => -15659, 'luo' => -15652,
  30. 'ma' => -15640, 'mai' => -15631, 'man' => -15625, 'mang' => -15454, 'mao' => -15448, 'me' => -15436, 'mei' => -15435, 'men' => -15419, 'meng' => -15416, 'mi' => -15408, 'mian' => -15394, 'miao' => -15385, 'mie' => -15377, 'min' => -15375, 'ming' => -15369, 'miu' => -15363, 'mo' => -15362, 'mou' => -15183, 'mu' => -15180,
  31. 'na' => -15165, 'nai' => -15158, 'nan' => -15153, 'nang' => -15150, 'nao' => -15149, 'ne' => -15144, 'nei' => -15143, 'nen' => -15141, 'neng' => -15140, 'ni' => -15139, 'nian' => -15128, 'niang' => -15121, 'niao' => -15119, 'nie' => -15117, 'nin' => -15110, 'ning' => -15109, 'niu' => -14941,
  32. 'nong' => -14937, 'nu' => -14933, 'nv' => -14930, 'nuan' => -14929, 'nue' => -14928, 'nuo' => -14926,
  33. 'o' => -14922, 'ou' => -14921,
  34. 'pa' => -14914, 'pai' => -14908, 'pan' => -14902, 'pang' => -14894, 'pao' => -14889, 'pei' => -14882, 'pen' => -14873, 'peng' => -14871, 'pi' => -14857, 'pian' => -14678, 'piao' => -14674, 'pie' => -14670, 'pin' => -14668, 'ping' => -14663, 'po' => -14654, 'pu' => -14645,
  35. 'qi' => -14630, 'qia' => -14594, 'qian' => -14429, 'qiang' => -14407, 'qiao' => -14399, 'qie' => -14384, 'qin' => -14379, 'qing' => -14368, 'qiong' => -14355, 'qiu' => -14353, 'qu' => -14345, 'quan' => -14170, 'que' => -14159, 'qun' => -14151,
  36. 'ran' => -14149, 'rang' => -14145, 'rao' => -14140, 're' => -14137, 'ren' => -14135, 'reng' => -14125, 'ri' => -14123, 'rong' => -14122, 'rou' => -14112, 'ru' => -14109, 'ruan' => -14099, 'rui' => -14097, 'run' => -14094, 'ruo' => -14092,
  37. 'sa' => -14090, 'sai' => -14087, 'san' => -14083, 'sang' => -13917, 'sao' => -13914, 'se' => -13910, 'sen' => -13907, 'seng' => -13906, 'sha' => -13905, 'shai' => -13896, 'shan' => -13894, 'shang' => -13878, 'shao' => -13870, 'she' => -13859, 'shen' => -13847, 'sheng' => -13831, 'shi' => -13658, 'shou' => -13611, 'shu' => -13601, 'shua' => -13406, 'shuai' => -13404, 'shuan' => -13400, 'shuang' => -13398, 'shui' => -13395, 'shun' => -13391, 'shuo' => -13387, 'si' => -13383, 'song' => -13367, 'sou' => -13359, 'su' => -13356, 'suan' => -13343, 'sui' => -13340, 'sun' => -13329, 'suo' => -13326,
  38. 'ta' => -13318, 'tai' => -13147, 'tan' => -13138, 'tang' => -13120, 'tao' => -13107, 'te' => -13096, 'teng' => -13095, 'ti' => -13091, 'tian' => -13076, 'tiao' => -13068, 'tie' => -13063, 'ting' => -13060, 'tong' => -12888, 'tou' => -12875, 'tu' => -12871, 'tuan' => -12860, 'tui' => -12858, 'tun' => -12852, 'tuo' => -12849,
  39. 'wa' => -12838, 'wai' => -12831, 'wan' => -12829, 'wang' => -12812, 'wei' => -12802, 'wen' => -12607, 'weng' => -12597, 'wo' => -12594, 'wu' => -12585,
  40. 'xi' => -12556, 'xia' => -12359, 'xian' => -12346, 'xiang' => -12320, 'xiao' => -12300, 'xie' => -12120, 'xin' => -12099, 'xing' => -12089, 'xiong' => -12074, 'xiu' => -12067, 'xu' => -12058, 'xuan' => -12039, 'xue' => -11867, 'xun' => -11861,
  41. 'ya' => -11847, 'yan' => -11831, 'yang' => -11798, 'yao' => -11781, 'ye' => -11604, 'yi' => ; -11589, 'yin' => -11536, 'ying' => -11358, 'yo' => -11340, 'yong' => -11339, 'you' => -11324, ' yu' => -11303, 'yuan' => -11097, 'yue' => -11077, 'yun' => -11067,
  42. 'za' => -11055, 'zai' => ; -11052, 'zan' => -11045, 'zang' => -11041, 'zao' => -11038, 'ze' => -11024, 'zei' => -11020, ' zen' => -11019, 'zeng' => -11018, 'zha' => -11014, 'zhai' => -10838, 'zhan' => -10832, 'zhang' => -10815, 'zhao' => -10800, 'zhe' => -10790, 'zhen' => -10780, 'zheng' => -10764, 'zhi' => -10587, 'zhong ' => -10544, 'zhou' => -10533, 'zhu' => -10519, 'zhua' => -10331, 'zhuai' => -10329, 'zhuan' => - 10328, 'zhuang' => -10322, 'zhui' => -10315, 'zhun' => -10309, 'zhuo' => -10307, ​​'zi' => -10296, 'zong' => -10281, 'zou' => -10274, 'zu' => -10270, 'zuan' => -10262,
  43. 'zui' => -10260, 'zun' => - 10256, 'zuo' => -10254
  44. );
  45. /**
  46. * Get all pinyin and return the array of pinyin, such as 'Zhang Sanfeng' ==> ['zhang','san','feng']
  47. * @param $chinese
  48. * @param string $charset
  49. * @return array
  50. */
  51. public function get_all_py($chinese, $charset = 'utf-8')
  52. {
  53. if ($charset != 'gb2312') $chinese = $this->_U2_Utf8_Gb($chinese);
  54. $py = $this->zh_to_pys($chinese);
  55. return $py;
  56. }
  57. /**
  58. * Get the first letter of Pinyin, such as ['zhang','san','feng'] ==> zsf
  59. * @param $all_pys
  60. * @return string
  61. * /
  62. public function get_first_py($all_pys)
  63. {
  64. if (count($all_pys) <= 0) {
  65. return '';
  66. }
  67. $result = [];
  68. foreach ($all_pys as $one) {
  69. if (is_null($one) || strlen($one) <= 0) {
  70. continue;
  71. }
  72. $result[] = substr($one, 0, 1);
  73. }
  74. return join(' ', $result);
  75. }
  76. /**
  77. * Get the first letter of Pinyin, such as ['zhang','san','feng'] ==> z
  78. * @param $all_pys
  79. * @return string
  80. */
  81. public function get_first_letter($all_pys)
  82. {
  83. if (count($all_pys) <= 0) {
  84. return '';
  85. }
  86. foreach ($all_pys as $one) {
  87. if (is_null($one) || strlen($one) <= 0) {
  88. continue;
  89. }
  90. return substr($one, 0, 1);
  91. }
  92. return '';
  93. }
  94. private function _U2_Utf8_Gb($_C)
  95. {
  96. $_String = '';
  97. if ($_C < 0x80) $_String .= $_C;
  98. elseif ($_C < 0x800 ) {
  99. $_String .= chr(0xC0 | $_C >> 6);
  100. $_String .= chr(0x80 | $_C & 0x3F);
  101. } elseif ($_C < 0x10000) {
  102. $_String . = chr(0xE0 | $_C >> 12);
  103. $_String .= chr(0x80 | $_C >> 6 & 0x3F);
  104. $_String .= chr(0x80 | $_C & 0x3F);
  105. } elseif ($_C < 0x200000) {
  106. $_String .= chr(0xF0 | $_C >> 18);
  107. $_String .= chr(0x80 | $_C >> 12 & 0x3F);
  108. $ _String .= chr(0x80 | $_C >> 6 & 0x3F);
  109. $_String .= chr(0x80 | $_C & 0x3F);
  110. }
  111. return iconv('UTF-8', 'GB2312', $ _String);
  112. }
  113. private function zh_to_py($num, $blank = '')
  114. {
  115. if ($num > 0 && $num < 160) {
  116. return chr($num);
  117. } elseif ( $num < -20319 || $num > -10247) {
  118. return $blank;
  119. } else {
  120. foreach ($this->dict_list as $py => $code) {
  121. if ($code > ; $num) break;
  122. $result = $py;
  123. }
  124. return $result;
  125. }
  126. }
  127. private function zh_to_pys($chinese)
  128. {
  129. $result = array();
  130. for ($i = 0 ; $i < strlen($chinese); $i++) {
  131. $p = ord(substr($chinese, $i, 1));
  132. if ($p > 160) {
  133. $q = ord(substr ($chinese, ++$i, 1));
  134. $p = $p * 256 + $q - 65536;
  135. }
  136. $result[] = $this->zh_to_py($p);
  137. }
  138. return $result;
  139. }
  140. }
Copy code


Class library, PHP


Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
2 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Hello Kitty Island Adventure: How To Get Giant Seeds
1 months ago By 尊渡假赌尊渡假赌尊渡假赌
Two Point Museum: All Exhibits And Where To Find Them
1 months ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Working with Flash Session Data in Laravel Working with Flash Session Data in Laravel Mar 12, 2025 pm 05:08 PM

Laravel simplifies handling temporary session data using its intuitive flash methods. This is perfect for displaying brief messages, alerts, or notifications within your application. Data persists only for the subsequent request by default: $request-

cURL in PHP: How to Use the PHP cURL Extension in REST APIs cURL in PHP: How to Use the PHP cURL Extension in REST APIs Mar 14, 2025 am 11:42 AM

The PHP Client URL (cURL) extension is a powerful tool for developers, enabling seamless interaction with remote servers and REST APIs. By leveraging libcurl, a well-respected multi-protocol file transfer library, PHP cURL facilitates efficient execution of various network protocols, including HTTP, HTTPS, and FTP. This extension offers granular control over HTTP requests, supports multiple concurrent operations, and provides built-in security features.

Simplified HTTP Response Mocking in Laravel Tests Simplified HTTP Response Mocking in Laravel Tests Mar 12, 2025 pm 05:09 PM

Laravel provides concise HTTP response simulation syntax, simplifying HTTP interaction testing. This approach significantly reduces code redundancy while making your test simulation more intuitive. The basic implementation provides a variety of response type shortcuts: use Illuminate\Support\Facades\Http; Http::fake([ 'google.com' => 'Hello World', 'github.com' => ['foo' => 'bar'], 'forge.laravel.com' =>

12 Best PHP Chat Scripts on CodeCanyon 12 Best PHP Chat Scripts on CodeCanyon Mar 13, 2025 pm 12:08 PM

Do you want to provide real-time, instant solutions to your customers' most pressing problems? Live chat lets you have real-time conversations with customers and resolve their problems instantly. It allows you to provide faster service to your custom

Explain the concept of late static binding in PHP. Explain the concept of late static binding in PHP. Mar 21, 2025 pm 01:33 PM

Article discusses late static binding (LSB) in PHP, introduced in PHP 5.3, allowing runtime resolution of static method calls for more flexible inheritance.Main issue: LSB vs. traditional polymorphism; LSB's practical applications and potential perfo

PHP Logging: Best Practices for PHP Log Analysis PHP Logging: Best Practices for PHP Log Analysis Mar 10, 2025 pm 02:32 PM

PHP logging is essential for monitoring and debugging web applications, as well as capturing critical events, errors, and runtime behavior. It provides valuable insights into system performance, helps identify issues, and supports faster troubleshoot

HTTP Method Verification in Laravel HTTP Method Verification in Laravel Mar 05, 2025 pm 04:14 PM

Laravel simplifies HTTP verb handling in incoming requests, streamlining diverse operation management within your applications. The method() and isMethod() methods efficiently identify and validate request types. This feature is crucial for building

Discover File Downloads in Laravel with Storage::download Discover File Downloads in Laravel with Storage::download Mar 06, 2025 am 02:22 AM

The Storage::download method of the Laravel framework provides a concise API for safely handling file downloads while managing abstractions of file storage. Here is an example of using Storage::download() in the example controller:

See all articles