半角全角的处理是字符串处理的常见问题,本文尝试为大家提供一个思路。
一、概念
全角字符unicode编码从65281~65374 (十六进制 0xFF01 ~ 0xFF5E)
半角字符unicode编码从33~126 (十六进制 0x21~ 0x7E)
空格比较特殊,全角为 12288(0x3000),半角为 32 (0x20)
而且除空格外,全角/半角按unicode编码排序在顺序上是对应的
所以可以直接通过用+-法来处理非空格数据,对空格单独处理
二、实现思路
1. 找到目标unicode的字符,可以使用正则表达式解决
2. 修改unicode编码
三、实现
1. 首先是两个unicode与字符的转换函数:
<span> 1</span> <span>/*</span><span>* </span><span> 2</span> <span> * 将unicode转换成字符 </span><span> 3</span> <span> * @param int $unicode </span><span> 4</span> <span> * @return string UTF-8字符 </span><span> 5</span> <span> *</span><span>*/</span> <span> 6</span> <span>function</span> unicode2Char(<span>$unicode</span><span>){ </span><span> 7</span> <span>if</span>(<span>$unicode</span> < 128) <span>return</span> <span>chr</span>(<span>$unicode</span><span>); </span><span> 8</span> <span>if</span>(<span>$unicode</span> < 2048) <span>return</span> <span>chr</span>((<span>$unicode</span> >> 6) + 192) . <span> 9</span> <span>chr</span>((<span>$unicode</span> & 63) + 128<span>); </span><span>10</span> <span>if</span>(<span>$unicode</span> < 65536) <span>return</span> <span>chr</span>((<span>$unicode</span> >> 12) + 224) . <span>11</span> <span>chr</span>(((<span>$unicode</span> >> 6) & 63) + 128) . <span>12</span> <span>chr</span>((<span>$unicode</span> & 63) + 128<span>); </span><span>13</span> <span>if</span>(<span>$unicode</span> < 2097152) <span>return</span> <span>chr</span>((<span>$unicode</span> >> 18) + 240) . <span>14</span> <span>chr</span>(((<span>$unicode</span> >> 12) & 63) + 128) . <span>15</span> <span>chr</span>(((<span>$unicode</span> >> 6) & 63) + 128) . <span>16</span> <span>chr</span>((<span>$unicode</span> & 63) + 128<span>); </span><span>17</span> <span>return</span> <span>false</span><span>; </span><span>18</span> <span> } </span><span>19</span> <span>20</span> <span>/*</span><span>* </span><span>21</span> <span> * 将字符转换成unicode </span><span>22</span> <span> * @param string $char 必须是UTF-8字符 </span><span>23</span> <span> * @return int </span><span>24</span> <span> *</span><span>*/</span> <span>25</span> <span>function</span> char2Unicode(<span>$char</span><span>){ </span><span>26</span> <span>switch</span> (<span>strlen</span>(<span>$char</span><span>)){ </span><span>27</span> <span>case</span> 1 : <span>return</span> <span>ord</span>(<span>$char</span><span>); </span><span>28</span> <span>case</span> 2 : <span>return</span> (<span>ord</span>(<span>$char</span>{1}) & 63) | <span>29</span> ((<span>ord</span>(<span>$char</span>{0}) & 31) << 6<span>); </span><span>30</span> <span>case</span> 3 : <span>return</span> (<span>ord</span>(<span>$char</span>{2}) & 63) | <span>31</span> ((<span>ord</span>(<span>$char</span>{1}) & 63) << 6) | <span>32</span> ((<span>ord</span>(<span>$char</span>{0}) & 15) << 12<span>); </span><span>33</span> <span>case</span> 4 : <span>return</span> (<span>ord</span>(<span>$char</span>{3}) & 63) | <span>34</span> ((<span>ord</span>(<span>$char</span>{2}) & 63) << 6) | <span>35</span> ((<span>ord</span>(<span>$char</span>{1}) & 63) << 12) | <span>36</span> ((<span>ord</span>(<span>$char</span>{0}) & 7) << 18<span>); </span><span>37</span> <span>default</span> : <span>38</span> <span>trigger_error</span>('Character is not UTF-8!', <span>E_USER_WARNING</span><span>); </span><span>39</span> <span>return</span> <span>false</span><span>; </span><span>40</span> <span> } </span><span>41</span> }
2. 全角转半角
<span> 1</span> <span>/*</span><span>* </span><span> 2</span> <span> * 全角转半角 </span><span> 3</span> <span> * @param string $str </span><span> 4</span> <span> * @return string </span><span> 5</span> <span> *</span><span>*/</span> <span> 6</span> <span>function</span> sbc2Dbc(<span>$str</span><span>){ </span><span> 7</span> <span>return</span> <span>preg_replace</span><span>( </span><span> 8</span> <span>//</span><span> 全角字符 </span> <span> 9</span> '/[\x{3000}\x{ff01}-\x{ff5f}]/ue', <span>10</span> <span>//</span><span> 编码转换 </span><span>11</span> <span> // 0x3000是空格,特殊处理,其他全角字符编码-0xfee0即可以转为半角</span> <span>12</span> '($unicode=char2Unicode(\'\0\')) == 0x3000 ? " " : (($code=$unicode-0xfee0) > 256 ? unicode2Char($code) : chr($code))', <span>13</span> <span>$str</span> <span>14</span> <span> ); </span><span>15</span> }
3. 半角转全角
<span> 1</span> <span>/*</span><span>* </span><span> 2</span> <span> * 半角转全角 </span><span> 3</span> <span> * @param string $str </span><span> 4</span> <span> * @return string </span><span> 5</span> <span> *</span><span>*/</span> <span> 6</span> <span>function</span> dbc2Sbc(<span>$str</span><span>){</span> <span> 7</span> <span>return</span> <span>preg_replace</span><span>( </span><span> 8</span> <span>//</span><span> 半角字符 </span> <span> 9</span> '/[\x{0020}\x{0020}-\x{7e}]/ue', <span>10</span> <span>//</span><span> 编码转换 </span><span>11</span> <span> // 0x0020是空格,特殊处理,其他半角字符编码+0xfee0即可以转为全角</span> <span>12</span> '($unicode=char2Unicode(\'\0\')) == 0x0020 ? unicode2Char(0x3000) : (($code=$unicode+0xfee0) > 256 ? unicode2Char($code) : chr($code))', <span>13</span> <span>$str</span> <span>14</span> <span> ); </span><span>15</span> }
四、测试
示例代码:
<span>1</span> <span>$a</span> = 'abc12 345'<span>; </span><span>2</span> <span>$sbc</span> = dbc2Sbc(<span>$a</span><span>); </span><span>3</span> <span>$dbc</span> = sbc2Dbc(<span>$sbc</span><span>); </span><span>4</span> <span>5</span> <span>var_dump</span>(<span>$a</span>, <span>$sbc</span>, <span>$dbc</span>);
结果:
<span>1</span> <span>string</span>(9) "abc12 345" <span>2</span> <span>string</span>(27) "abc12 345" <span>3</span> <span>string</span>(9) "abc12 345"