Summary of methods for truncating Chinese strings in PHP without garbled characters

PHP中文网
Release: 2016-07-13 09:55:17

Using the PHP function substr directly to truncate Chinese text may produce garbled characters. The main reason is that substr counts bytes rather than characters, so it can "saw" a multibyte Chinese character in half. Let's see how to solve this problem.
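To make the byte-versus-character issue concrete, here is a minimal sketch. It assumes the source file is saved as UTF-8, where each of these Chinese characters occupies 3 bytes; the variable names are only illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8, so each character below is 3 bytes.
$text = "中文字符";

// substr() counts bytes: 4 bytes is one whole character plus one stray byte.
$cut = substr($text, 0, 4);

// preg_match() with the /u modifier returns false when the subject is not valid UTF-8.
$isValidUtf8 = preg_match('//u', $cut) === 1;

echo strlen($text), "\n"; // 12 (bytes, not characters)
echo strlen($cut), "\n";  // 4
var_dump($isValidUtf8);   // bool(false)
```

Cutting at a multiple of 3 bytes would happen to work for this particular string, but that breaks as soon as ASCII and Chinese characters are mixed.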

Most programs need to truncate strings at some point, and garbled output when truncating Chinese strings is a common and annoying problem. The two methods below prevent it.
The first is a custom function that is convenient to use.
Truncating with this function does not produce garbled characters.

/**
 * Substring function with support for Chinese (multibyte) strings.
 */
function msubstr($str, $start, $length, $charset = "utf-8", $suffix = true) {
  // Normalize the charset name so that "UTF8" and "utf-8" are treated alike.
  $charset = strtolower($charset);
  if ($charset === 'utf8') {
    $charset = 'utf-8';
  }
  if (function_exists("mb_substr")) {
    $slice = mb_substr($str, $start, $length, $charset);
  } else if (function_exists('iconv_substr')) {
    $slice = iconv_substr($str, $start, $length, $charset);
  } else {
    // Fallback: split the string into characters with a per-charset pattern.
    $re['utf-8']  = "/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xff][\x80-\xbf]{3}/";
    $re['gb2312'] = "/[\x01-\x7f]|[\xb0-\xf7][\xa0-\xfe]/";
    $re['gbk']    = "/[\x01-\x7f]|[\x81-\xfe][\x40-\xfe]/";
    $re['big5']   = "/[\x01-\x7f]|[\x81-\xfe]([\x40-\x7e]|[\xa1-\xfe])/";
    preg_match_all($re[$charset], $str, $match);
    $slice = join("", array_slice($match[0], $start, $length));
  }
  // Append an ellipsis only when requested and something was actually cut off.
  if ($suffix && strlen($slice) < strlen($str))
    return $slice . "...";
  return $slice;
}

The second method is PHP's built-in mb_substr function.

Passing the encoding of the string being truncated as the last argument effectively prevents garbled characters.

Description

string mb_substr ( string $str , int $start [, int $length [, string $encoding ]] ) 
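<?php
// PCRE-based alternative: split into UTF-8 characters, then slice the array.
function substr_unicode($str, $s, $l = null) {
  return join("", array_slice(
    preg_split("//u", $str, -1, PREG_SPLIT_NO_EMPTY), $s, $l));
}

$str = "Büyük";
$s = 0; // start at character 0
$l = 3; // take 3 characters
echo substr($str, $s, $l) . "\n";    // "Bü": 3 bytes, because "ü" takes 2 bytes in UTF-8
echo mb_substr($str, $s, $l) . "\n"; // "Büy", assuming the internal encoding is UTF-8
echo substr_unicode($str, $s, $l);   // "Büy"
?>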
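
The example above calls mb_substr without an encoding, so it depends on the configured internal encoding. As a rough sketch of passing the encoding explicitly, the helper below truncates a UTF-8 string and adds a suffix only when text was actually removed. The name truncate_utf8 and the "..." default are assumptions for illustration, not part of PHP or mbstring.

```php
<?php
/**
 * Truncate a UTF-8 string to at most $maxChars characters.
 * truncate_utf8 is a hypothetical helper name, not part of PHP or mbstring.
 */
function truncate_utf8(string $text, int $maxChars, string $suffix = '...'): string
{
    // Count characters, not bytes, so short strings are returned untouched.
    if (mb_strlen($text, 'UTF-8') <= $maxChars) {
        return $text;
    }
    // Passing the encoding explicitly avoids depending on mb_internal_encoding().
    return mb_substr($text, 0, $maxChars, 'UTF-8') . $suffix;
}

// Assumes the mbstring extension is loaded and this file is saved as UTF-8.
if (extension_loaded('mbstring')) {
    echo truncate_utf8("PHP中文字符串截取", 5), "\n"; // PHP中文...
    echo truncate_utf8("Büyük", 10), "\n";            // Büyük
    echo mb_substr("Büyük", -3, null, 'UTF-8'), "\n"; // yük
}
```

A negative start, as in the last line, counts characters from the end of the string.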

That is all for this article; I hope you find it useful.

source:php.cn