©
本文档使用 PHP中文网手册 发布
(PHP 4 >= 4.2.0, PHP 5)
mb_split — 使用正则表达式分割多字节字符串
$pattern
, string $string
[, int $limit
= -1
] )
使用正则表达式 pattern
分割多字节 string
并返回结果 array 。
pattern
正则表达式。
string
待分割的 string 。
limit
limit
,将最多分割为 limit
个元素。
array 的结果。
Note:
The character encoding specified by mb_regex_encoding() will be used as the character encoding for this function by default.
[#1] Stas Trefilov, Vertilia [2015-07-02 11:01:47]
a (simpler) way to extract all characters from a UTF-8 string to array with a single call to a built-in function:
<?php
$str = '????-
?????';
print_r(preg_split('//u', $str, null, PREG_SPLIT_NO_EMPTY));
?>
Output:
Array
(
[0] => ??
[1] => ??
[2] => -
[3] =>
[4] => ??
[5] => ??
[6] => ??
[7] => ??
)
[#2] gunkan at terra dot es [2012-04-05 18:17:08]
To split an string like this: "??????????????Z????" using the "??" delimiter i used:
$v = mb_split('??',"??????????????Z????");
but didn't work.
The solution was to set this before:
mb_regex_encoding('UTF-8');
mb_internal_encoding("UTF-8");
$v = mb_split('??',"??????????????Z????");
and now it's working:
Array
(
[0] => ??
[1] => ??
[2] => ??
[3] => ???
[4] => ?Z
[5] => ??
)
[#3] boukeversteegh at gmail dot com [2011-04-14 13:23:25]
The $pattern argument doesn't use /pattern/ delimiters, unlike other regex functions such as preg_match.
<?php
# Works. No slashes around the /pattern/
print_r( mb_split("\s", "hello world") );
Array (
[0] => hello
[1] => world
)
# Doesn't work:
print_r( mb_split("/\s/", "hello world") );
Array (
[0] => hello world
)
?>
[#4] boukeversteegh at gmail dot com [2010-09-10 09:43:15]
In addition to Sezer Yalcin's tip.
This function splits a multibyte string into an array of characters. Comparable to str_split().
<?php
function mb_str_split( $string ) {
# Split at all position not after the start: ^
# and not before the end: $
return preg_split('/(?<!^)(?!$)/u', $string );
}
$string = '???';
$charlist = mb_str_split( $string );
print_r( $charlist );
?>
# Prints:
Array
(
[0] => ??
[1] => ??
[2] => ?
)
[#5] qdb at kukmara dot ru [2010-03-25 02:46:32]
an other way to str_split multibyte string:
<?php
$s='???????';
//$temp_s=iconv('UTF-8','UTF-16',$s);
$temp_s=mb_convert_encoding($s,'UTF-16','UTF-8');
$temp_a=str_split($temp_s,4);
$temp_a_len=count($temp_a);
for($i=0;$i<$temp_a_len;$i++){
//$temp_a[$i]=iconv('UTF-16','UTF-8',$temp_a[$i]);
$temp_a[$i]=mb_convert_encoding($temp_a[$i],'UTF-8','UTF-16');
}
echo('<pre>');
print_r($temp_a);
echo('</pre>');
//also possible to directly use UTF-16:
define('SLS',mb_convert_encoding('/','UTF-16'));
$temp_s=mb_convert_encoding($s,'UTF-16','UTF-8');
$temp_a=str_split($temp_s,4);
$temp_s=implode(SLS,$temp_a);
$temp_s=mb_convert_encoding($temp_s,'UTF-8','UTF-16');
echo($temp_s);
?>
[#6] gert dot matern at web dot de [2009-08-03 03:34:58]
We are talking about Multi Byte ( e.g. UTF-8) strings here, so preg_split will fail for the following string:
'Wei?e Rosen sind nicht gr??n!'
And because I didn't find a regex to simulate a str_split I optimized the first solution from adjwilli a bit:
<?php
$string = 'Wei?e Rosen sind nicht gr??n!'
$stop = mb_strlen( $string);
$result = array();
for( $idx = 0; $idx < $stop; $idx++)
{
$result[] = mb_substr( $string, $idx, 1);
}
?>
Here is an example with adjwilli's function:
<?php
mb_internal_encoding( 'UTF-8');
mb_regex_encoding( 'UTF-8');
function mbStringToArray
( $string
)
{
$stop = mb_strlen( $string);
$result = array();
for( $idx = 0; $idx < $stop; $idx++)
{
$result[] = mb_substr( $string, $idx, 1);
}
return $result;
}
echo '<pre>', PHP_EOL,
print_r( mbStringToArray( 'Wei?e Rosen sind nicht gr??n!', true)), PHP_EOL,
'</pre>';
?>
Let me know [by personal email], if someone found a regex to simulate a str_split with mb_split.
[#7] adjwilli at yahoo dot com [2007-12-26 09:37:58]
I figure most people will want a simple way to break-up a multibyte string into its individual characters. Here's a function I'm using to do that. Change UTF-8 to your chosen encoding method.
<?php
function mbStringToArray ($string) {
$strlen = mb_strlen($string);
while ($strlen) {
$array[] = mb_substr($string,0,1,"UTF-8");
$string = mb_substr($string,1,$strlen,"UTF-8");
$strlen = mb_strlen($string);
}
return $array;
}
?>