Use PHP to determine whether a file is UTF-8 encoded (check BOM)

WBOY
Release: 2016-07-21 14:53:53

UTF-8 encoded files come in two kinds: with a BOM and without a BOM. Files with a BOM are easy to identify, while files without one are more troublesome, so I wrote a function to make the judgment. The code is as follows:

// Returns 1 for pure ASCII (no byte is greater than 127)
// Returns 2 for UTF-8
// Returns 0 for an ordinary GB encoding

function TestUtf8($text)
{
    $len = strlen($text);

    // A leading EF BB BF is the UTF-8 BOM, so no further scanning is needed.
    if ($len >= 3 && substr($text, 0, 3) === "\xEF\xBB\xBF")
    {
        return 2;
    }

    $lastch = 0;
    $good = 0;      // continuation bytes that follow a non-ASCII byte
    $bad = 0;       // byte pairs that cannot occur in UTF-8
    $notAscii = 0;  // bytes >= 0x80
    for ($i = 0; $i < $len; $i++)
    {
        $ch = ord($text[$i]);
        if ($ch >= 0x80) $notAscii++;

        if (($ch & 0xC0) == 0x80)
        {
            // $ch is a continuation byte (10xxxxxx).
            if (($lastch & 0xC0) == 0xC0)
            {
                $good += 1;  // it follows a lead byte (11xxxxxx)
            }
            else if (($lastch & 0x80) == 0)
            {
                $bad += 1;   // it follows an ASCII byte
            }
        }
        else if (($lastch & 0xC0) == 0xC0)
        {
            $bad += 1;       // a lead byte was not followed by a continuation byte
        }
        $lastch = $ch;
    }
    // A lead byte at the very end has no continuation byte after it.
    if (($lastch & 0xC0) == 0xC0) $bad += 1;

    if ($notAscii == 0)
    {
        return 1;
    }
    else if ($good >= $bad)
    {
        return 2;
    }
    else
    {
        return 0;
    }
}
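
The function above is a heuristic: it counts plausible and implausible byte pairs and picks the likelier answer. For comparison, here is a minimal sketch of a stricter variant that keeps the same return values (1, 2, 0) but lets PCRE validate the bytes. The names hasUtf8Bom and classifyEncoding are hypothetical and do not come from the original article. The sketch assumes that preg_match with the u modifier fails on a subject that is not valid UTF-8, which is how the PHP manual describes that modifier.

```php
<?php
// Hypothetical helpers for illustration; not part of the original article.

// True when the text starts with the UTF-8 BOM bytes EF BB BF.
function hasUtf8Bom(string $text): bool
{
    return strncmp($text, "\xEF\xBB\xBF", 3) === 0;
}

// Same return convention as TestUtf8:
// 1 = pure ASCII, 2 = UTF-8, 0 = something else (e.g. a GB encoding).
function classifyEncoding(string $text): int
{
    if (hasUtf8Bom($text)) {
        return 2;
    }
    // No byte in the range 0x80-0xFF means the text is plain ASCII.
    if (preg_match('/[\x80-\xFF]/', $text) !== 1) {
        return 1;
    }
    // With the u modifier the subject must be valid UTF-8;
    // preg_match returns false instead of 1 when it is not.
    return preg_match('//u', $text) === 1 ? 2 : 0;
}
```

To check a file, pass its contents in, for example classifyEncoding(file_get_contents($path)), where $path is whatever file you want to test. Unlike the counting approach, this rejects any text that contains even one malformed sequence, which may or may not be what you want for partly damaged files.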

source:php.cn