This document uses the PHP Chinese website manual release.
(PHP 5 >= 5.3.0, PECL intl >= 1.0.0)
Normalizer::normalize -- normalizer_normalize — Normalizes the input provided and returns the normalized string
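Object-oriented style
public static string Normalizer::normalize ( string $input [, int $form = Normalizer::FORM_C ] )
Procedural style
string normalizer_normalize ( string $input [, int $form = Normalizer::FORM_C ] )
Normalizes the input provided and returns the normalized string.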
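Parameters
input
The input string to normalize.
form
One of the normalization forms: Normalizer::FORM_C (the default), Normalizer::FORM_D, Normalizer::FORM_KC or Normalizer::FORM_KD.
Return Values
The normalized string, or NULL if an error occurred.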
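To make the difference between the forms concrete, here is a small sketch that runs one string through each of the four FORM_* constants and prints the resulting bytes. It assumes the intl extension is loaded and PHP 7 or later. The input combines U+FB01 (the "fi" ligature, which has only a compatibility decomposition) with U+00C5 (Å, which has a canonical decomposition). The expected results are: the C forms keep Å precomposed, the D forms split it into A plus a combining ring, and only the K forms expand the ligature into plain "fi".

```php
<?php
// Sketch: apply each normalization form to the same input and show the bytes.
// "\xEF\xAC\x81" is U+FB01 LATIN SMALL LIGATURE FI, "\xC3\x85" is U+00C5 (Å).
$input = "\xEF\xAC\x81" . "\xC3\x85";

$forms = [
    'FORM_C'  => Normalizer::FORM_C,
    'FORM_D'  => Normalizer::FORM_D,
    'FORM_KC' => Normalizer::FORM_KC,
    'FORM_KD' => Normalizer::FORM_KD,
];

$results = [];
foreach ($forms as $name => $form) {
    $results[$name] = Normalizer::normalize($input, $form);
    // bin2hex() makes the composed/decomposed difference visible
    echo $name, ': ', bin2hex($results[$name]), "\n";
}
```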
Example #1 normalizer_normalize() example
<?php
$char_A_ring = "\xC3\x85"; // 'LATIN CAPITAL LETTER A WITH RING ABOVE' (U+00C5)
$char_combining_ring_above = "\xCC\x8A"; // 'COMBINING RING ABOVE' (U+030A)
$char_1 = normalizer_normalize($char_A_ring, Normalizer::FORM_C);
$char_2 = normalizer_normalize('A' . $char_combining_ring_above, Normalizer::FORM_C);
echo urlencode($char_1);
echo ' ';
echo urlencode($char_2);
?>
Example #2 OO example
<?php
$char_A_ring = "\xC3\x85"; // 'LATIN CAPITAL LETTER A WITH RING ABOVE' (U+00C5)
$char_combining_ring_above = "\xCC\x8A"; // 'COMBINING RING ABOVE' (U+030A)
$char_1 = Normalizer::normalize($char_A_ring, Normalizer::FORM_C);
$char_2 = Normalizer::normalize('A' . $char_combining_ring_above, Normalizer::FORM_C);
echo urlencode($char_1);
echo ' ';
echo urlencode($char_2);
?>
The above examples will output:
%C3%85 %C3%85
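The identical output is the point of normalization: the precomposed and decomposed spellings of "Å" are different byte sequences but canonically equivalent. The sketch below assumes the intl extension is loaded. It shows that a raw comparison fails, that comparing after normalizing both sides to the same form succeeds, and that Normalizer::isNormalized() can test a string without converting it.

```php
<?php
// Sketch: two byte sequences that render identically as "Å" are not equal
// until both are normalized to the same form.
$precomposed = "\xC3\x85";   // U+00C5
$decomposed  = "A\xCC\x8A";  // U+0041 followed by U+030A

$rawEqual = ($precomposed === $decomposed);

$nfcEqual = Normalizer::normalize($precomposed, Normalizer::FORM_C)
    === Normalizer::normalize($decomposed, Normalizer::FORM_C);

// Normalizer::isNormalized() checks a string against a form without converting it
$decomposedIsNfc = Normalizer::isNormalized($decomposed, Normalizer::FORM_C);

var_dump($rawEqual, $nfcEqual, $decomposedIsNfc);
// bool(false)
// bool(true)
// bool(false)
```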
[#1] spam at oscar dot xyz [2015-02-25 14:39:43]
You can also use the 'original' Unicode abbreviations if you feel more comfortable with them; they are aliases of the FORM_* constants:
<?php
Normalizer::NFD;
Normalizer::NFKD;
Normalizer::NFC;
Normalizer::NFKC;
?>
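A quick sketch to confirm the aliasing, assuming the intl extension is loaded and PHP 7.1 or later (for list destructuring in foreach). It compares each short name with its FORM_* counterpart and then normalizes the same string with both spellings.

```php
<?php
// Sketch: the short Unicode names are aliases of the FORM_* constants,
// so either spelling selects the same normalization form.
$aliases = [
    'NFC'  => [Normalizer::NFC,  Normalizer::FORM_C],
    'NFD'  => [Normalizer::NFD,  Normalizer::FORM_D],
    'NFKC' => [Normalizer::NFKC, Normalizer::FORM_KC],
    'NFKD' => [Normalizer::NFKD, Normalizer::FORM_KD],
];

foreach ($aliases as $name => [$short, $long]) {
    printf("%s %s its FORM_* constant\n", $name, $short === $long ? '==' : '!=');
}

// Both spellings therefore give identical results
$a = Normalizer::normalize("A\xCC\x8A", Normalizer::NFC);
$b = Normalizer::normalize("A\xCC\x8A", Normalizer::FORM_C);
```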
[#2] tom dot vom dot berg at online dot de [2014-04-12 08:54:07]
If you get error messages when starting Apache from the XAMPP package with extension=intl.dll enabled, copy the files
* icudt##.dll
* icuin##.dll
* icuio##.dll
* icule##.dll
* iculx##.dll
* icutu##.dll
* icuuc##.dll
(## = ICU version number)
from "/program files/xampp/php"
into "/program files/xampp/apache/bin", or wherever your XAMPP installation resides :-)
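After copying the DLLs, a small diagnostic like the following can confirm that the extension actually loaded. It is a sketch, and the function name intlStatus is hypothetical. It is best run as a page served by Apache rather than from the command line, because the CLI and the web server may load different php.ini files. The script reports whether intl is loaded, whether the Normalizer class exists, and the ICU version the extension was built against, if that version is available.

```php
<?php
// Sketch (hypothetical helper): report whether the intl extension and the
// Normalizer class are available, e.g. after copying the ICU DLLs.
function intlStatus(): array
{
    return [
        'intl_loaded'      => extension_loaded('intl'),
        'normalizer_class' => class_exists('Normalizer', false),
        // INTL_ICU_VERSION is only defined when intl is loaded
        'icu_version'      => defined('INTL_ICU_VERSION') ? constant('INTL_ICU_VERSION') : null,
    ];
}

print_r(intlStatus());
```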
[#3] o_shes01 at uni-muenster dot de [2011-01-23 08:59:55]
This method/function returns boolean false if $input is not a valid UTF-8 string, e.g.:
<?php
var_dump(Normalizer::normalize("\xFF"));
// prints "bool(false)"
?>
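Building on this note, here is a minimal sketch that rejects invalid UTF-8 before normalizing and turns the silent failure into an exception. The helper name normalizeStrict is hypothetical, and PHP 7 or later is assumed for the scalar type declarations. The manual text above documents NULL as the failure value while this note observes false, so the sketch checks for both.

```php
<?php
// Sketch (hypothetical helper name): validate UTF-8 up front and turn the
// false/NULL failure value into an exception.
function normalizeStrict(string $input, int $form = Normalizer::FORM_C): string
{
    // preg_match() with the /u modifier returns false for invalid UTF-8
    if (preg_match('//u', $input) !== 1) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $result = Normalizer::normalize($input, $form);
    // Documented as NULL above, observed as false in the note: check both
    if ($result === false || $result === null) {
        throw new RuntimeException('Normalization failed');
    }
    return $result;
}
```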
[#4] akniep at rayo dot info [2009-07-30 09:03:53]
Especially when matching texts against each other or against keywords, it is helpful to normalize the texts first.
The following function removes all diacritics (marks like accents) from a given UTF-8-encoded text and returns ASCII text.
Be sure to have the PHP intl extension (with ICU) installed, which provides the Normalizer class.
Tip: You may also want to map the text to lower case before executing matching procedures.
<?php
function normalizeUtf8String($s)
{
    // keep the input so it can be returned unchanged on failure
    $original_string = $s;

    // Normalizer class missing (intl extension not loaded)!
    if (!class_exists('Normalizer', false)) {
        return $original_string;
    }

    // invalid UTF-8 would make the /u regular expressions below fail
    if (preg_match('//u', $s) !== 1) {
        return $original_string;
    }

    // maps German umlauts and other European characters onto two characters before just removing diacritics
    $s = preg_replace('@\x{00c4}@u', "AE", $s); // umlaut Ä => AE
    $s = preg_replace('@\x{00d6}@u', "OE", $s); // umlaut Ö => OE
    $s = preg_replace('@\x{00dc}@u', "UE", $s); // umlaut Ü => UE
    $s = preg_replace('@\x{00e4}@u', "ae", $s); // umlaut ä => ae
    $s = preg_replace('@\x{00f6}@u', "oe", $s); // umlaut ö => oe
    $s = preg_replace('@\x{00fc}@u', "ue", $s); // umlaut ü => ue
    $s = preg_replace('@\x{00f1}@u', "ny", $s); // ñ => ny
    $s = preg_replace('@\x{00ff}@u', "yu", $s); // ÿ => yu

    // maps special characters (characters with diacritics) onto their base character followed by the diacritical mark
    // example: Ü => U + ¨, à => a + `
    $s = Normalizer::normalize($s, Normalizer::FORM_D);

    $s = preg_replace('@\pM@u', "", $s); // removes diacritics

    $s = preg_replace('@\x{00df}@u', "ss", $s); // maps German ß onto ss
    $s = preg_replace('@\x{00c6}@u', "AE", $s); // Æ => AE
    $s = preg_replace('@\x{00e6}@u', "ae", $s); // æ => ae
    $s = preg_replace('@\x{0132}@u', "IJ", $s); // Ĳ => IJ
    $s = preg_replace('@\x{0133}@u', "ij", $s); // ĳ => ij
    $s = preg_replace('@\x{0152}@u', "OE", $s); // Œ => OE
    $s = preg_replace('@\x{0153}@u', "oe", $s); // œ => oe
    $s = preg_replace('@\x{00d0}@u', "D", $s);  // Ð => D
    $s = preg_replace('@\x{0110}@u', "D", $s);  // Đ => D
    $s = preg_replace('@\x{00f0}@u', "d", $s);  // ð => d
    $s = preg_replace('@\x{0111}@u', "d", $s);  // đ => d
    $s = preg_replace('@\x{0126}@u', "H", $s);  // Ħ => H
    $s = preg_replace('@\x{0127}@u', "h", $s);  // ħ => h
    $s = preg_replace('@\x{0131}@u', "i", $s);  // ı => i
    $s = preg_replace('@\x{0138}@u', "k", $s);  // ĸ => k
    $s = preg_replace('@\x{013f}@u', "L", $s);  // Ŀ => L
    $s = preg_replace('@\x{0141}@u', "L", $s);  // Ł => L
    $s = preg_replace('@\x{0140}@u', "l", $s);  // ŀ => l
    $s = preg_replace('@\x{0142}@u', "l", $s);  // ł => l
    $s = preg_replace('@\x{014a}@u', "N", $s);  // Ŋ => N
    $s = preg_replace('@\x{0149}@u', "n", $s);  // ŉ => n
    $s = preg_replace('@\x{014b}@u', "n", $s);  // ŋ => n
    $s = preg_replace('@\x{00d8}@u', "O", $s);  // Ø => O
    $s = preg_replace('@\x{00f8}@u', "o", $s);  // ø => o
    $s = preg_replace('@\x{017f}@u', "s", $s);  // ſ => s
    $s = preg_replace('@\x{00de}@u', "T", $s);  // Þ => T
    $s = preg_replace('@\x{0166}@u', "T", $s);  // Ŧ => T
    $s = preg_replace('@\x{00fe}@u', "t", $s);  // þ => t
    $s = preg_replace('@\x{0167}@u', "t", $s);  // ŧ => t

    // remove all non-ASCII characters (ASCII is U+0000..U+007F)
    $s = preg_replace('@[^\x00-\x7F]@u', "", $s);

    // possible errors in UTF-8 regular expressions (preg_replace() returns NULL)
    if ($s === null || $s === '') {
        return $original_string;
    }
    return $s;
}
?>
The above function is mainly based on the following article:
http://ahinea.com/en/tech/accented-translate.html
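The core step of normalizeUtf8String() is to decompose with FORM_D and then delete every combining mark (\pM). The sketch below isolates that step and pairs it with the case-folding tip for keyword matching. Both function names are hypothetical, and the sketch assumes the intl extension and PHP 7 or later. strtolower() only folds ASCII letters here, which is enough once the marks are gone. Letters without a decomposition (such as ß or ø) pass through unchanged, which is why the note's function maps them explicitly.

```php
<?php
// Sketch (hypothetical helper): decompose (NFD), then delete combining marks.
function stripDiacritics(string $s): string
{
    $decomposed = Normalizer::normalize($s, Normalizer::FORM_D);
    if ($decomposed === false || $decomposed === null) {
        return $s; // invalid UTF-8: leave the input untouched
    }
    return preg_replace('/\pM/u', '', $decomposed);
}

// Hypothetical helper following the tip: fold case after stripping marks,
// then do a plain substring search.
function containsKeyword(string $text, string $keyword): bool
{
    $haystack = strtolower(stripDiacritics($text));
    $needle   = strtolower(stripDiacritics($keyword));
    return $needle !== '' && strpos($haystack, $needle) !== false;
}
```

For example, containsKeyword("Crème Brûlée", "CREME") matches, because both sides reduce to plain lower-case ASCII before the search.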