How to convert 'u00e9' to utf8 characters in mysql or php?
P粉704196697
2023-08-24 20:34:18
<p>I am doing data cleaning on some messy data that I am importing into mysql. </p>
<p>The data contains "pseudo" unicode characters that are actually embedded in the string, such as "u00e9" etc. </p>
<p>So a field might be "Jalostotitlu00e1n".
I need to strip out that awkward 'u00e1' and replace it with the corresponding UTF-8 character.</p>
<p>I could do this in MySQL, maybe using SUBSTRING() and CHAR(), but I'm preprocessing the data via PHP, so I can do it there as well. </p>
<p>I already know how to configure mysql and php to use utf data. The problem actually lies in the source data I imported. </p>
<p>Thank you</p>
/* php function to convert utf8 html to ansi */
There is a way. Replace every uXXXX sequence with its HTML numeric character reference and run html_entity_decode() on the result. That is:
echo html_entity_decode("Jalostotitl&#x00e1;n");
Every UTF character of the form u1234 can be written in HTML as &#x1234;. The replacement itself is the hard part: if no other character marks the beginning of the sequence, you can get a lot of false positives. A simple regular expression might be:
preg_replace('/u([\da-fA-F]{4})/', '&#x\1;', $str)
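Putting the two steps together, a minimal sketch might look like the following. The function name decode_pseudo_unicode is hypothetical, and the ENT_QUOTES | ENT_HTML5 flags and explicit 'UTF-8' charset are assumptions rather than part of the original answer. Note that html_entity_decode() will also decode any genuine entities (such as &amp;amp;) already present in the string, which may or may not be what you want for your data.

```php
<?php
// Hypothetical helper (name chosen for illustration only).
// Converts pseudo-escapes such as "u00e1" into real UTF-8 characters.
function decode_pseudo_unicode(string $str): string
{
    // Step 1: rewrite "u" + four hex digits as a numeric HTML entity,
    // e.g. "u00e1" becomes "&#x00e1;".
    // Caveat: any "u" followed by four hex-like characters matches,
    // so ordinary text can produce false positives.
    $withEntities = preg_replace('/u([\da-fA-F]{4})/', '&#x$1;', $str);

    // Step 2: decode the entities into UTF-8 characters.
    return html_entity_decode($withEntities, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo decode_pseudo_unicode('Jalostotitlu00e1n'), "\n";
```

If the source data allows it, running this only on fields known to contain the pseudo-escapes, and spot-checking the changed rows, is one way to limit the damage from false positives.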