PHP provides the functions urlencode(), urldecode(), rawurlencode(), and rawurldecode() for encoding and decoding web page URLs.
Understanding urlencode:
urlencode: the percent-encoding applied to characters that cannot appear literally in a web page URL, such as Chinese characters. The most familiar case is a Chinese query entered in a search engine such as Baidu or Google: the resulting URL contains the encoded query. For Chinese text there are generally two variants, depending on the character set the text is in before it is percent-encoded: the traditional GB2312-based encoding (used by Baidu, Yisou, etc.) and the UTF-8-based encoding (used by Google, Yahoo, etc.). This article looks at encoding and decoding for both.
中文 -> GB2312 encoding -> %D6%D0%CE%C4
中文 -> UTF-8 encoding -> %E4%B8%AD%E6%96%87
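The two results above can be reproduced in PHP. The following is a minimal sketch: the bytes of 中文 are written as hex escapes so the output does not depend on how the script file is saved, the variable names are illustrative, and the conversion step assumes the mbstring extension is available.

```php
<?php
// Bytes of "中文" in UTF-8, written as escapes so the file's own encoding does not matter.
$utf8 = "\xE4\xB8\xAD\xE6\x96\x87";

// The same two characters converted to GB2312 (assumes the mbstring extension).
$gb2312 = mb_convert_encoding($utf8, 'gb2312', 'utf-8');

echo urlencode($gb2312), "\n"; // %D6%D0%CE%C4
echo urlencode($utf8), "\n";   // %E4%B8%AD%E6%96%87
```

The percent-encoding step is the same in both cases; only the bytes handed to urlencode() differ.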
urlencode in HTML:
In an HTML file encoded as GB2312:
http://www.phpernote.com/中文.rar -> the browser automatically converts this to -> http://www.phpernote.com/%D6%D0%CE%C4.rar
Note: Firefox handles GB2312-encoded Chinese URLs poorly because it sends URLs as UTF-8 by default; ftp:// URLs are not affected. This should arguably be considered a bug in Firefox.
In an HTML file encoded as UTF-8:
http://www.phpernote.com/中文.rar -> the browser automatically converts this to -> http://www.phpernote.com/%E4%B8%AD%E6%96%87.rar
urlencode in PHP:
// GB2312 encoding (the results below assume this script is saved as GB2312)
echo urlencode("中文-_. ")."\n";            // %D6%D0%CE%C4-_.+
echo urldecode("%D6%D0%CE%C4-_. ")."\n";    // 中文-_. followed by a space
echo rawurlencode("中文-_. ")."\n";         // %D6%D0%CE%C4-_.%20
echo rawurldecode("%D6%D0%CE%C4-_. ")."\n"; // 中文-_. followed by a space
All non-alphanumeric characters except -_. are replaced with a percent sign (%) followed by two hexadecimal digits. Since PHP 5.3, rawurlencode also leaves the tilde (~) unencoded.
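As a small illustration of that rule, the sketch below runs both functions over a hypothetical sample string containing kept characters (-_.), a tilde, and two reserved characters. It assumes PHP 5.3 or later, where rawurlencode leaves ~ alone.

```php
<?php
// Hypothetical sample string mixing kept and escaped characters.
$sample = "a-b_c.d~e&f=g";

echo urlencode($sample), "\n";    // a-b_c.d%7Ee%26f%3Dg
echo rawurlencode($sample), "\n"; // a-b_c.d~e%26f%3Dg
```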
The difference between urlencode and rawurlencode:
urlencode encodes a space as a plus sign (+)
rawurlencode encodes a space as %20
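The difference is easiest to see side by side. The file name below is a hypothetical example; the decoding lines show that the two decoders also disagree about what a plus sign means.

```php
<?php
$name = "my file.txt"; // hypothetical file name containing a space

echo urlencode($name), "\n";    // my+file.txt
echo rawurlencode($name), "\n"; // my%20file.txt

// urldecode() turns "+" back into a space; rawurldecode() leaves "+" untouched.
echo urldecode("my+file.txt"), "\n";    // my file.txt
echo rawurldecode("my+file.txt"), "\n"; // my+file.txt
```

A plus sign is read as a space only in form-style query strings, so for the path part of a URL the %20 form is the safer choice.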
The previous version of my online txt file splitter used urlencode, and I never noticed a problem with it. Today it caused a serious bug: no URL containing a space could be resolved, so the split files could not be downloaded. Switching to the rawurlencode() function fixed it.
If you want UTF-8 encoding, there are two options:
1. Save the file as UTF-8 and call urlencode or rawurlencode directly.
2. Convert the string with the mb_convert_encoding function first.
// This script is saved as GB2312, so convert the string to UTF-8 before encoding.
$url = 'http://www.phpernote.com/中文.rar';
echo urlencode(mb_convert_encoding($url, 'utf-8', 'gb2312'))."\n";
echo rawurlencode(mb_convert_encoding($url, 'utf-8', 'gb2312'))."\n";
// Both print: http%3A%2F%2Fwww.phpernote.com%2F%E4%B8%AD%E6%96%87.rar
Application examples:
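// Encode a URL as GB2312 while keeping ":", "/" and "@" readable.
// Assumes $url is passed in as UTF-8.
function parseurl($url = "") {
    $url = rawurlencode(mb_convert_encoding($url, 'gb2312', 'utf-8'));
    // rawurlencode also escapes the URL delimiters, so restore them.
    $a = array("%3A", "%2F", "%40");
    $b = array(":", "/", "@");
    $url = str_replace($a, $b, $url);
    return $url;
}

$url = "ftp://yongfu:password@www.huikaiche.com/中文/中文.rar";
echo parseurl($url);
// ftp://yongfu:password@www.huikaiche.com/%D6%D0%CE%C4/%D6%D0%CE%C4.rar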
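An alternative to restoring the delimiters afterwards is to encode each path segment on its own. The following is only a sketch, not the method used above: encodePath is a hypothetical helper name, it handles just the path part of a URL, and the sample path is written as UTF-8 byte escapes so the result does not depend on the file's encoding.

```php
<?php
// Hypothetical helper: percent-encode each path segment separately,
// so the "/" separators never need to be restored afterwards.
function encodePath(string $path): string
{
    return implode('/', array_map('rawurlencode', explode('/', $path)));
}

// "/中文/中文.rar" as UTF-8 bytes.
$path = "/\xE4\xB8\xAD\xE6\x96\x87/\xE4\xB8\xAD\xE6\x96\x87.rar";
echo encodePath($path), "\n"; // /%E4%B8%AD%E6%96%87/%E4%B8%AD%E6%96%87.rar
```

With this approach a literal %2F inside a file name stays encoded, which the str_replace version would turn back into a slash.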