Decoding URL-Encoded UTF-8 Strings in Python
When working with URLs, you may encounter strings whose non-ASCII characters were encoded as UTF-8 bytes and then percent-escaped (URL-quoted). To recover the original text, you need to reverse both steps.
In Python 2.7, you can use urllib.unquote() to decode URL-encoded data. Given a byte string, it returns a byte string (str) rather than unicode, so you still need to decode the result from UTF-8:
<code class="python">from urllib import unquote

url = unquote(url).decode('utf8')</code>
In Python 3, the urllib package is split into submodules, including urllib.request, urllib.parse, and urllib.error. To decode URL-encoded data, use urllib.parse.unquote():
<code class="python">from urllib.parse import unquote

url = unquote(url)</code>
This function handles both the percent-decoding and the UTF-8 decoding (UTF-8 is its default encoding), giving you a str, Python 3's Unicode string type, as the result.
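To make the two steps visible, here is a minimal sketch that compares unquote() with the explicit route through urllib.parse.unquote_to_bytes(). The sample strings and variable names are only illustrative and are not part of the original example.

```python
from urllib.parse import unquote, unquote_to_bytes

quoted = '%D0%BF%D1%80%D0%B0%D0%B2%D0%BE'

# One step: percent-decode and interpret the bytes as UTF-8 (the default).
one_step = unquote(quoted)

# Two steps, spelled out: percent-decode to bytes, then decode the bytes.
raw = unquote_to_bytes(quoted)
two_step = raw.decode('utf-8')

# Data that was quoted from another encoding needs that encoding named.
latin = unquote('caf%E9', encoding='latin-1')

# With the default errors='replace', bytes that are not valid UTF-8
# become U+FFFD instead of raising an exception.
replaced = unquote('caf%E9')

print(one_step, two_step, latin, repr(replaced))
```

If invalid input should fail loudly rather than be replaced silently, passing errors='strict' to unquote() is one option.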
For example:
<code class="python">>>> from urllib.parse import unquote
>>> url = 'example.com?title=%D0%BF%D1%80%D0%B0%D0%B2%D0%BE%D0%B2%D0%B0%D1%8F+%D0%B7%D0%B0%D1%89%D0%B8%D1%82%D0%B0'
>>> unquote(url)
'example.com?title=правовая+защита'</code>
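Note that unquote() leaves the '+' in the result untouched. In query strings a '+' commonly stands for a space, so if that is the intent, urllib.parse.unquote_plus() or urllib.parse.parse_qs() may be a better fit. The following is a sketch under that assumption, reusing the URL from the example; the variable names are illustrative.

```python
from urllib.parse import parse_qs, unquote_plus, urlsplit

url = ('example.com?title=%D0%BF%D1%80%D0%B0%D0%B2%D0%BE%D0%B2%D0%B0%D1%8F'
       '+%D0%B7%D0%B0%D1%89%D0%B8%D1%82%D0%B0')

# unquote_plus() works like unquote() but also turns '+' into a space.
decoded = unquote_plus(url)

# parse_qs() splits a query string into a dict of lists and applies the
# same decoding (percent-escapes and '+') to each key and value.
query = urlsplit(url).query
params = parse_qs(query)
title = params['title'][0]

print(decoded)
print(title)
```

Parsing the query first and decoding each value separately avoids ambiguity when a decoded value itself contains characters such as '&' or '='.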
By using urllib.parse.unquote(), you can easily decode URL-encoded UTF-8 strings, ensuring that you obtain the correct data.