This article describes ways to determine the character encoding of a string in Python. The details are as follows:
Method 1:
isinstance(s, str) checks whether s is an ordinary (byte) string in Python 2
isinstance(s, unicode) checks whether s is a unicode string
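Or, to convert a value that is not already unicode by decoding it as UTF-8 (the variable is named s here so that it does not shadow the built-in str):
if not isinstance(s, unicode):
    s = unicode(s, "utf-8")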
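For readers on Python 3, where the unicode type no longer exists and str is already Unicode text, the same check can be sketched against bytes instead. The helper name to_text and the UTF-8 default are assumptions made for illustration; they are not part of the original method.

```python
def to_text(s, encoding="utf-8"):
    """Return s as text (str), decoding it first if it is a byte string."""
    if isinstance(s, bytes):
        # Byte strings carry no encoding information, so the caller supplies one.
        return s.decode(encoding)
    return s


# UTF-8 bytes for two Chinese characters decode to a two-character string.
text = to_text(b"\xe4\xb8\xad\xe6\x96\x87")
print(len(text))  # 2 (characters, not bytes)
```

Decoding only works when the chosen encoding matches the data, which is why Method 2 tries to detect the encoding rather than assume it.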
Method 2:
Detecting character encoding with chardet
chardet makes it easy to detect the encoding of a string or file. This matters especially for Chinese web pages, some of which use GBK/GB2312 while others use UTF-8. If you need to crawl such pages, you have to know each page's encoding. HTML pages do carry a charset declaration, but it is sometimes wrong, and that is where chardet helps.
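To illustrate why a charset declaration cannot be trusted blindly, here is a minimal Python 3 sketch using only the standard library. It reads the charset named in a page's meta tag and checks whether the bytes actually decode with it. The function names and the 2048-byte search window are assumptions for illustration, and the regular expression is a simplification rather than a full HTML parser.

```python
import re

# Matches charset=... inside a <meta> tag, e.g. <meta charset="gbk"> or
# <meta http-equiv="Content-Type" content="text/html; charset=gb2312">.
META_CHARSET = re.compile(rb"<meta[^>]*charset=[\"']?([\w-]+)", re.IGNORECASE)


def declared_charset(raw_html):
    """Return the charset named in the page's <meta> tag, or None if absent."""
    match = META_CHARSET.search(raw_html[:2048])  # only the start of the page is searched
    return match.group(1).decode("ascii").lower() if match else None


def declaration_is_plausible(raw_html):
    """Return True if the page decodes cleanly with its declared charset."""
    charset = declared_charset(raw_html)
    if charset is None:
        return False
    try:
        raw_html.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name, or bytes that are invalid in that encoding.
        return False
    return True
```

A clean decode does not prove the declaration is right (some single-byte encodings accept any byte sequence), but a failed decode does show it is wrong. In that case a statistical detector such as chardet is the next thing to try.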
chardet example
>>> import urllib
>>> rawdata = urllib.urlopen('http://www.google.cn/').read()
>>> import chardet
>>> chardet.detect(rawdata)
{'confidence': 0.98999999999999999, 'encoding': 'GB2312'}
>>>
chardet's detect function can be called directly to detect the encoding of the given data. It returns a dictionary with two entries: the confidence of the detection and the detected encoding.
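The session above is Python 2 (urllib.urlopen no longer exists in Python 3). As a rough Python 3 sketch, the detection result can be used to decode bytes that have already been downloaded. The helper name decode_bytes, the UTF-8 fallback, and the 0.5 confidence threshold are all assumptions chosen for illustration; chardet is assumed to be installed, and the code falls back gracefully if it is not.

```python
try:
    import chardet  # third-party package; assumed to be installed
except ImportError:
    chardet = None  # the sketch then always uses the fallback encoding


def decode_bytes(raw, fallback="utf-8", min_confidence=0.5):
    """Decode raw bytes using chardet's guess, or `fallback` if the guess is weak."""
    encoding = None
    if chardet is not None:
        result = chardet.detect(raw)  # dict with 'encoding' and 'confidence' keys
        if result["encoding"] and result["confidence"] >= min_confidence:
            encoding = result["encoding"]
    try:
        # errors="replace" keeps a wrong guess from raising UnicodeDecodeError.
        return raw.decode(encoding or fallback, errors="replace")
    except LookupError:
        # chardet named an encoding for which Python has no codec.
        return raw.decode(fallback, errors="replace")


print(decode_bytes(b"plain ASCII text"))
```

Checking the confidence value before trusting the guess is a design choice: for very short inputs the detector has little to go on, so a known default is often the safer option.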
chardet installation
After downloading chardet, unpack the archive and place the chardet folder in your application's directory; you can then use import chardet right away.
Alternatively, run the bundled setup.py script to install chardet into Python's system directory, so that any of your Python programs can simply import chardet:
python setup.py install
Reference
chardet official website: http://chardet.feedparser.org/
chardet download page: http://chardet.feedparser.org/download/