I recently wrote a crawler script. Most of the scraped content comes out fine, but a small portion of it is garbled.
Detecting the character encoding returns CP936:
```php
mb_detect_encoding($str, 'GBK, gb2312, GB18030, ISO-8859-1, ASCII, UTF-8', true);
```
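This sketch shows one likely reason detection reports CP936. When UTF-8 text has been double-encoded, the resulting bytes are well-formed in both UTF-8 and GBK/CP936. A strict check with GBK listed first can therefore plausibly pick CP936. The sample string `印度` is an assumption standing in for scraped text, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Assumption: "印度" is only a sample; real input would come from the crawler.
$original = "印度";

// Simulate double encoding: the UTF-8 bytes are misread as ISO-8859-1
// (one character per byte) and then encoded to UTF-8 a second time.
$double = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// The resulting byte sequence is valid in both encodings, so byte-level
// validity alone cannot tell the detector which one is intended.
var_dump(mb_check_encoding($double, 'UTF-8'));
var_dump(mb_check_encoding($double, 'CP936'));
echo bin2hex($double), PHP_EOL;
```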
I tried converting from CP936 to UTF-8, but the result was still garbled:
```php
mb_convert_encoding($str, 'UTF-8', 'CP936');
// 氓聧掳氓潞娄盲赂聙70氓虏聛猫聙聛氓陇麓莽聦楼盲潞碌7氓虏聛氓楼鲁氓颅漏猫聙聦猫垄芦忙聧聲
```
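The next sketch reproduces the first characters of that output. It assumes, as the eventual fix suggests, that the text was UTF-8 that had been read as Latin-1 and re-encoded. Decoding those bytes as CP936 then yields the mojibake. The sample `印度` is an assumption, and the source file is assumed to be UTF-8.

```php
<?php
// Assumption: source saved as UTF-8; "印度" is a sample string.
$original = "印度";

// Step 1 (assumed upstream mistake): UTF-8 read as Latin-1, re-encoded to UTF-8.
$double = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Step 2: the CP936 -> UTF-8 conversion from the post, applied to those bytes.
$garbled = mb_convert_encoding($double, 'UTF-8', 'CP936');

echo $garbled, PHP_EOL;
```

Each original character turns into three GBK characters because its three UTF-8 bytes have been expanded to six.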
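Eventually I found that the strings can be converted back like this. The garbled text was UTF-8 that had been misread as ISO-8859-1 (Latin-1) and encoded to UTF-8 a second time. Converting from UTF-8 back to Latin-1 therefore restores the original UTF-8 bytes: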
```php
iconv('utf-8', 'latin1', $str);
```
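Appending `//IGNORE` makes iconv drop characters that cannot be represented in Latin-1, instead of failing on them:

```php
iconv('utf-8', 'latin1//IGNORE', $str);
```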
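Only a small portion of the scraped pages is garbled. Applying the Latin-1 conversion to every string would damage normal Chinese text: plain `iconv` fails on it, and `//IGNORE` silently drops it. The following is a minimal sketch of a selective repair under that assumption. It swaps in `mb_convert_encoding` for `iconv` and only un-double-encodes when the round trip is lossless and the result is valid UTF-8. The function name `fix_double_utf8` is hypothetical.

```php
<?php
/**
 * Hypothetical helper: undo one layer of "UTF-8 read as Latin-1" double
 * encoding, but only when that is lossless; otherwise return $str as-is.
 */
function fix_double_utf8(string $str): string
{
    // Map each UTF-8 character back to a single Latin-1 byte. Characters
    // above U+00FF become the substitute character '?', so normal Chinese
    // text will fail the round-trip check below.
    $bytes = mb_convert_encoding($str, 'ISO-8859-1', 'UTF-8');

    // Lossless round trip + valid UTF-8 result: treat it as double-encoded.
    if (mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1') === $str
        && mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    return $str;
}

$double = mb_convert_encoding("印度", 'UTF-8', 'ISO-8859-1');
echo fix_double_utf8($double), PHP_EOL; // repaired
echo fix_double_utf8("印度"), PHP_EOL;  // already fine, left unchanged
```

This is a heuristic. Text that genuinely contains sequences such as `Ã©` would also be "repaired", so it suits crawled content where such strings are unlikely.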