node.js - When crawling, I occasionally get content like &#22825;&#22530;&#21521;&#24038;. How do I convert it to Chinese?
PHP中文网 2017-04-17 13:23:21

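I'd like to learn more about character encoding. Are there any good books you can recommend? Thanks!

Thanks to both of you for your answers. Following your hints, I wrote a test program and it works. Are there any other cases I should handle?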

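var code10, code16, zh;

// Decimal numeric character references, as they appear in the crawled HTML
code10 = '&#22825;&#22530;&#21521;&#24038;,&#28145;&#22323;&#21521;&#21491;';

// Decode decimal references; fromCodePoint also handles code points above U+FFFF
zh = code10.replace(/&#(\d+);/g, function($, $1) {return String.fromCodePoint(parseInt($1, 10));});

console.log(zh);

// Re-encode every character outside Latin-1 as a hexadecimal reference
code16 = zh.replace(/[^\u0000-\u00ff]/gu, function($) {return '&#x' + $.codePointAt(0).toString(16) + ';';});

console.log(code16);

// Decode hexadecimal references back to text
zh = code16.replace(/&#x([0-9a-f]+);/gi, function($, $1) {return String.fromCodePoint(parseInt($1, 16));});

console.log(zh);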

All replies (2)
刘奇

A sequence beginning with &# is a decimal numeric character reference. When converting it to Chinese, note that each Chinese character is written as a multi-character sequence such as &#22825;, so every reference has to be decoded on its own. You can use the JavaScript function

String.fromCharCode(parseInt(str.substr(2), 10))

Wrap this in a loop to make a small tool and run it on the front end; for example, `String.fromCharCode(parseInt("&#22825;".substr(2), 10))` returns "天".
I happened to write a small tool for this today:

https://github.com/hunnble/JavaScript_learning/blob/master/change-radix.html

Open it in a browser, enter the text you want to convert, set the base to 10, and then decode.
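The loop described in this answer might look like the minimal sketch below. The function name `decodeDecimalEntities` is hypothetical, and the sketch assumes the input contains only decimal references of the form `&#NNNN;` mixed with plain text. It uses `String.fromCodePoint` rather than `String.fromCharCode` so that code points above U+FFFF also decode correctly; a number beyond the Unicode range would make `fromCodePoint` throw.

```javascript
// Hypothetical helper: decode decimal references such as "&#22825;" with an explicit loop.
function decodeDecimalEntities(text) {
  var out = '';
  var i = 0;
  while (i < text.length) {
    var end = text.indexOf(';', i);
    // A reference starts with "&#", is followed by digits only, and ends with ";"
    if (text.substr(i, 2) === '&#' && end > i + 2 && /^\d+$/.test(text.slice(i + 2, end))) {
      out += String.fromCodePoint(parseInt(text.slice(i + 2, end), 10));
      i = end + 1; // continue after the closing ";"
    } else {
      out += text[i]; // ordinary character, copied as-is
      i += 1;
    }
  }
  return out;
}

console.log(decodeDecimalEntities('&#22825;&#22530;&#21521;&#24038;'));
```

Anything that does not match the `&#digits;` pattern, including hexadecimal references, passes through unchanged.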

PHPzhong

In an entity such as &#12345;, the number 12345 is the Unicode code point written in decimal, and the entity can be converted to the corresponding Unicode character.

If the entity is &#x12ab;, then 12ab is the Unicode code point written in hexadecimal.
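Both forms can be handled in a single pass. The following is a sketch under stated assumptions, not a complete HTML entity decoder: the name `decodeNumericEntities` is hypothetical, named entities such as `&amp;` are not handled, and references whose value lies outside the Unicode range are left untouched.

```javascript
// Hypothetical helper: decode decimal (&#12345;) and hexadecimal (&#x12ab;) references.
function decodeNumericEntities(text) {
  return text.replace(/&#(?:x([0-9a-f]+)|(\d+));/gi, function (match, hex, dec) {
    // Exactly one of the two capture groups is set for each match
    var codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave the reference as-is if it is beyond the last Unicode code point
    if (codePoint > 0x10ffff) return match;
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNumericEntities('&#12345; &#x12ab;'));
```

The `i` flag lets the pattern accept both `&#x12ab;` and `&#X12AB;`, since crawled pages may use either case.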
