How to convert string to UTF-8 format in JavaScript

PHPz
Release: 2023-04-05 14:57:00

In daily development, we often need to convert strings to UTF-8, a universal character encoding that covers the characters of virtually every language, including Chinese, Japanese, and Korean. JavaScript strings are stored internally as UTF-16 code units, so obtaining UTF-8 bytes requires an explicit conversion, and the language gives us several ways to do it.

This article will introduce how to convert a string into UTF-8 format in JavaScript from the following aspects:

  1. Understand UTF-8 encoding method
  2. Quickly convert strings to UTF-8
  3. Complete UTF-8 transcoding solution

  1. Understand the UTF-8 encoding method

UTF-8 is a variable-length character encoding that uses one to four bytes per character. Its encoding rules are as follows:

  • For single-byte characters, the first bit of the byte is 0, and the remaining 7 bits hold the Unicode code point of the character;
  • For a character encoded in n bytes (n = 2, 3, or 4), the first n bits of the first byte are all 1 and the (n+1)th bit is 0; each following byte starts with the two bits 10; the remaining bits of all the bytes, taken in order, hold the Unicode code point of the character.

For example, the Unicode code point of the Chinese character "你" ("you") is U+4F60. According to the above rules, it is encoded in UTF-8 as the three bytes E4 BD A0.
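
The rules above can be sketched as a small function that encodes one code point by hand. This is only an illustration: the names encodeCodePoint and toHex are hypothetical helpers, not part of any library, and the sketch assumes a valid code point (it does not reject surrogates or values above U+10FFFF).

```javascript
// Encode a single Unicode code point into an array of UTF-8 byte values,
// following the bit layout described in the rules above.
// Assumption: codePoint is a valid Unicode scalar value (no validation is done).
function encodeCodePoint(codePoint) {
  if (codePoint < 0x80) {
    // 1 byte: 0xxxxxxx
    return [codePoint];
  }
  if (codePoint < 0x800) {
    // 2 bytes: 110xxxxx 10xxxxxx
    return [0xc0 | (codePoint >> 6), 0x80 | (codePoint & 0x3f)];
  }
  if (codePoint < 0x10000) {
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return [
      0xe0 | (codePoint >> 12),
      0x80 | ((codePoint >> 6) & 0x3f),
      0x80 | (codePoint & 0x3f),
    ];
  }
  // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xf0 | (codePoint >> 18),
    0x80 | ((codePoint >> 12) & 0x3f),
    0x80 | ((codePoint >> 6) & 0x3f),
    0x80 | (codePoint & 0x3f),
  ];
}

// Format byte values as uppercase two-digit hex, separated by spaces.
function toHex(bytes) {
  return bytes.map((b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");
}

console.log(toHex(encodeCodePoint(0x4f60))); // prints "E4 BD A0"
```

The three-byte branch is the one taken for U+4F60: the top 4 bits go into the first byte, and the next two groups of 6 bits go into the continuation bytes.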

  2. Quickly convert strings to UTF-8

In JavaScript, we can easily convert strings to UTF-8 format through encoding and decoding functions.

The first is the encoding function. The encodeURIComponent() function percent-encodes a string using its UTF-8 bytes, so we can convert each %XX escape back into a single character whose code is that byte value. The result is a "binary string" in which every character represents one UTF-8 byte. The sample code is as follows:

function utf8Encode(str) {
  // encodeURIComponent() writes the UTF-8 bytes of str as %XX escapes
  // (always uppercase hex); unreserved ASCII characters pass through as-is.
  return encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, (match, p1) => {
    // Turn each escape into one character whose code is the byte value.
    return String.fromCharCode(parseInt(p1, 16));
  });
}

Here, encodeURIComponent() percent-encodes the UTF-8 bytes of the string (for example, "你" becomes "%E4%BD%A0"), and the replace() call turns each %XX escape into a character whose code is that byte value.
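
As an alternative to the percent-encoding approach, modern browsers and Node.js provide the built-in TextEncoder class, which always encodes to UTF-8 and returns the bytes as a Uint8Array. The sketch below is not part of the original sample; it only shows the same conversion with that API, and the variable names are illustrative.

```javascript
// TextEncoder is a global in modern browsers and Node.js; it always produces UTF-8.
const encoder = new TextEncoder();

// encode() returns a Uint8Array holding the UTF-8 bytes of the string.
const bytes = encoder.encode("你好");

// Render the bytes as uppercase hex for inspection.
const hex = Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");

console.log(bytes.length); // prints 6 (two characters, three bytes each)
console.log(hex); // prints "E4 BD A0 E5 A5 BD"
```

Working with a Uint8Array is usually more convenient than a binary string when the bytes are sent over the network or written to a file.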

The decoding function reverses the process: it writes each byte back as a %XX escape and lets the decodeURIComponent() function interpret the escapes as UTF-8. The sample code is as follows:

function utf8Decode(utf8Str) {
  let percentEncoded = "";

  // Each character of utf8Str must represent one byte (code 0 to 255).
  // Write every byte as a two-digit %XX escape.
  for (let i = 0; i < utf8Str.length; i++) {
    percentEncoded += "%" + utf8Str.charCodeAt(i).toString(16).padStart(2, "0");
  }

  // decodeURIComponent() reads the escapes as UTF-8 and throws a URIError
  // if the bytes do not form valid UTF-8.
  return decodeURIComponent(percentEncoded);
}
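
Decoding also has a built-in counterpart, TextDecoder. The following sketch, which is an addition and not part of the original sample, round-trips a string through UTF-8 bytes and shows that the fatal option makes the decoder reject malformed input instead of silently substituting replacement characters.

```javascript
// Encode a mixed ASCII / Chinese string to UTF-8 bytes.
const original = "Hello, 你好";
const utf8Bytes = new TextEncoder().encode(original);

// With fatal: true, decode() throws on malformed UTF-8 instead of
// inserting U+FFFD replacement characters.
const decoder = new TextDecoder("utf-8", { fatal: true });
const decoded = decoder.decode(utf8Bytes);
console.log(decoded === original); // prints true

// A lone continuation byte (0xBD) is not valid UTF-8 on its own.
let rejected = false;
try {
  decoder.decode(new Uint8Array([0xbd]));
} catch (err) {
  rejected = true;
}
console.log(rejected); // prints true
```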

  3. Complete UTF-8 transcoding solution

Although the above functions can convert strings to UTF-8, hand-written helpers like these are not very practical when an entire application needs to transcode strings, especially between UTF-8 and other encodings such as GBK. In that case, we can use a third-party library such as iconv-lite (for Node.js) to handle transcoding throughout the application. The sample code is as follows:

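const iconv = require("iconv-lite");

const text = "欢迎使用 iconv-lite 库";

// Encode the string into a Buffer of UTF-8 bytes
const utf8Buf = iconv.encode(text, "utf8");
// Decode with the same encoding to get the original string back
const decoded = iconv.decode(utf8Buf, "utf8");

// Other encodings work the same way, for example GBK
const gbkBuf = iconv.encode(text, "gbk");
const fromGbk = iconv.decode(gbkBuf, "gbk");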

In the above code, the iconv.encode() function converts the string into a Buffer of UTF-8 bytes, and the iconv.decode() function converts a Buffer back into a string, interpreting the bytes in the encoding passed as its second argument. It should be noted that to use the iconv-lite library, you need to install it through npm first. The installation method is:

npm install iconv-lite
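
When only UTF-8 is involved, Node.js can perform the same conversion with its built-in Buffer class, with no extra dependency. The sketch below is an addition for comparison and assumes a Node.js environment; it does not replace iconv-lite for encodings that Buffer does not support, such as GBK.

```javascript
// Buffer is a Node.js global; "utf8" is one of its built-in encodings.
const text = "欢迎使用 iconv-lite 库";

// Encode the string into a Buffer of UTF-8 bytes.
const buf = Buffer.from(text, "utf8");

// Decode the bytes back into a string.
const roundTripped = buf.toString("utf8");
console.log(roundTripped === text); // prints true

// The first character, "欢" (U+6B22), occupies three bytes.
console.log(buf.subarray(0, 3).toString("hex")); // prints "e6aca2"
```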

Summary

This article introduced how to convert strings into UTF-8 format in JavaScript. We looked at the UTF-8 encoding rules, implemented a simple conversion between strings and UTF-8 using encoding and decoding functions, and used the iconv-lite library to handle transcoding for an entire application. In actual development, choosing the method that fits your needs can reduce development costs and improve work efficiency.

source:php.cn