Table of Contents
Character set
Node.js Buffer encoding
Encoding source code
Summary

Let's talk about encoding in Node.js Buffer

Aug 31, 2021 am 10:28 AM
buffer encoding node.js

This article walks through how encoding works in Node.js Buffer. I hope it is helpful to everyone!

The smallest unit of data in a computer is the bit, a 0 or a 1, which corresponds to a low or high voltage level in hardware. A single bit carries too little information, so 8 bits are grouped into one byte, and numbers, strings, and other kinds of information are stored in terms of bytes.

How are characters stored? Through encoding: each character is assigned its own numeric code. When text needs to be rendered, the font is looked up by that code and the glyph for the corresponding character is drawn.
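To make the character-to-code mapping concrete, here is a small sketch. The variable names are only for illustration; `codePointAt`, `Buffer.from`, and `toString` are standard JavaScript and Node.js calls.

```javascript
// Map a character to its numeric code, then to the bytes that store it.
const letter = 'A';
const letterCode = letter.codePointAt(0);          // 65, i.e. 0x41
const letterBytes = Buffer.from(letter, 'utf8');   // one byte: 41

const han = '\u4e2d';                              // the character 中
const hanCode = han.codePointAt(0);                // 20013, i.e. 0x4e2d
const hanBytes = Buffer.from(han, 'utf8');         // three bytes: e4 b8 ad

console.log(letterCode, letterBytes);              // 65 <Buffer 41>
console.log(hanCode.toString(16), hanBytes);       // 4e2d <Buffer e4 b8 ad>

// Decoding reverses the mapping: bytes -> code -> character.
const decoded = hanBytes.toString('utf8');
console.log(decoded === han);                      // true
```

The same character always has the same code, but how many bytes that code occupies depends on the encoding scheme used to store it.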

Character set

The first character set (charset) was ASCII, which covers 128 characters such as abc, ABC, and 123, because computers were first developed in the United States. Later, Europe adopted its own character set standards, the ISO 8859 family (for example ISO-8859-1, also known as Latin-1), and China developed its own as well, such as GBK.

It became clear that every region could not keep its own character set, because the same code would then mean different characters in different sets. So Unicode was proposed (it is maintained by the Unicode Consortium and kept in sync with the ISO/IEC 10646 standard) to cover most of the world's characters in a single set, so that each character has exactly one code.

But an ASCII character needs only 1 byte, a Chinese character in GBK needs 2 bytes, and some characters need 3 bytes or more. Storing every character at the largest width would waste space on characters that fit in one byte. That is why there are different encoding schemes such as UTF-8, UTF-16, and UTF-32.

UTF-8, UTF-16, and UTF-32 all encode Unicode; they differ only in how a code is laid out in bytes.

To save space, UTF-8 uses a variable-length scheme of 1 to 4 bytes per character (the original design allowed up to 6 bytes, but the current standard caps it at 4). UTF-16 uses 2 bytes for most characters and 4 bytes for characters outside the Basic Multilingual Plane, and UTF-32 is fixed at 4 bytes.
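The difference in storage size is easy to check. The sketch below counts the bytes one character needs in UTF-8, UTF-16, and UTF-32 (the fixed 4-byte scheme). The helper `byteLengths` is a name made up for this example, and since Node.js Buffer has no UTF-32 encoding, that value is computed by hand as 4 bytes per code point.

```javascript
// Count how many bytes each Unicode encoding scheme needs for a string.
// byteLengths is a hypothetical helper, not a Node.js API.
function byteLengths(ch) {
  return {
    utf8: Buffer.byteLength(ch, 'utf8'),
    utf16: Buffer.byteLength(ch, 'utf16le'),
    // Buffer has no UTF-32 support; UTF-32 is always 4 bytes per code point.
    utf32: [...ch].length * 4,
  };
}

// a, é, 中, and an emoji outside the Basic Multilingual Plane
const samples = ['a', '\u00e9', '\u4e2d', '\u{1F600}'];
for (const ch of samples) {
  console.log(ch, byteLengths(ch));
}
// a { utf8: 1, utf16: 2, utf32: 4 }
// é { utf8: 2, utf16: 2, utf32: 4 }
// 中 { utf8: 3, utf16: 2, utf32: 4 }
// 😀 { utf8: 4, utf16: 4, utf32: 4 }
```

Note that UTF-8 is the smallest for ASCII text, while for a character like 中 UTF-16 is actually one byte shorter.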

In the end, UTF-8 became the most widely used: it is compatible with ASCII and takes up the least space for ASCII-heavy text.

Node.js Buffer encoding

Every language supports encoding and decoding character sets, and Node.js is no exception.

Node.js uses Buffer to store binary data, and when converting between binary data and a string you need to specify the encoding. Buffer methods such as from, byteLength, and lastIndexOf accept an encoding argument.

The supported encodings are:

utf8, ucs2, utf16le, latin1, ascii, base64, hex
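As a rough illustration of the three methods mentioned above, the following sketch passes an encoding to `Buffer.from`, `Buffer.byteLength`, and `lastIndexOf`. The sample text and variable names are my own; the calls themselves are standard Buffer APIs.

```javascript
// The same text stored under three different encodings.
const text = 'h\u00e9llo h\u00e9llo';                 // "héllo héllo", 11 characters

const asUtf8 = Buffer.from(text, 'utf8');             // é takes 2 bytes
const asUtf16 = Buffer.from(text, 'utf16le');         // every character takes 2 bytes
const asLatin1 = Buffer.from(text, 'latin1');         // every character takes 1 byte

console.log(asUtf8.length, asUtf16.length, asLatin1.length); // 13 22 11

// byteLength reports the size without allocating a Buffer.
console.log(Buffer.byteLength(text, 'utf16le'));      // 22

// lastIndexOf returns a byte offset, so the encoding of the search string matters.
const inUtf8 = asUtf8.lastIndexOf('h\u00e9llo');                          // utf8 is the default
const inUtf16 = asUtf16.lastIndexOf('h\u00e9llo', undefined, 'utf16le');
const mismatch = asUtf16.lastIndexOf('h\u00e9llo');   // searched as utf8 bytes, not found

console.log(inUtf8, inUtf16, mismatch);               // 7 12 -1
```

The last line shows why the argument exists: searching a UTF-16 buffer with the default utf8 encoding compares the wrong bytes and finds nothing.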

Some readers may notice that base64 and hex are not character sets. Why are they listed here?

Right. Besides character sets, byte-to-string encoding schemes also include base64, which represents bytes as printable ASCII characters, and hex, which represents them as hexadecimal digits.

This is why Node.js calls the option encoding rather than charset: the supported encoding and decoding schemes are not limited to character sets.
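A short sketch of base64 and hex used as encodings: the same bytes are rendered as text in two ways and then decoded back. The variable names are only for illustration.

```javascript
// base64 and hex describe how bytes are written as text, not which characters exist.
const bytes = Buffer.from('hello world', 'utf8');

const asBase64 = bytes.toString('base64');
const asHex = bytes.toString('hex');

console.log(asBase64);  // aGVsbG8gd29ybGQ=
console.log(asHex);     // 68656c6c6f20776f726c64

// Decoding either text form gives back the original bytes.
const fromBase64 = Buffer.from(asBase64, 'base64').toString('utf8');
const fromHex = Buffer.from(asHex, 'hex').toString('utf8');

console.log(fromBase64, '|', fromHex);  // hello world | hello world
```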

If encoding is not specified, the default is utf8.

const buf = Buffer.alloc(11, 'aGVsbG8gd29ybGQ=', 'base64');

console.log(buf.toString()); // hello world

Encoding source code

I read through the encoding-related part of the Node.js source code:

This section implements encoding: https://github.com/nodejs/node/blob/master/lib/buffer.js#L587-L726

You can see that each encoding implements the same set of members: encoding, encodingVal, byteLength, write, slice, and indexOf. Because each encoding scheme produces different results for these operations, Node.js returns a different object depending on the encoding passed in. This is polymorphism.

const encodingOps = {
  utf8: {
    encoding: 'utf8',
    encodingVal: encodingsMap.utf8,
    byteLength: byteLengthUtf8,
    write: (buf, string, offset, len) => buf.utf8Write(string, offset, len),
    slice: (buf, start, end) => buf.utf8Slice(start, end),
    indexOf: (buf, val, byteOffset, dir) =>
      indexOfString(buf, val, byteOffset, encodingsMap.utf8, dir)
  },
  ucs2: {
    encoding: 'ucs2',
    encodingVal: encodingsMap.utf16le,
    byteLength: (string) => string.length * 2,
    write: (buf, string, offset, len) => buf.ucs2Write(string, offset, len),
    slice: (buf, start, end) => buf.ucs2Slice(start, end),
    indexOf: (buf, val, byteOffset, dir) =>
      indexOfString(buf, val, byteOffset, encodingsMap.utf16le, dir)
  },
  utf16le: {
    encoding: 'utf16le',
    encodingVal: encodingsMap.utf16le,
    byteLength: (string) => string.length * 2,
    write: (buf, string, offset, len) => buf.ucs2Write(string, offset, len),
    slice: (buf, start, end) => buf.ucs2Slice(start, end),
    indexOf: (buf, val, byteOffset, dir) =>
      indexOfString(buf, val, byteOffset, encodingsMap.utf16le, dir)
  },
  latin1: {
    encoding: 'latin1',
    encodingVal: encodingsMap.latin1,
    byteLength: (string) => string.length,
    write: (buf, string, offset, len) => buf.latin1Write(string, offset, len),
    slice: (buf, start, end) => buf.latin1Slice(start, end),
    indexOf: (buf, val, byteOffset, dir) =>
      indexOfString(buf, val, byteOffset, encodingsMap.latin1, dir)
  },
  ascii: {
    encoding: 'ascii',
    encodingVal: encodingsMap.ascii,
    byteLength: (string) => string.length,
    write: (buf, string, offset, len) => buf.asciiWrite(string, offset, len),
    slice: (buf, start, end) => buf.asciiSlice(start, end),
    indexOf: (buf, val, byteOffset, dir) =>
      indexOfBuffer(buf,
                    fromStringFast(val, encodingOps.ascii),
                    byteOffset,
                    encodingsMap.ascii,
                    dir)
  },
  base64: {
    encoding: 'base64',
    encodingVal: encodingsMap.base64,
    byteLength: (string) => base64ByteLength(string, string.length),
    write: (buf, string, offset, len) => buf.base64Write(string, offset, len),
    slice: (buf, start, end) => buf.base64Slice(start, end),
    indexOf: (buf, val, byteOffset, dir) =>
      indexOfBuffer(buf,
                    fromStringFast(val, encodingOps.base64),
                    byteOffset,
                    encodingsMap.base64,
                    dir)
  },
  hex: {
    encoding: 'hex',
    encodingVal: encodingsMap.hex,
    byteLength: (string) => string.length >>> 1,
    write: (buf, string, offset, len) => buf.hexWrite(string, offset, len),
    slice: (buf, start, end) => buf.hexSlice(start, end),
    indexOf: (buf, val, byteOffset, dir) =>
      indexOfBuffer(buf,
                    fromStringFast(val, encodingOps.hex),
                    byteOffset,
                    encodingsMap.hex,
                    dir)
  }
};

function getEncodingOps(encoding) {
  encoding += '';
  switch (encoding.length) {
    case 4:
      if (encoding === 'utf8') return encodingOps.utf8;
      if (encoding === 'ucs2') return encodingOps.ucs2;
      encoding = StringPrototypeToLowerCase(encoding);
      if (encoding === 'utf8') return encodingOps.utf8;
      if (encoding === 'ucs2') return encodingOps.ucs2;
      break;
    case 5:
      if (encoding === 'utf-8') return encodingOps.utf8;
      if (encoding === 'ascii') return encodingOps.ascii;
      if (encoding === 'ucs-2') return encodingOps.ucs2;
      encoding = StringPrototypeToLowerCase(encoding);
      if (encoding === 'utf-8') return encodingOps.utf8;
      if (encoding === 'ascii') return encodingOps.ascii;
      if (encoding === 'ucs-2') return encodingOps.ucs2;
      break;
    case 7:
      if (encoding === 'utf16le' ||
          StringPrototypeToLowerCase(encoding) === 'utf16le')
        return encodingOps.utf16le;
      break;
    case 8:
      if (encoding === 'utf-16le' ||
          StringPrototypeToLowerCase(encoding) === 'utf-16le')
        return encodingOps.utf16le;
      break;
    case 6:
      if (encoding === 'latin1' || encoding === 'binary')
        return encodingOps.latin1;
      if (encoding === 'base64') return encodingOps.base64;
      encoding = StringPrototypeToLowerCase(encoding);
      if (encoding === 'latin1' || encoding === 'binary')
        return encodingOps.latin1;
      if (encoding === 'base64') return encodingOps.base64;
      break;
    case 3:
      if (encoding === 'hex' || StringPrototypeToLowerCase(encoding) === 'hex')
        return encodingOps.hex;
      break;
  }
}
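The pattern in the source above can be sketched in miniature with public Buffer APIs only. This is not the Node.js implementation: the names `miniOps`, `aliases`, and `getMiniOps` are made up for this example, and it covers just three encodings with two operations each.

```javascript
// Hypothetical miniature of the dispatch-table pattern: one object per
// encoding, each exposing the same method names with different behavior.
const miniOps = new Map([
  ['utf8', {
    byteLength: (string) => Buffer.byteLength(string, 'utf8'),
    slice: (buf, start, end) => buf.toString('utf8', start, end),
  }],
  ['utf16le', {
    byteLength: (string) => string.length * 2,
    slice: (buf, start, end) => buf.toString('utf16le', start, end),
  }],
  ['hex', {
    byteLength: (string) => string.length >>> 1,
    slice: (buf, start, end) => buf.toString('hex', start, end),
  }],
]);

// Alternative spellings that resolve to the same operations object.
const aliases = new Map([
  ['utf-8', 'utf8'],
  ['ucs2', 'utf16le'],
  ['ucs-2', 'utf16le'],
  ['utf-16le', 'utf16le'],
]);

// Look up the operations for an encoding name, ignoring case.
// Returns undefined for names this sketch does not know.
function getMiniOps(encoding) {
  const name = String(encoding).toLowerCase();
  return miniOps.get(aliases.get(name) ?? name);
}

console.log(getMiniOps('UTF-8').byteLength('\u4e2d'));        // 3
console.log(getMiniOps('ucs2').byteLength('\u4e2d'));         // 2
console.log(getMiniOps('HEX').slice(Buffer.from('hi'), 0, 2)); // 6869
console.log(getMiniOps('gbk'));                               // undefined
```

The caller never branches on the encoding name; it asks for the operations object once and then calls the same method on whatever comes back. The real getEncodingOps switches on the string length first, presumably so that common names can be matched with a few cheap comparisons before falling back to lowercasing.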

Summary

The smallest unit for storing data in a computer is the bit, but the smallest unit for storing information is the byte. Characters are mapped to numeric codes through an encoding. Various character sets exist, including ASCII, ISO 8859, and GBK, and Unicode was proposed to include all characters. Unicode has several encoding schemes, UTF-8, UTF-16, and UTF-32, which use different numbers of bytes to store a character. Among them, UTF-8 is variable-length and compact for ASCII-heavy text, so it is widely used.

Node.js stores binary data in Buffer, and when converting it to a string you need to specify an encoding scheme. These schemes include not only character sets (charset) but also hex and base64:

utf8, ucs2, utf16le, latin1, ascii, base64, hex

We looked at the encoding-related Node.js source code and found that each encoding scheme implements the same series of APIs. This is polymorphism.

Encoding is a concept you run into frequently when learning Node.js, and encoding in Node.js covers more than charset. I hope this article helps everyone understand encoding and character sets.
