Use JavaScript to calculate the number of bytes occupied by storing a string in UTF-8


Released: 2016-05-16 17:26:22


I have been struggling with JavaScript recently.

I ran into the following problem. The database character set is UTF-8, and the page needs to use JavaScript to check how many bytes the input text will occupy when stored as UTF-8. JavaScript's String object has a length property, but it counts characters (strictly speaking, UTF-16 code units), not bytes. (The same problem keeps coming back in different forms: when I was working with Delphi, I had to write a routine to count the characters in a string, because the length of a String in Delphi is the number of bytes...) A lazy approach is to set the maximum length in the validation code to 1/3 of the length of the corresponding database field, but that is not precise.
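To make the gap concrete, here is a minimal sketch of the lazy 1/3 rule. The helper name conservativeMaxChars and the 30-byte field size are assumptions chosen for illustration; they do not come from any real schema.

```javascript
// Hypothetical helper: worst-case character limit for a field that holds
// `fieldBytes` bytes, assuming every character needs 3 bytes in UTF-8.
function conservativeMaxChars(fieldBytes) {
  return Math.floor(fieldBytes / 3);
}

var text = "\u4e2d\u6587abc"; // two Chinese characters followed by "abc"

// length counts UTF-16 code units, not bytes
console.log(text.length); // 5

// A 30-byte field is limited to 10 characters under the 1/3 rule
console.log(conservativeMaxChars(30)); // 10
```

The rule is safe but wasteful: the same 30-byte field could hold 30 ASCII characters, yet the validation would reject anything longer than 10.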

So I looked for a way to determine, in JavaScript, the number of bytes a String occupies when stored as UTF-8. There are many introductions to Unicode online; the key piece of information is how many bytes each range of character code values takes:

UCS-2 code (hexadecimal)    UTF-8 byte stream (binary)
0000 - 007F                 0xxxxxxx (1 byte)
0080 - 07FF                 110xxxxx 10xxxxxx (2 bytes)
0800 - FFFF                 1110xxxx 10xxxxxx 10xxxxxx (3 bytes)
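The table maps directly onto a range check. The following is only a sketch of that mapping for a single code value; the function name utf8BytesForCodeUnit is hypothetical, and the sample characters are written as escapes so the code values are unambiguous.

```javascript
// Hypothetical helper: number of UTF-8 bytes needed for one UCS-2 code
// value (0x0000 - 0xFFFF), following the three ranges in the table above.
function utf8BytesForCodeUnit(code) {
  if (code <= 0x007f) {
    return 1; // 0xxxxxxx
  }
  if (code <= 0x07ff) {
    return 2; // 110xxxxx 10xxxxxx
  }
  return 3;   // 1110xxxx 10xxxxxx 10xxxxxx
}

console.log(utf8BytesForCodeUnit("A".charCodeAt(0)));      // 1 (U+0041)
console.log(utf8BytesForCodeUnit("\u00e9".charCodeAt(0))); // 2 (U+00E9)
console.log(utf8BytesForCodeUnit("\u4e2d".charCodeAt(0))); // 3 (U+4E2D)
```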

So the code is as follows:
 
function mbStringLength(s) {
    var totalLength = 0;
    var i;
    var charCode;
    for (i = 0; i < s.length; i++) {
        // charCodeAt returns the UTF-16 code unit (0x0000 - 0xFFFF) at index i
        charCode = s.charCodeAt(i);
        if (charCode <= 0x007f) {
            // 1 byte: 0000 - 007F
            totalLength += 1;
        } else if ((0x0080 <= charCode) && (charCode <= 0x07ff)) {
            // 2 bytes: 0080 - 07FF
            totalLength += 2;
        } else if ((0x0800 <= charCode) && (charCode <= 0xffff)) {
            // 3 bytes: 0800 - FFFF
            totalLength += 3;
        }
    }
    //alert(totalLength);
    return totalLength;
}

In practice, characters between 0x0080 and 0x07ff rarely appear in actual user input.
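One case the table does not cover is characters outside the UCS-2 range (above U+FFFF, for example emoji). JavaScript stores them as two code units (a surrogate pair), so a loop over charCodeAt counts 3 + 3 = 6 bytes for them, while UTF-8 encodes them in 4 bytes. The following is a sketch of a variant that iterates by code point instead. It is not the function above: the name utf8ByteLength is hypothetical, and it assumes an ES2015 environment plus a global TextEncoder (modern browsers and Node.js) for the cross-check.

```javascript
// Hypothetical variant: count UTF-8 bytes by code point rather than by
// UTF-16 code unit, so surrogate pairs are counted as 4 bytes.
function utf8ByteLength(s) {
  let total = 0;
  // for...of walks the string one code point at a time
  for (const ch of s) {
    const cp = ch.codePointAt(0);
    if (cp <= 0x7f) {
      total += 1;
    } else if (cp <= 0x7ff) {
      total += 2;
    } else if (cp <= 0xffff) {
      total += 3;
    } else {
      total += 4; // U+10000 and above
    }
  }
  return total;
}

// "a", U+00E9, U+4E2D, U+1F600: 1 + 2 + 3 + 4 bytes
const sample = "a\u00e9\u4e2d\u{1F600}";
console.log(utf8ByteLength(sample)); // 10

// Cross-check against the built-in encoder, where available
console.log(new TextEncoder().encode(sample).length); // 10
```

If the built-in encoder is available in every target browser, calling it directly may be the simpler choice; the loop is mainly useful where it is not, or where allocating the encoded bytes is undesirable.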