Source code analysis of the JS encoding function encodeURIComponent

一个新手

Released: 2017-09-27 10:21:01

```javascript
// Excerpt of V8 built-in JS: %_ intrinsics, InternalArray, TO_STRING, MakeURIError and isAlphaNumeric are defined outside it.
var hexCharCodeArray = 0; // Module-level; filled lazily with the char codes of 0-9 and A-F.

// Write "%XX" for one byte into result, starting at index.
function URIAddEncodedOctetToBuffer(octet, result, index) {
  result[index++] = 37; // Char code of '%'.
  result[index++] = hexCharCodeArray[octet >> 4];
  result[index++] = hexCharCodeArray[octet & 0x0F];
  return index;
}

// Percent-encode the 1-4 UTF-8 bytes stored in octets.
function URIEncodeOctets(octets, result, index) {
  if (hexCharCodeArray === 0) { // Char codes of 0-9 and A-F.
    hexCharCodeArray = [48, 49, 50, 51, 52, 53, 54, 55, 56, 57,
                        65, 66, 67, 68, 69, 70];
  }
  index = URIAddEncodedOctetToBuffer(octets[0], result, index);
  // Bytes after the first are continuation bytes (>= 0x80), so only unused slots are falsy.
  if (octets[1]) index = URIAddEncodedOctetToBuffer(octets[1], result, index);
  if (octets[2]) index = URIAddEncodedOctetToBuffer(octets[2], result, index);
  if (octets[3]) index = URIAddEncodedOctetToBuffer(octets[3], result, index);
  return index;
}

// Encode a UTF-16 code unit that is not part of a surrogate pair.
function URIEncodeSingle(cc, result, index) {
  var x = (cc >> 12) & 0xF;
  var y = (cc >> 6) & 63;
  var z = cc & 63;
  var octets = new InternalArray(3); // Such a code unit needs at most three UTF-8 bytes.
  if (cc <= 0x007F) { // ASCII: one byte.
    octets[0] = cc;
  } else if (cc <= 0x07FF) {
    octets[0] = y + 192;
    octets[1] = z + 128;
  } else {
    octets[0] = x + 224;
    octets[1] = y + 128;
    octets[2] = z + 128;
  }
  return URIEncodeOctets(octets, result, index);
}

// Encode a high/low surrogate pair as four UTF-8 bytes.
function URIEncodePair(cc1, cc2, result, index) {
  var u = ((cc1 >> 6) & 0xF) + 1;
  var w = (cc1 >> 2) & 0xF;
  var x = cc1 & 3;
  var y = (cc2 >> 6) & 0xF;
  var z = cc2 & 63;
  var octets = new InternalArray(4);
  octets[0] = (u >> 2) + 240;
  octets[1] = (((u & 3) << 4) | w) + 128;
  octets[2] = ((x << 4) | y) + 128;
  octets[3] = z + 128;
  return URIEncodeOctets(octets, result, index);
}

// ECMA-262, section 15.1.3
function Encode(uri, unescape) {
  uri = TO_STRING(uri);
  var uriLength = uri.length;
  var array = new InternalArray(uriLength);
  var index = 0;
  for (var k = 0; k < uriLength; k++) {
    var cc1 = %_StringCharCodeAt(uri, k);
    if (unescape(cc1)) {
      array[index++] = cc1;
    } else {
      if (cc1 >= 0xDC00 && cc1 <= 0xDFFF) throw MakeURIError(); // Lone low surrogate.
      if (cc1 < 0xD800 || cc1 > 0xDBFF) { // Not a high surrogate.
        index = URIEncodeSingle(cc1, array, index);
      } else { // High surrogate.
        k++;
        if (k == uriLength) throw MakeURIError();
        var cc2 = %_StringCharCodeAt(uri, k);
        if (cc2 < 0xDC00 || cc2 > 0xDFFF) throw MakeURIError(); // Not a valid low surrogate.
        index = URIEncodePair(cc1, cc2, array, index);
      }
    }
  }
  var result = %NewString(array.length, NEW_ONE_BYTE_STRING);
  for (var i = 0; i < array.length; i++) {
    %_OneByteSeqStringSetChar(i, array[i], result);
  }
  return result;
}

// ECMA-262 - 15.1.3.4
function URIEncodeComponent(component) {
  var unescapePredicate = function(cc) {
    if (isAlphaNumeric(cc)) return true;
    if (cc == 33) return true;             // !
    if (39 <= cc && cc <= 42) return true; // '()*
    if (45 <= cc && cc <= 46) return true; // -.
    if (cc == 95) return true;             // _
    if (cc == 126) return true;            // ~
    return false;
  };
  return Encode(component, unescapePredicate);
}
```
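The character set accepted by unescapePredicate can be checked against a real engine. The sketch below re-expresses the predicate in plain JavaScript and compares it with the built-in encodeURIComponent over all 128 ASCII code units. isAlphaNumeric is not shown in the excerpt; here it is assumed to accept only ASCII letters and digits. The function names isUnescapedByPredicate, asciiLeftUnencoded and predicateMatchesEngine are illustrative, not part of V8.

```javascript
// Plain-JS restatement of unescapePredicate (isAlphaNumeric assumed to mean ASCII letters/digits).
function isUnescapedByPredicate(cc) {
  const isAlphaNumeric =
    (cc >= 48 && cc <= 57) || // 0-9
    (cc >= 65 && cc <= 90) || // A-Z
    (cc >= 97 && cc <= 122);  // a-z
  return (
    isAlphaNumeric ||
    cc === 33 ||              // !
    (cc >= 39 && cc <= 42) || // ' ( ) *
    (cc >= 45 && cc <= 46) || // - .
    cc === 95 ||              // _
    cc === 126                // ~
  );
}

// Collect the ASCII characters that the engine's encodeURIComponent returns unchanged.
function asciiLeftUnencoded() {
  const kept = [];
  for (let cc = 0; cc < 128; cc++) {
    const ch = String.fromCharCode(cc);
    if (encodeURIComponent(ch) === ch) kept.push(ch);
  }
  return kept.join("");
}

// True if the predicate and the engine agree on every ASCII code unit.
function predicateMatchesEngine() {
  for (let cc = 0; cc < 128; cc++) {
    const ch = String.fromCharCode(cc);
    const engineKeeps = encodeURIComponent(ch) === ch;
    if (engineKeeps !== isUnescapedByPredicate(cc)) return false;
  }
  return true;
}

// prints !'()*-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~
console.log(asciiLeftUnencoded());
```

Every other ASCII character, including space, "+", "/", "?" and "&", comes back percent-encoded.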

URIEncodeComponent simply calls Encode and passes it unescapePredicate, which identifies the characters that are copied unchanged: ASCII letters and digits plus ! ' ( ) * - . _ ~ (see the code for the exact ranges).

Encode walks the string one UTF-16 code unit at a time. A character that passes the predicate is copied as-is. Otherwise there are three cases. If the character is a low surrogate (cc1 >= 0xDC00 && cc1 <= 0xDFFF), Encode throws a URIError, because a low surrogate is only valid immediately after a high surrogate. If it is not a high surrogate either (cc1 < 0xD800 || cc1 > 0xDBFF), it is an ordinary character and is encoded by URIEncodeSingle. If it is a high surrogate, Encode advances to the next code unit: it throws if the string ends there, and it throws if that code unit is not a low surrogate, because a high surrogate must be followed by a low surrogate. When the pair is valid, URIEncodePair encodes both code units together. In short, Encode decides for each character whether it needs no encoding, is part of a surrogate pair, or is an ordinary character.
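The control flow just described can be reproduced in ordinary JavaScript. This is a minimal sketch under stated assumptions, not V8's code: the intrinsics are replaced by charCodeAt, a plain array and URIError, and the UTF-8 bytes come from the standard TextEncoder instead of URIEncodeSingle/URIEncodePair. encodeSketch, isUnreserved and percentEncodeCodePoint are hypothetical names.

```javascript
// Hypothetical plain-JS stand-in for V8's Encode(uri, unescape).
const utf8 = new TextEncoder();

// Same character set as unescapePredicate: A-Z a-z 0-9 ! ' ( ) * - . _ ~
function isUnreserved(cc) {
  return (
    (cc >= 48 && cc <= 57) || (cc >= 65 && cc <= 90) || (cc >= 97 && cc <= 122) ||
    cc === 33 || (cc >= 39 && cc <= 42) || cc === 45 || cc === 46 ||
    cc === 95 || cc === 126
  );
}

// Append "%XX" (uppercase hex) for each UTF-8 byte of the code point.
function percentEncodeCodePoint(cp, out) {
  for (const byte of utf8.encode(String.fromCodePoint(cp))) {
    out.push("%" + byte.toString(16).toUpperCase().padStart(2, "0"));
  }
}

function encodeSketch(input, unescape = isUnreserved) {
  const uri = String(input);
  const out = [];
  for (let k = 0; k < uri.length; k++) {
    const cc1 = uri.charCodeAt(k);
    if (unescape(cc1)) {
      out.push(String.fromCharCode(cc1));        // copied unchanged
    } else if (cc1 >= 0xdc00 && cc1 <= 0xdfff) {
      throw new URIError("URI malformed");        // low surrogate without a preceding high surrogate
    } else if (cc1 < 0xd800 || cc1 > 0xdbff) {
      percentEncodeCodePoint(cc1, out);           // ordinary BMP character
    } else {
      k++;                                        // high surrogate: consume the next code unit
      if (k === uri.length) throw new URIError("URI malformed");
      const cc2 = uri.charCodeAt(k);
      if (cc2 < 0xdc00 || cc2 > 0xdfff) throw new URIError("URI malformed");
      const cp = 0x10000 + ((cc1 - 0xd800) << 10) + (cc2 - 0xdc00);
      percentEncodeCodePoint(cp, out);            // surrogate pair: four UTF-8 bytes
    }
  }
  return out.join("");
}
```

For example, encodeSketch("a b") returns "a%20b", and encodeSketch("\uD800") throws a URIError, as the built-in encodeURIComponent does.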
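URIEncodeSingle splits the code unit into bit groups (x = bits 12-15, y = bits 6-11, z = bits 0-5) and builds its UTF-8 form: one byte for cc <= 0x7F, two bytes (110yyyyy 10zzzzzz) up to 0x7FF, and three bytes (1110xxxx 10yyyyyy 10zzzzzz) otherwise. It then calls URIEncodeOctets, which writes each byte as %XX. A code unit outside the surrogate range is at most 0xFFFF, so three UTF-8 bytes always suffice; that is why URIEncodeSingle allocates an array of length 3.

URIEncodePair does the same for a surrogate pair, which represents a code point above 0xFFFF and therefore needs four UTF-8 bytes. Instead of first computing the code point, it extracts its bits directly from the two code units: u = ((cc1 >> 6) & 0xF) + 1 is bits 16-20 (the +1 restores the 0x10000 offset that UTF-16 subtracts), w and x are bits 12-15 and 10-11 from the high surrogate, and y and z are bits 6-9 and 0-5 from the low surrogate. These are packed into 11110uuu 10uuwwww 10xxyyyy 10zzzzzz, and the four bytes are passed to URIEncodeOctets.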
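The bit arithmetic in URIEncodeSingle and URIEncodePair can be checked in isolation. The sketch below is an assumed plain-JavaScript port that returns the byte values instead of writing into V8's internal buffer; utf8BytesSingle, utf8BytesPair and toPercent are hypothetical names. Its results can be compared with the standard TextEncoder.

```javascript
// Port of URIEncodeSingle: one code unit outside 0xD800-0xDFFF -> 1, 2 or 3 UTF-8 bytes.
function utf8BytesSingle(cc) {
  const x = (cc >> 12) & 0xf; // bits 12-15
  const y = (cc >> 6) & 63;   // bits 6-11
  const z = cc & 63;          // bits 0-5
  if (cc <= 0x7f) return [cc];                // 0xxxxxxx
  if (cc <= 0x7ff) return [y + 192, z + 128]; // 110yyyyy 10zzzzzz
  return [x + 224, y + 128, z + 128];         // 1110xxxx 10yyyyyy 10zzzzzz
}

// Port of URIEncodePair: high surrogate cc1 + low surrogate cc2 -> 4 UTF-8 bytes.
function utf8BytesPair(cc1, cc2) {
  const u = ((cc1 >> 6) & 0xf) + 1; // code point bits 16-20 (+1 undoes the 0x10000 offset)
  const w = (cc1 >> 2) & 0xf;       // code point bits 12-15
  const x = cc1 & 3;                // code point bits 10-11
  const y = (cc2 >> 6) & 0xf;       // code point bits 6-9
  const z = cc2 & 63;               // code point bits 0-5
  return [
    (u >> 2) + 240,             // 11110uuu
    (((u & 3) << 4) | w) + 128, // 10uuwwww
    ((x << 4) | y) + 128,       // 10xxyyyy
    z + 128,                    // 10zzzzzz
  ];
}

// Format bytes the way URIAddEncodedOctetToBuffer does: "%" + two uppercase hex digits.
function toPercent(bytes) {
  return bytes.map((b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")).join("");
}

console.log(toPercent(utf8BytesSingle(0x4e2d)));      // prints %E4%B8%AD (U+4E2D)
console.log(toPercent(utf8BytesPair(0xd83d, 0xde00))); // prints %F0%9F%98%80 (U+1F600)
```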