How do you decode UTF-8 base64 strings in JavaScript using `atob` while avoiding encoding errors?-JS Tutorial-php.cn

Mary-Kate Olsen

Release: 2024-10-31 21:08:29

Using atob to decode base64 from common text sources

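When you use atob to decode API responses from services that base64-encode UTF-8 text, you may get garbled strings. The reverse function, btoa, throws outright on non-Latin-1 input. Both functions work on "binary strings", in which each character stands for exactly one byte. atob returns one character per decoded byte instead of decoding UTF-8, so a multi-byte character such as "✓" comes back as several unrelated characters. btoa rejects any character above U+00FF: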

<code class="js">const notOK = "✓";
console.log(btoa(notOK)); // throws an InvalidCharacterError DOMException: "✓" (U+2713) is above U+00FF</code>
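Here is a short sketch of the decoding side of the same problem. The base64 string "4pyT" is assumed to come from a service that encoded the UTF-8 bytes of "✓". atob hands back those three bytes as three separate characters instead of one.

```javascript
// Base64 of the UTF-8 bytes E2 9C 93, i.e. the text "✓" (assumed API output).
const fromApi = "4pyT";

// atob returns a "binary string": one character per decoded byte.
const raw = atob(fromApi);

console.log(raw.length); // 3, not 1
console.log([...raw].map((c) => c.charCodeAt(0).toString(16))); // ["e2", "9c", "93"]
console.log(raw === "✓"); // false
```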

The Unicode Problem

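btoa and atob are not part of ECMAScript. They are defined by the HTML standard, and this behavior is by design, not a bug awaiting a fix.

JavaScript strings are sequences of 16-bit UTF-16 code units, but btoa and atob treat each character as a single byte, so only code points 0 to 255 survive the round trip. Many Unicode characters need more than one byte in any encoding. That is "the Unicode Problem": base64 encodes bytes, not characters. Text therefore has to be converted to bytes in an agreed encoding (usually UTF-8) before encoding, and decoded from that same encoding afterwards.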

Source: MDN (2021)

<code class="js">const ok = "a";
console.log(ok.codePointAt(0).toString(16)); // 61 (0x61): fits in one byte

const notOK = "✓";
console.log(notOK.codePointAt(0).toString(16)); // 2713 (0x2713): too large for one byte (2 bytes in UTF-16, 3 in UTF-8)</code>
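A TextEncoder can report how many bytes a character actually takes in UTF-8, the encoding most APIs base64-encode. The sketch below uses the standard TextEncoder available in browsers and Node.js. The helper name utf8ByteLength is illustrative.

```javascript
// Hypothetical helper: number of bytes a string occupies when encoded as UTF-8.
function utf8ByteLength(str) {
  return new TextEncoder().encode(str).length;
}

console.log(utf8ByteLength("a")); // 1
console.log(utf8ByteLength("à")); // 2
console.log(utf8ByteLength("✓")); // 3
console.log(utf8ByteLength("😀")); // 4
```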

Solution with binary interoperability

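If you control both the encoding and the decoding side and both run JavaScript, this is the simplest option. However, it does not produce standard base64 of UTF-8 text. To decode base64 that another service generated from UTF-8, use the ASCII base64 solution further down.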

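This approach skips UTF-8 altogether. It takes the string's UTF-16 code units, reinterprets them as raw bytes, and base64-encodes those bytes; decoding reverses the steps. Because the bytes come from a Uint16Array, their order follows the platform's endianness, which is little-endian on virtually all current hardware.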

Encoding UTF-16 ⇢ binary

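<code class="js">function toBinary(string) {
  // Copy the UTF-16 code units of the string into a 16-bit typed array
  const codeUnits = new Uint16Array(string.length);
  for (let i = 0; i < codeUnits.length; i++) {
    codeUnits[i] = string.charCodeAt(i);
  }
  // View the same memory as bytes (platform byte order) and base64-encode them.
  // Spreading into fromCharCode is fine for short strings; very long inputs can exceed the engine's argument limit.
  return btoa(String.fromCharCode(...new Uint8Array(codeUnits.buffer)));
}
const encoded = toBinary("✓ à la mode"); // "EycgAOAAIABsAGEAIABtAG8AZABlAA=="</code>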
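toBinary inherits the machine's byte order. A variant that fixes the order explicitly can be sketched with a DataView. This is an illustrative alternative, not part of the original answer. The name toBinaryLE is hypothetical, and little-endian is chosen only so the output matches toBinary on common hardware.

```javascript
// Hypothetical variant of toBinary that always writes UTF-16 little-endian,
// regardless of the platform's native byte order.
function toBinaryLE(string) {
  const view = new DataView(new ArrayBuffer(string.length * 2));
  for (let i = 0; i < string.length; i++) {
    // third argument true = little-endian
    view.setUint16(i * 2, string.charCodeAt(i), true);
  }
  // Build the binary string byte by byte (avoids spreading a large array)
  let binary = "";
  for (let i = 0; i < view.byteLength; i++) {
    binary += String.fromCharCode(view.getUint8(i));
  }
  return btoa(binary);
}

console.log(toBinaryLE("✓ à la mode")); // "EycgAOAAIABsAGEAIABtAG8AZABlAA=="
```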

Decoding binary ⇢ UTF-16

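<code class="js">function fromBinary(encoded) {
  // atob yields one character per byte; collect those bytes
  const binary = atob(encoded);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  // Reinterpret pairs of bytes as UTF-16 code units (throws RangeError if the byte count is odd)
  return String.fromCharCode(...new Uint16Array(bytes.buffer));
}
const decoded = fromBinary(encoded); // "✓ à la mode"</code>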
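A byte-order-independent decoder can likewise be sketched with a DataView. The name fromBinaryLE is hypothetical. It assumes the data was written as UTF-16 little-endian, which is what toBinary produces on little-endian hardware. Passing it UTF-8 base64 such as "4pyT" (three bytes) throws, which shows why this format does not interoperate with UTF-8 sources.

```javascript
// Hypothetical decoder for base64 of UTF-16 little-endian bytes,
// independent of the platform's native byte order.
function fromBinaryLE(encoded) {
  const binary = atob(encoded);
  if (binary.length % 2 !== 0) {
    throw new RangeError("UTF-16 data must contain an even number of bytes");
  }
  const view = new DataView(new ArrayBuffer(binary.length));
  for (let i = 0; i < binary.length; i++) {
    view.setUint8(i, binary.charCodeAt(i));
  }
  let result = "";
  for (let i = 0; i < view.byteLength; i += 2) {
    // second argument true = little-endian
    result += String.fromCharCode(view.getUint16(i, true));
  }
  return result;
}

console.log(fromBinaryLE("EycgAOAAIABsAGEAIABtAG8AZABlAA==")); // "✓ à la mode"
```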

Solution with ASCII base64 interoperability

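When the base64 has to interoperate with other systems, for example an API that base64-encodes UTF-8 text, encode and decode the UTF-8 bytes instead. The functions below do this with encodeURIComponent and decodeURIComponent, which convert between a string and percent-escaped UTF-8 bytes. This solves "the Unicode Problem" while producing ordinary ASCII base64 strings.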
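On runtimes that provide TextEncoder and TextDecoder (current browsers and Node.js), the same UTF-8 conversion can be sketched without the percent-encoding detour. This swaps in a different technique from the one used in the functions that follow. The names bytesToBase64, base64ToBytes, utf8ToBase64 and base64ToUtf8 are illustrative.

```javascript
// Base64-encode raw bytes: build a binary string (one char per byte) for btoa.
function bytesToBase64(bytes) {
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b);
  }
  return btoa(binary);
}

// Decode base64 to raw bytes: atob yields one char per byte.
function base64ToBytes(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Text -> UTF-8 bytes -> base64
function utf8ToBase64(str) {
  return bytesToBase64(new TextEncoder().encode(str));
}

// base64 -> UTF-8 bytes -> text
function base64ToUtf8(base64) {
  return new TextDecoder().decode(base64ToBytes(base64));
}

console.log(utf8ToBase64("✓ à la mode")); // "4pyTIMOgIGxhIG1vZGU="
console.log(base64ToUtf8("4pyTIMOgIGxhIG1vZGU=")); // "✓ à la mode"
```

By default, TextDecoder replaces invalid UTF-8 sequences with U+FFFD instead of throwing.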

Encoding UTF-8 ⇢ ASCII base64

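<code class="js">function b64EncodeUnicode(str) {
    // encodeURIComponent percent-encodes the UTF-8 bytes of str; each %XX escape is then
    // turned into one character with that byte value ('0x' + p1 is coerced to a number),
    // so btoa receives one character per byte
    return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g,
        function(match, p1) {
            return String.fromCharCode('0x' + p1);
    }));
}
b64EncodeUnicode('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="</code>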

Decoding ASCII base64 ⇢ UTF-8

<code class="js">function b64DecodeUnicode(str) {
    // Convert byte array to percent-encoding, then decode
    return decodeURIComponent(atob(str).split('').map(function(c) {
        return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
    }).join(''));
}
b64DecodeUnicode('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"</code>
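b64DecodeUnicode throws a URIError when the decoded bytes are not valid UTF-8, and atob throws on characters outside the base64 alphabet. For untrusted API responses, a wrapper that reports failure instead can be sketched with TextDecoder's fatal option. The name tryDecodeBase64Utf8 and the choice to return null on failure are assumptions for illustration.

```javascript
// Hypothetical helper: decode base64-encoded UTF-8, returning null instead of
// throwing when the input is not valid base64 or not valid UTF-8.
function tryDecodeBase64Utf8(str) {
  let binary;
  try {
    binary = atob(str); // throws on characters outside the base64 alphabet
  } catch {
    return null;
  }
  const bytes = Uint8Array.from(binary, (c) => c.charCodeAt(0));
  try {
    // fatal: true makes decode() throw a TypeError on malformed UTF-8
    // instead of substituting U+FFFD
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    return null;
  }
}

console.log(tryDecodeBase64Utf8("4pyTIMOgIGxhIG1vZGU=")); // "✓ à la mode"
console.log(tryDecodeBase64Utf8("/w==")); // null (0xFF is never valid UTF-8)
console.log(tryDecodeBase64Utf8("!!!")); // null (not base64)
```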

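TypeScript Support

In TypeScript, String.fromCharCode is typed to accept only numbers, so the '0x' + p1 string used above does not type-check. parseInt(p1, 16) produces the number explicitly. The parameters also need type annotations under strict settings: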

<code class="ts">function b64EncodeUnicode(str: string): string {
    return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, function(match, p1) {
        return String.fromCharCode(parseInt(p1, 16))
    }))
}
function b64DecodeUnicode(str: string): string {
    return decodeURIComponent(Array.prototype.map.call(atob(str), function(c) {
        return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2)
    }).join(''))
}</code>

Additional Notes

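Some sources, such as the GitHub API, return base64 with embedded line breaks. The current HTML specification has atob ignore ASCII whitespace, but Safari may still require removing it, so strip whitespace first (for example with str.replace(/\s/g, '')).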
Libraries like js-base64 and base64-js also provide reliable solutions.
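For the whitespace note, here is a minimal sketch: remove all whitespace before calling atob, then decode the bytes as UTF-8. The sample input imitates line-wrapped base64 of the kind the note describes. The helper name decodeWrappedBase64 is hypothetical.

```javascript
// Hypothetical helper: decode base64-encoded UTF-8 that may contain line breaks
// or other whitespace, which some atob implementations reject.
function decodeWrappedBase64(str) {
  const cleaned = str.replace(/\s/g, "");
  const binary = atob(cleaned);
  const bytes = Uint8Array.from(binary, (c) => c.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

// "4pyTIMOgIGxhIG1vZGU=" split across lines, with a trailing newline
const wrapped = "4pyTIMOg\nIGxhIG1v\nZGU=\n";
console.log(decodeWrappedBase64(wrapped)); // "✓ à la mode"
```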