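Using JavaScript's atob() to decode Base64 doesn't properly decode UTF-8 strings
The window.atob() function decodes Base64 into a "binary string" in which each character holds a single byte (a code point from 0 to 255). If the encoded data was UTF-8 text, every character that took more than one byte comes back as several separate Latin-1 characters (mojibake such as "Ã©" instead of "é") rather than as the original character.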
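A small illustration of the symptom, assuming a runtime where atob() is available as a global (browsers, or Node.js 16 and later). The Base64 string "w6k=" encodes the two UTF-8 bytes of "é" (0xC3 0xA9), and atob() returns them as two separate characters:

```javascript
// "w6k=" is the Base64 encoding of "é" as UTF-8 (bytes 0xC3 0xA9).
const decoded = atob("w6k=");

console.log(decoded);        // Ã©  (one character per byte, not "é")
console.log(decoded.length); // 2

// Inspect the raw byte values hidden in the binary string.
const byteHex = Array.from(decoded, (c) => c.charCodeAt(0).toString(16));
console.log(byteHex);        // [ 'c3', 'a9' ]
```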
Unicode Problem
JavaScript strings are sequences of 16-bit (UTF-16) code units, but btoa() expects a "binary string" in which every code unit lies in the range 0 to 255 and stands for one byte. Any character outside that range, such as "€" or characters from most non-Latin scripts, causes btoa() to throw an InvalidCharacterError. This issue is known as "The Unicode Problem."
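The following sketch shows where btoa() draws the line. The helper name tryBtoa is hypothetical and exists only to catch the error for display; it assumes btoa() is a global, as in browsers and Node.js 16+. Note that "é" (U+00E9) still fits in one byte, so btoa() accepts it, but it encodes the Latin-1 byte 0xE9, not UTF-8:

```javascript
// Hypothetical helper: returns the Base64 output, or the error name and message.
function tryBtoa(str) {
  try {
    return btoa(str);
  } catch (err) {
    // Browsers and Node.js throw a DOMException named "InvalidCharacterError".
    return `${err.name}: ${err.message}`;
  }
}

console.log(tryBtoa("café")); // Y2Fm6Q==  ("é" encoded as the single Latin-1 byte 0xE9)
console.log(tryBtoa("€"));    // InvalidCharacterError: ...  ("€" is U+20AC, above 255)
```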
Solution with Binary Interoperability
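The solution recommended by MDN encodes to and decodes from a binary string representation, which sidesteps the Unicode Problem because every character handed to btoa() is a single byte. Encoding copies the string's UTF-16 code units into a Uint16Array and reinterprets the same buffer as bytes through a Uint8Array; those bytes form the binary string passed to btoa(). Decoding reverses the steps: atob() returns the binary string, its bytes are loaded into a Uint8Array, and the buffer is read back as a Uint16Array of code units to rebuild the original string. Because the Base64 output represents UTF-16 bytes rather than UTF-8, it round-trips reliably only with this matching decoder.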
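A minimal sketch of the binary-interoperability approach described above. The function names encodeUtf16Base64 and decodeUtf16Base64 are hypothetical; the code builds strings with a loop rather than spreading a large array into String.fromCharCode(), which can exceed the call-stack limit on long inputs. The byte order follows the platform's endianness (little-endian on virtually all current hardware):

```javascript
// Encode: store each UTF-16 code unit in a Uint16Array, then view the same
// buffer as bytes so every character given to btoa() is in the range 0..255.
function encodeUtf16Base64(str) {
  const codeUnits = new Uint16Array(str.length);
  for (let i = 0; i < str.length; i++) {
    codeUnits[i] = str.charCodeAt(i);
  }
  const bytes = new Uint8Array(codeUnits.buffer);
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b);
  }
  return btoa(binary);
}

// Decode: atob() yields one character per byte; copy them into a Uint8Array
// and reread the buffer as 16-bit code units to rebuild the original string.
// The byte count must be even, or the Uint16Array constructor throws a RangeError.
function decodeUtf16Base64(encoded) {
  const binary = atob(encoded);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  const codeUnits = new Uint16Array(bytes.buffer);
  let str = "";
  for (const unit of codeUnits) {
    str += String.fromCharCode(unit);
  }
  return str;
}
```

A round trip such as decodeUtf16Base64(encodeUtf16Base64("✓ à la mode")) returns the original string, but the encoded form is not the UTF-8 Base64 that other platforms would produce for the same text.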
Solution with ASCII Base64 Interoperability
Another solution is to convert the UTF-16 DOMString into its UTF-8 bytes first, one 8-bit value per character (for example, from the percent-escapes produced by encodeURIComponent(), or from the Uint8Array returned by TextEncoder), and then encode that byte string with btoa(). Because the Base64 payload is plain UTF-8, the output can be decoded on any platform that understands UTF-8. Decoding reverses the process: atob() returns one character per UTF-8 byte, those bytes are turned back into percent-escapes, and decodeURIComponent() reassembles the original characters.
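A minimal sketch of the ASCII Base64 interoperability approach, using the encodeURIComponent()/decodeURIComponent() route mentioned above. The names b64EncodeUnicode and b64DecodeUnicode are illustrative. encodeURIComponent() emits uppercase %XX escapes for UTF-8 bytes and leaves a few ASCII characters unescaped, all of which are safe for btoa():

```javascript
// Encode: turn each UTF-8 percent-escape (e.g. "%C3%A9" for "é") into a
// character whose code is that byte's value, then Base64-encode the result.
function b64EncodeUnicode(str) {
  const binary = encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
  return btoa(binary);
}

// Decode: atob() returns one character per UTF-8 byte; rebuild the %XX
// escapes and let decodeURIComponent() reassemble the multi-byte characters.
function b64DecodeUnicode(b64) {
  const escaped = Array.from(atob(b64), (c) =>
    "%" + c.charCodeAt(0).toString(16).padStart(2, "0")
  ).join("");
  return decodeURIComponent(escaped);
}

console.log(b64EncodeUnicode("✓ à la mode"));          // 4pyTIMOgIGxhIG1vZGU=
console.log(b64DecodeUnicode("4pyTIMOgIGxhIG1vZGU=")); // ✓ à la mode
```

The encoded string is ordinary UTF-8 Base64, so for example Python's base64.b64decode() followed by .decode("utf-8") recovers the same text.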
Deprecated Solution
An older solution combined the escape() and unescape() functions, which are now deprecated. The method still works in modern browsers, but it is not recommended for new code.
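For recognizing this pattern in legacy code, the deprecated approach usually looks like the sketch below (function names are hypothetical). unescape(encodeURIComponent(s)) produces the same one-byte-per-character UTF-8 string as the previous approach, and escape() turns it back into %XX escapes for decodeURIComponent():

```javascript
// Legacy pattern only: escape() and unescape() are deprecated.
function legacyEncode(str) {
  // encodeURIComponent -> UTF-8 %XX escapes; unescape -> one char per byte.
  return btoa(unescape(encodeURIComponent(str)));
}

function legacyDecode(b64) {
  // atob -> one char per byte; escape -> %XX escapes; decodeURIComponent -> text.
  return decodeURIComponent(escape(atob(b64)));
}

console.log(legacyEncode("✓ à la mode")); // 4pyTIMOgIGxhIG1vZGU=
```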
When working with the GitHub API, note also that Base64 file contents are returned with embedded line breaks, so you may need to strip whitespace from the Base64 source before decoding for it to work correctly on Mobile Safari.
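A one-line guard for the whitespace issue, shown as a hypothetical helper (decodeBase64Loose is not part of any API). Removing all whitespace before calling atob() is harmless on runtimes that already tolerate it and fixes the ones that do not:

```javascript
// Hypothetical helper: strip spaces, tabs and line breaks (e.g. the "\n"
// separators in GitHub API "content" fields) before handing the data to atob().
function decodeBase64Loose(b64) {
  return atob(b64.replace(/\s+/g, ""));
}

console.log(decodeBase64Loose("aGVs\nbG8=\n")); // hello
```

The result is still a binary string, so UTF-8 content needs one of the decoding approaches above afterwards.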