Unicode BOM and FileReader
When you read a UTF-8 encoded file that begins with a Byte Order Mark (BOM), the BOM can end up in the output along with the file content. Unicode defines the BOM (U+FEFF) to signal byte order in UTF-16 and UTF-32. In UTF-8 it carries no byte-order information and serves only as an optional signature, encoded as the three bytes EF BB BF. Java's UTF-8 decoder does not strip it, so it is decoded as an ordinary character at the start of the text.
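To make the problem concrete, here is a minimal sketch that decodes a small byte array starting with the UTF-8 BOM. The class name BomDecodeDemo and the helper decode are hypothetical names chosen for illustration; the only library call is the standard String constructor that takes a Charset.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows that decoding keeps the BOM as a character.
class BomDecodeDemo {
    // Decodes raw bytes as UTF-8, the same way new String(bytes, "UTF-8") does.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The three BOM bytes EF BB BF followed by the ASCII text "hi".
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        String text = decode(withBom);

        System.out.println(text.length());              // prints 3, not 2
        System.out.println(text.charAt(0) == '\uFEFF'); // prints true
    }
}
```

The extra leading character is usually invisible on a console, but it breaks comparisons such as text.equals("hi") or text.startsWith("h").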
To keep the BOM from being included in the output, check the first three bytes of the file and remove them if they match the BOM:
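import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Read the raw bytes first so the BOM can be checked before decoding
byte[] tmp = Files.readAllBytes(Paths.get(file));
if (tmp.length >= 3 && tmp[0] == (byte) 0xEF && tmp[1] == (byte) 0xBB && tmp[2] == (byte) 0xBF) {
    // Remove the BOM marker
    tmp = Arrays.copyOfRange(tmp, 3, tmp.length);
}
String content = new String(tmp, StandardCharsets.UTF_8);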
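If the file is read through a Reader rather than loaded into a byte array, the same check can be made on the decoded text, where the BOM appears as the single character U+FEFF. The following is a sketch under that assumption, not the only way to do it; BomSkippingReader and skipBom are hypothetical names, and the code relies only on the standard mark and reset methods of BufferedReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper class: drops a leading U+FEFF from a character stream.
class BomSkippingReader {
    // Wraps the given reader and consumes the first character only if it is the BOM.
    static BufferedReader skipBom(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        reader.mark(1);                 // remember the start of the stream
        int first = reader.read();
        if (first != -1 && first != '\uFEFF') {
            reader.reset();             // not a BOM: rewind so nothing is lost
        }
        return reader;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a file that was decoded as UTF-8.
        BufferedReader reader = skipBom(new StringReader("\uFEFFhello"));
        System.out.println(reader.readLine()); // prints hello
    }
}
```

For a real file, the reader passed in could come from Files.newBufferedReader(Paths.get(file), StandardCharsets.UTF_8). A plain FileReader decodes with the platform default charset unless one is given, so the BOM is only recognizable as U+FEFF when the stream is actually decoded as UTF-8.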