How to Avoid Outputting the BOM Marker When Reading a UTF-8 Encoded File?

Mary-Kate Olsen
Release: 2024-11-16 22:43:03

Unicode BOM and FileReader

When you read a UTF-8 encoded file that starts with a Byte Order Mark (BOM), the BOM can show up in the output along with the file content. Unicode defines the BOM (U+FEFF) to signal byte order in UTF-16 and UTF-32. In UTF-8 it carries no byte-order information and serves only as an encoding signature, stored as the bytes EF BB BF. Java's UTF-8 decoder does not discard it, so it arrives as an ordinary leading character, U+FEFF, unless you remove it yourself.
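
To make the problem concrete, here is a minimal sketch that decodes a few bytes starting with a UTF-8 BOM. The class name BomDemo and the helper decode are hypothetical names chosen for illustration; only the standard String constructor is used.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question's code.
final class BomDemo {
    // Decodes raw bytes as UTF-8 using the standard String constructor.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // EF BB BF is the UTF-8 BOM, followed by the two ASCII bytes of "hi".
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        String s = decode(withBom);
        System.out.println(s.length());               // 3: the BOM became one char
        System.out.println(s.charAt(0) == '\uFEFF');  // true
    }
}
```

The decoded string is one character longer than the visible text, which is why the BOM ends up in the output when the content is printed or concatenated.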

In your code snippet:

  • fr and br are used to read the file as bytes and convert them into characters.
  • tmp reads each line of the file as a byte array.
  • text converts the byte array into a UTF-8 encoded string.
  • content concatenates the lines of the file, including the BOM marker as it is part of the file's content.

To keep the BOM marker out of the output:

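  1. Read the whole file and decode it as UTF-8 in one step instead of converting it line by line. Decoding alone does not remove the BOM: it becomes a leading U+FEFF character, so strip that character from the resulting string.
String content = new String(Files.readAllBytes(Paths.get(file)), StandardCharsets.UTF_8);
if (content.startsWith("\uFEFF")) content = content.substring(1); // drop the decoded BOM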
  2. If you must work with the file as a byte array, remove the BOM yourself before converting the bytes to a string. The UTF-8 BOM is the three-byte sequence EF BB BF:
if (tmp.length >= 3 &&
    tmp[0] == (byte) 0xEF &&
    tmp[1] == (byte) 0xBB &&
    tmp[2] == (byte) 0xBF) {

    // Remove the BOM marker
    tmp = Arrays.copyOfRange(tmp, 3, tmp.length);
}
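
Putting the byte-level check together with the file read, one possible helper looks like the sketch below. BomStripper, stripBom, and readWithoutBom are hypothetical names introduced here for illustration, and the sketch assumes the file is small enough to load into memory at once.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper class, not part of the original question's code.
final class BomStripper {
    // Returns the bytes without a leading UTF-8 BOM (EF BB BF), if one is present.
    static byte[] stripBom(byte[] bytes) {
        if (bytes.length >= 3
                && bytes[0] == (byte) 0xEF
                && bytes[1] == (byte) 0xBB
                && bytes[2] == (byte) 0xBF) {
            return Arrays.copyOfRange(bytes, 3, bytes.length);
        }
        return bytes;
    }

    // Reads a whole file as UTF-8 text, dropping the BOM before decoding.
    static String readWithoutBom(Path path) throws IOException {
        return new String(stripBom(Files.readAllBytes(path)), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Write a temporary file that starts with a BOM, then read it back.
        Path tmp = Files.createTempFile("bom-demo", ".txt");
        Files.write(tmp, new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'o', 'k'});
        System.out.println(readWithoutBom(tmp)); // prints "ok"
        Files.delete(tmp);
    }
}
```

Stripping at the byte level keeps the check independent of how the text is decoded afterwards, and files without a BOM pass through unchanged.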