How to Handle Byte Order Marks (BOMs) When Reading CSV Files in Java?
Byte order mark causes issues with CSV file reading in Java
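A byte order mark (BOM) may be present at the beginning of some CSV files but not others. Java's UTF-8 decoder does not remove it, so when it is present it is returned as the character U+FEFF at the start of the first line. Comparing the first field (typically a header name) against an expected string then fails, even though the two look identical when printed.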
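To see the problem concretely, the following sketch decodes an in-memory UTF-8 CSV whose first three bytes are the UTF-8 BOM (EF BB BF). The class and method names are illustrative only. Because the decoder keeps the BOM as U+FEFF, the first header cell is one character longer than expected and is not equal to "id".

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BomProblemDemo {
    // UTF-8 BOM (EF BB BF) followed by the CSV header line "id,name".
    static final byte[] CSV_WITH_BOM = {
        (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'i', 'd', ',', 'n', 'a', 'm', 'e', '\n'
    };

    // Returns the first header cell exactly as a naive UTF-8 reader sees it.
    static String firstHeaderCell(byte[] csv) {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(csv), StandardCharsets.UTF_8))) {
            return br.readLine().split(",")[0];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String cell = firstHeaderCell(CSV_WITH_BOM);
        System.out.println(cell.equals("id"));              // false: the cell starts with U+FEFF
        System.out.println(cell.length());                  // 3, not 2
        System.out.println((int) cell.charAt(0) == 0xFEFF); // true
    }
}
```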
Here's how to tackle this problem:
Solution:
Implement a wrapper class, UnicodeBOMInputStream, that peeks at the first bytes of an input stream to detect a Unicode BOM and then pushes those bytes back, so no data is lost. If a BOM is detected, call skipBOM() before reading anything else to discard it.
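When the input is known to be UTF-8, the same peek-and-push-back idea can be reduced to a few lines. This is a simplified variant, not the class from this article: `Utf8BomSkipper`, `skipUtf8Bom` and `readAll` are hypothetical names, and only the UTF-8 BOM (EF BB BF) is recognized.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8BomSkipper {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Hypothetical helper: returns a stream positioned just after a UTF-8 BOM if one
    // is present, otherwise positioned at the very first byte.
    static InputStream skipUtf8Bom(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = in.readNBytes(head, 0, head.length); // Java 9+: reads until 3 bytes or EOF
        boolean hasBom = n == UTF8_BOM.length && Arrays.equals(head, UTF8_BOM);
        if (!hasBom && n > 0) {
            in.unread(head, 0, n); // not a BOM: push the peeked bytes back
        }
        return in;
    }

    // Demo helper: decodes everything after the optional BOM as UTF-8.
    static String readAll(byte[] data) {
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'i', 'd'};
        byte[] withoutBom = {'i', 'd'};
        System.out.println(readAll(withBom).equals("id"));    // true
        System.out.println(readAll(withoutBom).equals("id")); // true
    }
}
```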
Here's an example of the UnicodeBOMInputStream class:
```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

/** Wraps an InputStream, detects a leading Unicode BOM and can optionally skip it. */
public class UnicodeBOMInputStream extends InputStream {

    /** Supported BOMs; each constant carries its byte sequence so skipBOM() knows how far to skip. */
    public enum BOM {
        NONE,
        UTF_8(0xEF, 0xBB, 0xBF),
        UTF_16_LE(0xFF, 0xFE),
        UTF_16_BE(0xFE, 0xFF),
        UTF_32_LE(0xFF, 0xFE, 0x00, 0x00),
        UTF_32_BE(0x00, 0x00, 0xFE, 0xFF);

        final byte[] bytes;

        BOM(int... values) {
            bytes = new byte[values.length];
            for (int i = 0; i < values.length; i++) bytes[i] = (byte) values[i];
        }
    }

    private final PushbackInputStream in;
    private BOM bom;
    private boolean skipped = false;

    public UnicodeBOMInputStream(InputStream inputStream) throws IOException {
        if (inputStream == null)
            throw new NullPointerException("Invalid input stream: null is not allowed");
        in = new PushbackInputStream(inputStream, 4);
        byte[] head = new byte[4];
        // readNBytes (Java 9+) keeps reading until 4 bytes or EOF; a single read() may return fewer.
        int read = in.readNBytes(head, 0, head.length);
        switch (read) {
            case 4:
                if (head[0] == (byte) 0xFF && head[1] == (byte) 0xFE && head[2] == 0x00 && head[3] == 0x00) {
                    bom = BOM.UTF_32_LE; break;
                } else if (head[0] == 0x00 && head[1] == 0x00 && head[2] == (byte) 0xFE && head[3] == (byte) 0xFF) {
                    bom = BOM.UTF_32_BE; break;
                }
                // fall through: check the shorter BOMs
            case 3:
                if (head[0] == (byte) 0xEF && head[1] == (byte) 0xBB && head[2] == (byte) 0xBF) {
                    bom = BOM.UTF_8; break;
                }
                // fall through
            case 2:
                if (head[0] == (byte) 0xFF && head[1] == (byte) 0xFE) {
                    bom = BOM.UTF_16_LE; break;
                } else if (head[0] == (byte) 0xFE && head[1] == (byte) 0xFF) {
                    bom = BOM.UTF_16_BE; break;
                }
                // fall through
            default:
                bom = BOM.NONE;
        }
        // Push the peeked bytes back so nothing is lost unless skipBOM() is called.
        if (read > 0) in.unread(head, 0, read);
    }

    public BOM getBOM() { return bom; }

    /** Discards the BOM bytes (if any); must be called before any other read. */
    public UnicodeBOMInputStream skipBOM() throws IOException {
        if (!skipped) {
            in.skip(bom.bytes.length);
            skipped = true;
        }
        return this;
    }

    @Override public int read() throws IOException { return in.read(); }
    @Override public int read(byte[] b) throws IOException { return in.read(b, 0, b.length); }
    @Override public int read(byte[] b, int off, int len) throws IOException { return in.read(b, off, len); }
    @Override public long skip(long n) throws IOException { return in.skip(n); }
    @Override public int available() throws IOException { return in.available(); }
    @Override public void close() throws IOException { in.close(); }
    @Override public synchronized void mark(int readlimit) { in.mark(readlimit); }
    @Override public synchronized void reset() throws IOException { in.reset(); }
    @Override public boolean markSupported() { return in.markSupported(); }
}
```
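UnicodeBOMInputStream only reports which BOM it found; the InputStreamReader wrapped around it still needs a matching charset. A minimal sketch of that mapping, assuming a hypothetical helper `charsetFor` that inspects the first bytes directly (it does not use the class above) and returns a caller-supplied default when there is no BOM:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomCharsets {
    // Hypothetical helper: picks a decoding charset from the first bytes of a file.
    // len is the number of valid bytes in head (at most head.length).
    // Longer BOMs are tested first because the UTF-32 LE BOM (FF FE 00 00)
    // begins with the UTF-16 LE BOM (FF FE).
    static Charset charsetFor(byte[] head, int len, Charset fallback) {
        if (len >= 4 && head[0] == (byte) 0xFF && head[1] == (byte) 0xFE && head[2] == 0 && head[3] == 0)
            return Charset.forName("UTF-32LE");
        if (len >= 4 && head[0] == 0 && head[1] == 0 && head[2] == (byte) 0xFE && head[3] == (byte) 0xFF)
            return Charset.forName("UTF-32BE");
        if (len >= 3 && head[0] == (byte) 0xEF && head[1] == (byte) 0xBB && head[2] == (byte) 0xBF)
            return StandardCharsets.UTF_8;
        if (len >= 2 && head[0] == (byte) 0xFF && head[1] == (byte) 0xFE)
            return StandardCharsets.UTF_16LE;
        if (len >= 2 && head[0] == (byte) 0xFE && head[1] == (byte) 0xFF)
            return StandardCharsets.UTF_16BE;
        return fallback; // no BOM: the caller decides
    }

    public static void main(String[] args) {
        byte[] utf16le = {(byte) 0xFF, (byte) 0xFE, 'a', 0};
        byte[] plain = {'a', ',', 'b'};
        System.out.println(charsetFor(utf16le, utf16le.length, StandardCharsets.UTF_8).name()); // UTF-16LE
        System.out.println(charsetFor(plain, plain.length, StandardCharsets.UTF_8).name());     // UTF-8
    }
}
```

UTF-32LE and UTF-32BE are available in OpenJDK but are not among the charsets every Java implementation is required to support, so `Charset.forName` could throw on an unusual runtime.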
Usage:
Use the UnicodeBOMInputStream wrapper as follows:
```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class CSVReaderWithBOM {
    public static void main(String[] args) throws IOException {
        // Assumes test.csv is UTF-8; for UTF-16/32 BOMs choose the charset from getBOM().
        // First pass: report the BOM but leave it in the stream.
        try (FileInputStream fis = new FileInputStream("test.csv");
             UnicodeBOMInputStream ubis = new UnicodeBOMInputStream(fis);
             BufferedReader br = new BufferedReader(new InputStreamReader(ubis, StandardCharsets.UTF_8))) {
            System.out.println("Detected BOM: " + ubis.getBOM());
            System.out.print("Reading the content of the file without skipping the BOM: ");
            System.out.println(br.readLine()); // starts with U+FEFF if the file has a UTF-8 BOM
        }

        // Second pass: skip the BOM before the reader decodes anything.
        try (FileInputStream fis = new FileInputStream("test.csv");
             UnicodeBOMInputStream ubis = new UnicodeBOMInputStream(fis).skipBOM();
             BufferedReader br = new BufferedReader(new InputStreamReader(ubis, StandardCharsets.UTF_8))) {
            System.out.print("Reading the content of the file after skipping the BOM: ");
            System.out.println(br.readLine());
        }
    }
}
```
This approach allows you to read CSV files with or without BOMs and avoid string comparison issues caused by the BOM being present in the first line of the file.
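If you cannot control how the stream is opened (for example, the CSV text arrives already decoded), a string-level fallback is to drop a single leading U+FEFF from the first line before splitting it. The sketch below uses hypothetical helper names and a naive comma split that ignores CSV quoting rules:

```java
import java.util.Arrays;
import java.util.List;

public class HeaderNormalizer {
    private static final char BOM_CHAR = '\uFEFF';

    // Hypothetical helper: removes one leading U+FEFF from an already-decoded line.
    static String stripBom(String line) {
        return !line.isEmpty() && line.charAt(0) == BOM_CHAR ? line.substring(1) : line;
    }

    // Naive header split on commas (no quoting support), applied after stripping the BOM.
    static List<String> headerFields(String firstLine) {
        return Arrays.asList(stripBom(firstLine).split(",", -1));
    }

    public static void main(String[] args) {
        String decoded = BOM_CHAR + "id,name";
        System.out.println(headerFields(decoded).get(0).equals("id")); // true
        System.out.println(headerFields("id,name"));                   // [id, name]
    }
}
```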