How to specify the character encoding when creating a file in Java
This article looks at how Java handles character encoding when a file is created and written. It works through the question with sample code and should be a useful reference when studying or working with Java IO.
Foreword: I recently studied Java IO streams and wanted to practice and consolidate what I had learned by reading and writing files. While using the File class to create a file, it occurred to me: how do I specify the encoding the file uses? And then: how do I check the encoding of an existing file?
1. Problem analysis
I started by searching online. The typical answer looks like this (an OutputStreamWriter wraps an output stream, so the file is opened with FileOutputStream):
FileOutputStream fos = new FileOutputStream("xxxx.txt");
OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8");
This code means that characters are encoded as UTF-8 at the moment they are written to the file. That is different from what I expected. I wanted to specify the encoding when creating the file, like this:
File myfile = new File("test.txt", "UTF-8");
if (!myfile.exists()) myfile.createNewFile();
So I checked the official Java 8 API documentation. File does not provide a constructor that takes a character encoding. The two-String constructor File(String parent, String child) does exist, but it treats both arguments as path components, so the line above would refer to an entry named "UTF-8" under "test.txt" rather than set an encoding.
File also provides no getter or setter for character encoding, which indicates that character encoding is not an inherent property of a file. Creation time, modification time, and whether the file is readable, writable, or executable are inherent attributes of a file, its metadata. Character encoding is not one of them.
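To make this concrete, here is a minimal sketch of the metadata that java.io.File does expose. The class name FileMetadataDemo and the helper describe are hypothetical names introduced only for illustration; the File methods they call are standard.

```java
import java.io.File;
import java.io.IOException;

class FileMetadataDemo {
    // Summarizes metadata that File exposes; note there is no encoding accessor to call.
    static String describe(File file) {
        return "exists=" + file.exists()
                + ", length=" + file.length()
                + ", canRead=" + file.canRead()
                + ", canWrite=" + file.canWrite()
                + ", lastModified=" + file.lastModified();
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("metadata-demo", ".txt");
        file.deleteOnExit();
        System.out.println(describe(file));

        // File(String parent, String child): the second argument is a path component, not an encoding.
        File notAnEncoding = new File("test.txt", "UTF-8");
        System.out.println(notAnEncoding.getName()); // prints "UTF-8"
    }
}
```

Everything the sketch prints is information the file system tracks. Nothing in it says how the bytes inside the file should be interpreted.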
2. Character encoding
We know that all information stored in a computer is a sequence of 0s and 1s, and text is no exception.
Processing characters involves two operations: encoding and decoding.
Encoding: "map" characters to a sequence of 0s and 1s.
Decoding: "map" a sequence of 0s and 1s back to characters.
Different character encodings, such as GBK and UTF-8, use different rules for encoding and decoding.
Take the same text, for example "中国" ("China"). Saved as UTF-8, each of these Chinese characters takes three bytes, so the underlying bytes in hexadecimal are E4 B8 AD E5 9B BD.
Saved as GBK, each Chinese character takes two bytes, so the same text becomes D6 D0 B9 FA.
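The byte counts can be checked directly in Java. The following is a small sketch, assuming the example text is the two-character string "中国" ("China"), written with Unicode escapes so the result does not depend on the source file's own encoding. EncodingBytesDemo, toHex and encodeToHex are hypothetical names. It also assumes the JDK in use ships the GBK charset, which standard full JDK builds do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingBytesDemo {
    // Formats bytes as space-separated uppercase hex pairs.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes the text with the given charset and returns the bytes as hex.
    static String encodeToHex(String text, Charset charset) {
        return toHex(text.getBytes(charset));
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u56fd"; // the two characters of "China" in Chinese
        System.out.println(encodeToHex(text, StandardCharsets.UTF_8)); // E4 B8 AD E5 9B BD
        System.out.println(encodeToHex(text, Charset.forName("GBK"))); // D6 D0 B9 FA
    }
}
```

The same two characters produce six bytes under UTF-8 and four under GBK. The text is identical; only the mapping rule differs.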
When we write text in a text editor and save it, the editor "maps" the text to a sequence of 0s and 1s according to the character encoding you selected.
The character encoding you select is only the conversion rule the editor uses to turn text into 0s and 1s. It is not an attribute of the text.
When the editor opens a text file, it displays text rather than the underlying 0s and 1s, because it decodes those bits into characters using some character encoding. If the encoding used for decoding is the same as, or compatible with, the one used for encoding, the text is displayed correctly. If it is different and incompatible, the characters are garbled.
For example, suppose I have a text file saved in GBK whose content is "When will the bright moon come out". Opened with GBK, it displays correctly. Opened with an incompatible encoding such as UTF-8, the same bytes display as garbled characters, even though the file itself has not changed.
I have said all this to illustrate one point: character encoding is the rule used when encoding and decoding, not an inherent attribute of the file.
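The garbling effect is easy to reproduce. The sketch below assumes the example line is "明月几时有" ("When will the bright moon come out"), written with Unicode escapes. MojibakeDemo and transcode are hypothetical names. It encodes the text with one charset and decodes the resulting bytes with another.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encodes text to bytes with one charset, then decodes those bytes with another.
    static String transcode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String text = "\u660e\u6708\u51e0\u65f6\u6709";
        Charset gbk = Charset.forName("GBK"); // assumes the JDK provides GBK

        // Same charset on both sides: the text survives the round trip.
        System.out.println(text.equals(transcode(text, gbk, gbk))); // true

        // GBK bytes decoded as UTF-8: malformed sequences become U+FFFD replacement characters.
        String garbled = transcode(text, gbk, StandardCharsets.UTF_8);
        System.out.println(text.equals(garbled)); // false
    }
}
```

The bytes are the same in both cases. Only the decoding rule changes, which is exactly why the encoding cannot be read off the file as an attribute.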
I can't help but wonder: why isn't character encoding part of a file's attributes? Suppose it could be set, and a file were marked as GBK. The operating system would then have to enforce that attribute, just as it refuses writes to a file that is not writable: every byte written to the file would have to be valid GBK. The operating system would need to validate the bytes on every write, which would be very expensive and in some cases impossible, because some byte sequences are valid in both GBK and UTF-8 and are therefore ambiguous.
And what would be the point? So that an editor can pick the right encoding from the file's attributes when it opens the file? That is not necessary. A smart editor can often infer the encoding from the file's content, for example from its first few bytes, and you can also set the encoding used for decoding manually.
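One way a program might guess an encoding is to try a strict decode and see whether it fails. The sketch below is only an illustration of that idea, not how any particular editor works. CharsetGuessDemo and decodesCleanly are hypothetical names. It also shows the ambiguity mentioned above: plain ASCII bytes decode cleanly as both UTF-8 and GBK.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class CharsetGuessDemo {
    // Returns true if the bytes decode under the charset without any malformed or unmappable input.
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumes the JDK provides GBK
        byte[] ascii = "hello".getBytes(StandardCharsets.US_ASCII);
        byte[] gbkBytes = "\u4e2d\u56fd".getBytes(gbk);

        // ASCII bytes are valid in both encodings, so the content alone cannot tell them apart.
        System.out.println(decodesCleanly(ascii, StandardCharsets.UTF_8)); // true
        System.out.println(decodesCleanly(ascii, gbk));                    // true

        // These GBK bytes are not well-formed UTF-8, so UTF-8 can be ruled out.
        System.out.println(decodesCleanly(gbkBytes, StandardCharsets.UTF_8)); // false
    }
}
```

A check like this can rule encodings out, but it cannot always pick a single winner, which is why editors also let you choose the encoding by hand.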
3. Problem Solving
The encoding of a file cannot be specified when the file is created. What you can choose is the encoding rule used to convert text into bytes when the text is written to the file (for example, saving with Ctrl+S in a text editor, which is essentially a write operation).
For Java programs, the code is as follows, which is the approach mentioned at the beginning of the article (an OutputStreamWriter wraps an output stream, so the file is opened with FileOutputStream):
FileOutputStream fos = new FileOutputStream("xxxx.txt");
OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8");
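For completeness, here is a fuller sketch of writing text with an explicit encoding and reading it back. WriteWithEncodingDemo, writeText and readText are hypothetical names, and a temporary file stands in for "xxxx.txt". The example text is assumed to be "中国" ("China"), written with Unicode escapes.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class WriteWithEncodingDemo {
    // Writes text to the file, encoding characters to bytes with the given charset.
    static void writeText(Path path, String text, Charset charset) throws IOException {
        // try-with-resources closes the writer, which flushes and closes the underlying stream.
        try (Writer writer = new OutputStreamWriter(new FileOutputStream(path.toFile()), charset)) {
            writer.write(text);
        }
    }

    // Reads the whole file and decodes its bytes with the given charset.
    static String readText(Path path, Charset charset) throws IOException {
        return new String(Files.readAllBytes(path), charset);
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("encoding-demo", ".txt");
        String text = "\u4e2d\u56fd";

        writeText(path, text, StandardCharsets.UTF_8);
        System.out.println(Files.size(path)); // 6: three bytes per character in UTF-8

        writeText(path, text, Charset.forName("GBK")); // assumes the JDK provides GBK
        System.out.println(Files.size(path)); // 4: two bytes per character in GBK

        Files.delete(path);
    }
}
```

The file is the same file in both writes. The encoding is chosen by the writer at write time and must be chosen again, consistently, by whoever reads the bytes back.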