Using Python to read and write files containing Chinese characters and append a specific character to the end of each line

高洛峰
Release: 2017-03-20 13:07:40

In data mining, the format of the raw files is often frustrating, and cleaning up the data file format is a very important step.

In a project I recently took over, the data file I was given was in unbearable shape. pandas could not open it and kept reporting an IO error. On close inspection, I found that many lines in the file ended with a double quote ("), while other lines were missing it. So the requirement was obvious: check whether each line ends with ", and if not, append one.

I'll start with the answer and explain afterwards. After all, many people just want a quick solution, not the reasoning behind it. The solution is as follows:

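# Read a_file.txt line by line and write the repaired lines to b_file.txt.
# Both files are closed automatically when the with block ends.
with open('a_file.txt', 'r') as lines, open('b_file.txt', 'w') as b:
    for line in lines:
        line = line.strip()  # remove the trailing line ending and surrounding whitespace
        if not line.endswith(r'"'):
            line += r'"'
        line += '\n'
        b.write(line)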
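For Python 3, where files are decoded to text as they are read, a minimal sketch of the same fix might look like the following. The function name ensure_trailing_quote and the encoding='utf-8' default are assumptions made for illustration; the original data may well use another encoding such as GBK, in which case that name should be passed instead.

```python
def ensure_trailing_quote(src_path, dst_path, encoding='utf-8'):
    """Copy src_path to dst_path so that every line ends with a double quote.

    The function name and the default encoding are illustrative assumptions.
    """
    with open(src_path, 'r', encoding=encoding) as src, \
         open(dst_path, 'w', encoding=encoding, newline='\n') as dst:
        for line in src:
            line = line.rstrip('\r\n')  # remove only the line ending
            if not line.endswith('"'):
                line += '"'
            dst.write(line + '\n')
```

Unlike the strip() call in the solution above, rstrip('\r\n') leaves leading and trailing spaces inside the data untouched. Whether that is desirable depends on the file.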

The key to the whole process is this line:

line = line.strip()

At first I was lazy and left out the line above, calling endswith() on the raw line directly. As a result, the condition below tripped me up: the program concluded that no line ended with ".

if not line.endswith(r'"'):
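A quick check makes the failure concrete. The sample string here is made up for illustration; it stands in for a line as returned by iterating over a file, which keeps its trailing newline.

```python
# A made-up line, as yielded by iterating over a file (newline included).
raw = '"abc","中文"\n'

without_strip = raw.endswith('"')        # the last character is '\n', not '"'
with_strip = raw.strip().endswith('"')   # strip() removes the newline first

print(without_strip, with_strip)  # False True
```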

So I bit the bullet and rewrote the check with manual indexing:

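# data_path is assumed to be defined earlier in the script.
for line in open(data_path+'heheda.txt', 'r'):
    if not line[-2] == r'"':
        print(line)
        # insert the quote just before the final character, expected to be the newline
        line = line[:-1] + r'"' + line[-1:]
        print(line)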

Here the condition is if not line[-2] == r'"', which gives the correct result for every line except the last one. On Windows, each line of a text file ends with "\r\n" (a carriage return followed by a line feed), and a file opened in text mode there hands each line to the program with a single trailing "\n". Without strip() to remove that line ending, endswith() sees the newline rather than the quote, so you have to step back one byte from the end of each line by hand to test its last real character.

The last line of the file, however, usually has no line ending at all, since nothing follows it. There, line[-2] lands inside the final Chinese character, which occupies more than one byte. The splice then turns the byte sequence \xx\xx into \xx"\xx, so the last character is displayed incorrectly.
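The corruption of the last line can be reproduced with Python 3 bytes objects, which behave like the byte strings handled by the loop above. This is only a sketch under assumptions: the sample text and the GBK encoding are chosen for illustration, since the encoding of the original file is not stated.

```python
# The last line of a file: no trailing newline, ends in a Chinese character.
# In GBK each of these Chinese characters takes two bytes.
last_line = '"数据'.encode('gbk')

# The same splice as in the loop above, applied to the raw bytes.
broken = last_line[:-1] + b'"' + last_line[-1:]

print(len(last_line))              # 5: one quote byte plus 2 x 2 bytes
print(broken == last_line + b'"')  # False: the quote went inside the character

try:
    broken.decode('gbk')
except UnicodeDecodeError as exc:
    # The final character was split in two, so the bytes no longer decode.
    print('cannot decode:', exc.reason)
```

Stripping the line ending first and then appending the quote avoids the problem, because the quote is always added after the final complete character.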
