Overview of I/O operations
File reading and writing: implementation principles and operation steps
File opening mode
Python file operation step example
Python file reading related methods
File reading and writing and character encoding
I/O in computing stands for Input/Output, that is, the input and output of data streams. Input and output are defined relative to memory: an input stream is data flowing into memory from the outside (disk, network), and an output stream is data flowing out of memory to the outside (disk, network). While a program runs, its data resides in memory and is processed by the CPU. Wherever data has to be exchanged with the outside (usually disk and network operations), an I/O interface is required.
So who provides this I/O interface? How are I/O operations implemented in high-level programming languages?
The operating system is general-purpose software whose typical responsibilities include:
Hardware Drivers
Process Management
Memory Management
Network Management
Security Management
I/O Management
The operating system hides the underlying hardware and exposes a common interface to the layers above it. The ability to perform I/O is therefore provided by the operating system. High-level programming languages wrap the low-level interface that the operating system provides (usually exposed through C) so that developers can use it conveniently, and Python is no exception.
File reading and writing is a common I/O operation. From the description above we can infer that Python wraps the operating system's underlying interface and directly provides methods for reading and writing files. This is indeed the case, and the same holds for other languages such as Java and PHP.
So what is the object we want to operate on? How do we get the object to be operated on?
Since the ability to perform I/O is provided by the operating system, and modern operating systems do not allow ordinary programs to operate on the disk directly, a program that wants to read or write a file must ask the operating system to open it. The operating system hands back a handle, usually called a file descriptor (fd), and this is the object the program operates on.
High-level programming languages usually provide a built-in function that takes parameters such as the file path and the file opening mode, opens the file, and returns a handle to it. In Python this function is open(), which returns a file object wrapping the file descriptor; in PHP it is fopen(). Through this function we obtain the file object to operate on.
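As a minimal sketch of the relationship between the file object and the file descriptor, the snippet below opens a file both ways. The scratch path is hypothetical and created only for this illustration; fileno(), os.open(), os.read() and os.close() are standard-library calls.

```python
import os
import tempfile

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# High-level route: open() returns a file object that wraps a descriptor.
f = open(path, "w", encoding="utf-8")
fd_from_object = f.fileno()   # the integer descriptor behind the file object
f.write("hello")
f.close()

# Low-level route: os.open() returns the bare integer descriptor.
fd = os.open(path, os.O_RDONLY)
data = os.read(fd, 100)       # read up to 100 bytes straight from the descriptor
os.close(fd)

print(type(fd_from_object), data)
```

In everyday code the high-level file object is usually the better choice; the low-level calls are shown only to make the descriptor visible.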
The steps for reading and writing files are broadly the same across programming languages:
1) Open the file and obtain the file descriptor
2) Operate on the file descriptor: read/write
3) Close the file
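The three steps can be sketched in Python as follows. This is an illustrative example rather than the only way to do it; the scratch file name is an assumption made for the demo.

```python
import os
import tempfile

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "steps.txt")

# 1) Open the file and obtain the file object.
f = open(path, "w", encoding="utf-8")
try:
    # 2) Operate on it (here: write).
    f.write("first line\n")
finally:
    # 3) Close it, even if the write raised an exception.
    f.close()

# The same three steps for reading.
f = open(path, "r", encoding="utf-8")
try:
    content = f.read()
finally:
    f.close()

print(repr(content))
```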
Only the file reading and writing APIs differ from language to language: some offer richer functionality, while others are simpler.
Note that a file should be closed promptly once reading or writing is finished. On the one hand, open files occupy operating system resources; on the other hand, the operating system limits the number of file descriptors that can be open at the same time. On Linux, you can run ulimit -n to view this limit. Failing to close a file in time can also cause data loss: when data is written to a file, it is not written to the disk immediately. It is first placed in a memory buffer and written out later. Calling close() flushes any data still held in the buffer; if the file is never closed, for example because the program terminates abnormally, that data may be lost.
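One common way to make sure a file is always closed is the with statement. The sketch below also shows flush() and os.fsync(), which can be used when data must reach the disk before the file is closed; the scratch path is an assumption for the demo.

```python
import os
import tempfile

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "buffered.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("important data\n")
    f.flush()              # push Python's buffer to the operating system
    os.fsync(f.fileno())   # ask the OS to write its cache to the disk
# Leaving the with block closes the file automatically.

closed_after_with = f.closed

with open(path, encoding="utf-8") as f:
    saved = f.read()

print(closed_after_with, repr(saved))
```

For most programs the with statement alone is enough; the explicit flush() and os.fsync() calls are only needed when durability matters before close.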
Let's first look at the function definitions for opening files in Python, PHP, and C:

    # Python 2
    open(name[, mode[, buffering]])

    # Python 3
    open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

    // PHP
    resource fopen ( string $filename , string $mode [, bool $use_include_path = false [, resource $context ]] )

    // C
    int open(const char * pathname, int flags);

Notice that the file opening functions of all three languages take not only a file path but also a mode parameter (the flags parameter of C's open function plays a similar role). The mode parameter defines how the file is opened. Common modes are read-only, write-only, read-write, and append-only. The definitions differ slightly between programming languages. Let's look at the file opening modes in Python.
File Open Mode | Description |
---|---|
r | Open the file in read-only mode with the file pointer at the beginning of the file; an error is raised if the file does not exist |
w | Open the file in write-only mode with the file pointer at the beginning of the file; if the file exists its contents are cleared, and if it does not exist it is created |
a | Open the file in append-only mode with the file pointer at the end of the file; if the file does not exist it is created |
r+ | Adds write capability to r |
w+ | Adds read capability to w |
a+ | Adds read capability to a |
b | Read and write binary data (the default is t, meaning text); must be combined with one of the modes above, such as rb, wb, ab, ab+. In Python 2, POSIX systems (including Linux) ignore this character; in Python 3 it always selects bytes instead of str |
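A short sketch of how several of these modes behave in Python 3. The file names are hypothetical, and only ASCII content is used so that the platform's default encoding does not matter.

```python
import os
import tempfile

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w") as f:     # w: create the file, or clear it if it exists
    f.write("one\n")

with open(path, "a") as f:     # a: writes always go to the end of the file
    f.write("two\n")

with open(path, "r+") as f:    # r+: read and write, pointer at the start
    f.write("ONE")             # overwrites the first three characters in place
    f.seek(0)
    after_r_plus = f.read()    # "ONE\ntwo\n"

with open(path, "rb") as f:    # b: binary mode returns bytes, not str
    raw = f.read()

# r on a file that does not exist raises an error.
try:
    open(os.path.join(os.path.dirname(path), "missing.txt"), "r")
    missing_raises = False
except FileNotFoundError:
    missing_raises = True

print(repr(after_r_plus), type(raw), missing_raises)
```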
Method | Description |
---|---|
read() | Read the entire contents of the file at once and return a str |
read(size) | Read at most size units per call and return a str; in Python 2 size is a byte count, in Python 3 (text mode) size is a character count |
readlines() | Read the entire contents of the file at once and return a list with one element per line |
readline() | Read only one line per call |
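The reading methods can be compared side by side with a small sketch like the one below. The file content and path are made up for the demo.

```python
import os
import tempfile

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "song.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("line 1\nline 2\nline 3\n")

with open(path, encoding="utf-8") as f:
    everything = f.read()        # whole file as one str

with open(path, encoding="utf-8") as f:
    first_four = f.read(4)       # at most 4 characters: "line"
    rest_of_line = f.readline()  # remainder of the current line: " 1\n"
    remaining = f.readlines()    # the lines that are left, as a list

with open(path, encoding="utf-8") as f:
    # Iterating over the file object reads one line at a time.
    stripped = [line.rstrip("\n") for line in f]

print(first_four, repr(rest_of_line), remaining, stripped)
```

For large files, iterating over the file object is usually preferable to read() or readlines(), because it does not load the whole file into memory.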
Method | Description |
---|---|
seek(n) | Move the file pointer to the specified byte position |
tell() | Get the current byte position of the file pointer |
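A minimal sketch of moving and querying the file pointer. Binary mode is used here as an assumption of the demo, so that positions map directly to bytes.

```python
import os
import tempfile

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "digits.bin")
with open(path, "wb") as f:
    f.write(b"0123456789")

with open(path, "rb") as f:
    start = f.tell()          # 0: the pointer starts at the beginning
    f.seek(5)                 # jump to byte offset 5
    from_five = f.read(3)     # b"567"
    position = f.tell()       # 8: the pointer moved past what was read
    f.seek(-2, os.SEEK_END)   # two bytes before the end of the file
    last_two = f.read()       # b"89"

print(start, from_five, position, last_two)
```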
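Method | Description |
---|---|
flush() | Flush the buffer, writing the buffered data to the file immediately |
next() | Return the next line of the file; this method is why a file object can be used as an iterator (in Python 3, use the built-in next(f)) |
truncate([size]) | Truncate the file to the given number of bytes and save the result; if size is omitted, the file is truncated at the current file pointer position. Python 2 returns nothing, Python 3 returns the new size of the file in bytes |
write(str) | Write a string to the file; no return value in Python 2, the number of characters written in Python 3 |
writelines(sequence) | Write a string or a list of strings to the file; if the elements need line breaks, you must add the newline characters yourself |
fileno() | Return the integer file descriptor, which can be used for low-level I/O operations (for example, the read function of the os module) |
isatty() | Check whether the file is connected to a terminal device; returns True if so, otherwise False |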
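The following sketch exercises several of these methods under Python 3. The path and content are assumptions made for the demo.

```python
import os
import tempfile

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "misc.txt")

with open(path, "w", encoding="utf-8") as f:
    written = f.write("abc")            # Python 3 returns the character count: 3
    f.writelines(["def\n", "ghi\n"])    # no newlines are added for you
    f.flush()                           # push buffered data out now
    is_terminal = f.isatty()            # False for a regular file

with open(path, encoding="utf-8") as f:
    first_line = next(f)                # "abcdef\n"

with open(path, "r+", encoding="utf-8") as f:
    new_size = f.truncate(3)            # keep only the first 3 bytes

with open(path, encoding="utf-8") as f:
    after_truncate = f.read()           # "abc"

print(written, is_terminal, repr(first_line), new_size, after_truncate)
```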
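An earlier article introduced character encoding in Python and spent considerable space on the relationship between strings and character encodings and on the conversion process. It discussed two places where a character encoding is specified, and what each one does:
The character encoding configured for the project and its files in an IDE such as PyCharm: its main role is to tell the IDE which byte representation to convert characters into when saving a file, and which character encoding to use to convert bytes back into human-readable characters when opening and displaying a file.
The character encoding declared at the top of a Python source file, such as # -*- coding: utf-8 -*-: its main role is to tell the Python interpreter which character encoding the source file was saved with. Before executing the code, the interpreter reads the bytes of the source file from disk and decodes them into Unicode characters using the encoding declared here. How the Python interpreter executes code is unrelated to the IDE.
So why bring up character encoding again here?
To put it another way: if character encodings have already been specified in those places, why specify one again when reading and writing files? As the descriptions above show, those two settings concern the encoding of the Python source file and are used by the Python interpreter and by tools such as PyCharm. The encoding of the file being read or written has no necessary connection to the encoding of the source file; the encoding specified when reading or writing a file is used by the program we write. These are different actors and different processes.
How do we specify the character encoding when reading and writing files?
The above explains why an encoding must be specified when reading and writing files; here we look at how to specify it (the discussion mainly concerns reading external data). This has already been used in the file reading examples above, and here we go through it in more detail.
First, look again at the definition of the open function in Python 2 and Python 3: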

    # Python 2
    open(name[, mode[, buffering]])

    # Python 3
    open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

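As you can see, the Python 3 open function has several additional parameters, including encoding. This encoding parameter specifies the character encoding of the file being operated on.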

    # Read
    with open('song.txt', 'r', encoding='utf-8') as f:
        print(f.read())

    # Write
    with open('song.txt', 'w', encoding='utf-8') as f:
        print(f.write('你好'))

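So how is the encoding specified in Python 2? In Python 2, read and write operate on bytes. In other words, the read methods return byte strings (if the content contains Chinese characters, you will find that len() returns the number of bytes rather than the number of characters read). To obtain a correct character string, you must manually decode the result; conversely, to save data in a particular character encoding, you must manually encode it into a byte string before writing. Both encode() and decode() accept a character encoding argument. In Python 3, read and write operate on strings: the interpreter performs the encode on writing and the decode on reading for us, so we only need to specify the character encoding when opening the file with open.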

    # Read (Python 2)
    with open('song.txt', 'r') as f:
        print(f.read().decode('utf-8'))

    # Write (Python 2)
    with open('song2.txt', 'w') as f:
        # f.write(u'你好'.encode('utf-8'))
        # f.write('你好'.decode('utf-8').encode('utf-8'))
        f.write('你好')

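The manual encode and decode steps can still be reproduced in Python 3 by opening the file in binary mode. The sketch below is a Python 3 illustration of the idea, not the Python 2 code itself, and the scratch path is an assumption.

```python
import os
import tempfile

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "song2.txt")

with open(path, "wb") as f:
    f.write("你好".encode("utf-8"))   # encode str to bytes by hand

with open(path, "rb") as f:
    raw = f.read()                    # binary mode returns bytes

text = raw.decode("utf-8")            # decode bytes to str by hand

byte_count = len(raw)    # 6: each of these characters takes 3 bytes in UTF-8
char_count = len(text)   # 2

print(byte_count, char_count, text)
```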
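The encoding parameter of the Python 3 open function can of course be omitted, in which case a default character encoding is used.
The Python 3 documentation for open describes the encoding parameter as follows:
encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent, but any encoding supported by Python can be passed. See the codecs module for the list of supported encodings.
In other words, the default value of encoding depends on the platform. For example, on Windows with a Simplified Chinese locale the default is GBK, while on Linux it is usually UTF-8.
In Python 2, bytes are saved as-is when writing a file, and when reading, no default encoding comes into play unless you decode manually. However, when the bytes are printed in a terminal, the terminal decodes them using the current platform's character encoding, and garbled text may appear. For example, if song.txt is encoded in UTF-8, the following operation in a Windows command-line terminal (whose encoding is GBK) produces garbled output: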
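To see which default applies on a given machine, one option is to ask the locale module, as in the sketch below. The exact strings printed vary by platform, Python version, and interpreter settings, so no particular output is assumed; the scratch path is hypothetical.

```python
import locale
import os
import tempfile

# The encoding Python derives from the platform's locale settings.
default_encoding = locale.getpreferredencoding(False)

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "default.txt")

with open(path, "w") as f:   # no encoding argument given
    used = f.encoding        # the encoding open() actually chose

print(default_encoding, used)
```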

    >>> with open('song.txt', 'r') as f:
    ...     print(f.read())
    ...
    鍖嗗寙閭e勾鎴戜滑 绌剁珶璇翠簡鍑犻亶 鍐嶈涔嬪悗鍐嶆嫋寤? 鍙儨璋佹湁娌℃湁 鐖辫繃涓嶆槸涓€鍦?涓冩儏涓婇潰鐨勯泟杈? 鍖嗗寙閭e勾鎴戜滑 涓€鏃跺寙蹇欐拏涓?闅句互鎵垮彈鐨勮瑷€ 鍙湁绛夊埆浜哄厬鐜

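The same kind of garbling can be reproduced on any platform by decoding UTF-8 bytes with the wrong codec. This is a small sketch; the sample string is chosen only for illustration, and errors="replace" is used because some byte sequences are not valid GBK.

```python
text = "匆匆那年我们"

raw = text.encode("utf-8")                      # bytes as stored in a UTF-8 file

garbled = raw.decode("gbk", errors="replace")   # wrong codec: mojibake
correct = raw.decode("utf-8")                   # right codec: original text

print(garbled)
print(correct)
```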
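We should find out the character encoding of the file being operated on whenever possible, and pass it explicitly through the encoding parameter.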