How to choose a Go file reading solution
Create files of different sizes
First, we need to have a comparison object. In view of the limited computer disk space, this article compares the differences in file reading at the three levels of KB, MB, and GB.
package main import ( "bufio" "math/rand" "os" "time" ) const charset = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789" var seededRand = rand.New(rand.NewSource(time.Now().UnixNano())) func StringWithCharset(length int) string { b := make([]byte, length) for i := range b { b[i] = charset[seededRand.Intn(len(charset))] } return string(b) } func main() { files := map[string]int{"4KB.txt": 4, "4MB.txt": 4096, "4GB.txt": 4194304, "16GB.txt": 16777216} for name, number := range files { file, err := os.OpenFile(name, os.O_WRONLY|os.O_CREATE, 0666) if err != nil { panic(err) } write := bufio.NewWriter(file) for i := 0; i < number; i++ { s := StringWithCharset(1023) + "\n" write.WriteString(s) } file.Close() } }
Executing the above code, we get files of 4KB, 4MB, 4GB, and 16GB in sequence, which are composed of the contents of random strings of 1KB per line.
$ ls -alh 4kb.txt 4MB.txt 4GB.txt 16GB.txt -rw-r--r-- 1 slp staff 16G Mar 6 15:57 16GB.txt -rw-r--r-- 1 slp staff 4.0G Mar 6 15:54 4GB.txt -rw-r--r-- 1 slp staff 4.0M Mar 6 15:53 4MB.txt -rw-r--r-- 1 slp staff 4.0K Mar 6 15:16 4kb.txt
Next, we use different ways to read the contents of these files.
Load the entire file
Go provides methods to read the file contents at once: os.ReadFile and ioutil.ReadFile. Starting in Go 1.16, ioutil.ReadFile is equivalent to os.ReadFile.
func BenchmarkOsReadFile4KB(b *testing.B) { for i := 0; i < b.N; i++ { _, err := os.ReadFile("./4KB.txt") if err != nil { b.Fatal(err) } } } func BenchmarkOsReadFile4MB(b *testing.B) { for i := 0; i < b.N; i++ { _, err := os.ReadFile("./4MB.txt") if err != nil { b.Fatal(err) } } } func BenchmarkOsReadFile4GB(b *testing.B) { for i := 0; i < b.N; i++ { _, err := os.ReadFile("./4GB.txt") if err != nil { b.Fatal(err) } } } func BenchmarkOsReadFile16GB(b *testing.B) { for i := 0; i < b.N; i++ { _, err := os.ReadFile("./16GB.txt") if err != nil { b.Fatal(err) } } }
The advantages and disadvantages of one-time loading of files are very obvious. It can reduce the number of IOs, but it will load all the file contents into the memory. For large files, there is a possibility of memory overflow. risk.
逐行读取
在很多情况下,例如日志分析,对文件的处理都是按行进行的。Go 中 bufio.Reader 对象提供了一个 ReadLine() 方法,但其实我们更多地是使用 ReadBytes('\n') 或者 ReadString('\n') 代替。
// ReadLine is a low-level line-reading primitive. Most callers should use // ReadBytes('\n') or ReadString('\n') instead or use a Scanner.
我们以 ReadString('\n') 为例,对 4 个文件分别进行逐行读取
func ReadLines(filename string) { fi, err := os.Open(filename) if err != nil{ panic(err) } defer fi.Close() reader := bufio.NewReader(fi) for { _, err = reader.ReadString('\n') if err != nil { if err == io.EOF { break } panic(err) } } } func BenchmarkReadLines4KB(b *testing.B) { for i := 0; i < b.N; i++ { ReadLines("./4KB.txt") } } func BenchmarkReadLines4MB(b *testing.B) { for i := 0; i < b.N; i++ { ReadLines("./4MB.txt") } } func BenchmarkReadLines4GB(b *testing.B) { for i := 0; i < b.N; i++ { ReadLines("./4GB.txt") } } func BenchmarkReadLines16GB(b *testing.B) { for i := 0; i < b.N; i++ { ReadLines("./16GB.txt") } }
块读取
块读取也称为分片读取,这也很好理解,我们可以将内容分成一块块的,每次读取指定大小的块内容。这里,我们将块大小设置为 4KB。
func ReadChunk(filename string) { f, err := os.Open(filename) if err != nil { panic(err) } defer f.Close() buf := make([]byte, 4*1024) r := bufio.NewReader(f) for { _, err = r.Read(buf) if err != nil { if err == io.EOF { break } panic(err) } } } func BenchmarkReadChunk4KB(b *testing.B) { for i := 0; i < b.N; i++ { ReadChunk("./4KB.txt") } } func BenchmarkReadChunk4MB(b *testing.B) { for i := 0; i < b.N; i++ { ReadChunk("./4MB.txt") } } func BenchmarkReadChunk4GB(b *testing.B) { for i := 0; i < b.N; i++ { ReadChunk("./4GB.txt") } } func BenchmarkReadChunk16GB(b *testing.B) { for i := 0; i < b.N; i++ { ReadChunk("./16GB.txt") } }
汇总结果
BenchmarkOsReadFile4KB-8 92877 12491 ns/op BenchmarkOsReadFile4MB-8 1620 744460 ns/op BenchmarkOsReadFile4GB-8 1 7518057733 ns/op signal: killed BenchmarkReadLines4KB-8 90846 13184 ns/op BenchmarkReadLines4MB-8 493 2338170 ns/op BenchmarkReadLines4GB-8 1 3072629047 ns/op BenchmarkReadLines16GB-8 1 12472749187 ns/op BenchmarkReadChunk4KB-8 99848 12262 ns/op BenchmarkReadChunk4MB-8 913 1233216 ns/op BenchmarkReadChunk4GB-8 1 2095515009 ns/op BenchmarkReadChunk16GB-8 1 8547054349 ns/op
在本文的测试条件下(每行数据 1KB),对于小对象 4KB 的读取,三种方式差距并不大;在 MB 级别的读取中,直接加载最快,但块读取也慢不了多少;上了 GB 后,块读取方式会最快。
且有一点可以注意到的是,在整个文件加载的方式中,对于 16 GB 的文件数据(测试机器运行内存为 8GB),会内存耗尽出错,没法执行。
总结
不管是什么大小的文件,均不推荐整个文件加载的方式,因为它在小文件时的速度优势并没有那么大,相较于安全隐患,不值得选择它。
块读取是优先选择,尤其对于一些没有换行符的文件,例如音视频等。通过设定合适的块读取大小,能让速度和内存得到很好的平衡。且在读取过程中,往往伴随着处理内容的逻辑。每块内容可以赋给一个工作 goroutine 来处理,能更好地并发。
------------------- End -------------- -----
- An article teaches you the basic reflection of Go language
Structure of the basics of Go language (Winter)
An article will take you to understand the basic map of Go language

#Welcome everyonelike,leave a message,forward,reprint,Thank you all Companion and support
If you want to join the Go study group, please reply in the background [ Join the group]
Love is always the same across thousands of rivers and mountains, can you click [在看]
The above is the detailed content of How to choose a Go file reading solution. For more information, please follow other related articles on the PHP Chinese website!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



In Go, WebSocket messages can be sent using the gorilla/websocket package. Specific steps: Establish a WebSocket connection. Send a text message: Call WriteMessage(websocket.TextMessage,[]byte("Message")). Send a binary message: call WriteMessage(websocket.BinaryMessage,[]byte{1,2,3}).

In Go, the function life cycle includes definition, loading, linking, initialization, calling and returning; variable scope is divided into function level and block level. Variables within a function are visible internally, while variables within a block are only visible within the block.

In Go, you can use regular expressions to match timestamps: compile a regular expression string, such as the one used to match ISO8601 timestamps: ^\d{4}-\d{2}-\d{2}T \d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-][0-9]{2}:[0-9]{2})$ . Use the regexp.MatchString function to check if a string matches a regular expression.

Go and the Go language are different entities with different characteristics. Go (also known as Golang) is known for its concurrency, fast compilation speed, memory management, and cross-platform advantages. Disadvantages of the Go language include a less rich ecosystem than other languages, a stricter syntax, and a lack of dynamic typing.

Memory leaks can cause Go program memory to continuously increase by: closing resources that are no longer in use, such as files, network connections, and database connections. Use weak references to prevent memory leaks and target objects for garbage collection when they are no longer strongly referenced. Using go coroutine, the coroutine stack memory will be automatically released when exiting to avoid memory leaks.

View Go function documentation using the IDE: Hover the cursor over the function name. Press the hotkey (GoLand: Ctrl+Q; VSCode: After installing GoExtensionPack, F1 and select "Go:ShowDocumentation").

Unit testing concurrent functions is critical as this helps ensure their correct behavior in a concurrent environment. Fundamental principles such as mutual exclusion, synchronization, and isolation must be considered when testing concurrent functions. Concurrent functions can be unit tested by simulating, testing race conditions, and verifying results.

When passing a map to a function in Go, a copy will be created by default, and modifications to the copy will not affect the original map. If you need to modify the original map, you can pass it through a pointer. Empty maps need to be handled with care, because they are technically nil pointers, and passing an empty map to a function that expects a non-empty map will cause an error.
