First, we need files to compare against. Given limited disk space, this article compares file reading at three size levels: KB, MB, and GB.
```go
package main

import (
	"bufio"
	"math/rand"
	"os"
	"time"
)

const charset = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

var seededRand = rand.New(rand.NewSource(time.Now().UnixNano()))

// StringWithCharset returns a random string of the given length.
func StringWithCharset(length int) string {
	b := make([]byte, length)
	for i := range b {
		b[i] = charset[seededRand.Intn(len(charset))]
	}
	return string(b)
}

func main() {
	// Number of 1KB lines to write into each file.
	files := map[string]int{"4KB.txt": 4, "4MB.txt": 4096, "4GB.txt": 4194304, "16GB.txt": 16777216}
	for name, number := range files {
		file, err := os.OpenFile(name, os.O_WRONLY|os.O_CREATE, 0666)
		if err != nil {
			panic(err)
		}
		write := bufio.NewWriter(file)
		for i := 0; i < number; i++ {
			s := StringWithCharset(1023) + "\n" // 1023 chars + '\n' = 1KB per line
			write.WriteString(s)
		}
		write.Flush() // flush the buffered writer, or the tail of each file is lost
		file.Close()
	}
}
```
Running the code above produces files of 4KB, 4MB, 4GB, and 16GB, each made up of 1KB lines of random characters.
```
$ ls -alh 4kb.txt 4MB.txt 4GB.txt 16GB.txt
-rw-r--r--  1 slp  staff    16G Mar  6 15:57 16GB.txt
-rw-r--r--  1 slp  staff   4.0G Mar  6 15:54 4GB.txt
-rw-r--r--  1 slp  staff   4.0M Mar  6 15:53 4MB.txt
-rw-r--r--  1 slp  staff   4.0K Mar  6 15:16 4kb.txt
```
Next, we use different ways to read the contents of these files.
Go provides functions to read an entire file at once: os.ReadFile and ioutil.ReadFile. Since Go 1.16, ioutil.ReadFile is equivalent to os.ReadFile.
```go
import (
	"os"
	"testing"
)

func BenchmarkOsReadFile4KB(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_, err := os.ReadFile("./4KB.txt")
		if err != nil {
			b.Fatal(err)
		}
	}
}

func BenchmarkOsReadFile4MB(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_, err := os.ReadFile("./4MB.txt")
		if err != nil {
			b.Fatal(err)
		}
	}
}

func BenchmarkOsReadFile4GB(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_, err := os.ReadFile("./4GB.txt")
		if err != nil {
			b.Fatal(err)
		}
	}
}

func BenchmarkOsReadFile16GB(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_, err := os.ReadFile("./16GB.txt")
		if err != nil {
			b.Fatal(err)
		}
	}
}
```
The trade-off of loading a file in one go is obvious: it minimizes the number of I/O operations, but it pulls the entire file into memory, so for large files there is a real risk of running out of memory.
In many scenarios, such as log analysis, files are processed line by line. Go's bufio.Reader provides a ReadLine() method, but in practice ReadBytes('\n') or ReadString('\n') is used instead, as its own doc comment advises:
```go
// ReadLine is a low-level line-reading primitive. Most callers should use
// ReadBytes('\n') or ReadString('\n') instead or use a Scanner.
```
Taking ReadString('\n') as the example, we read each of the four files line by line.
```go
import (
	"bufio"
	"io"
	"os"
	"testing"
)

func ReadLines(filename string) {
	fi, err := os.Open(filename)
	if err != nil {
		panic(err)
	}
	defer fi.Close()

	reader := bufio.NewReader(fi)
	for {
		_, err = reader.ReadString('\n')
		if err != nil {
			if err == io.EOF {
				break
			}
			panic(err)
		}
	}
}

func BenchmarkReadLines4KB(b *testing.B) {
	for i := 0; i < b.N; i++ {
		ReadLines("./4KB.txt")
	}
}

func BenchmarkReadLines4MB(b *testing.B) {
	for i := 0; i < b.N; i++ {
		ReadLines("./4MB.txt")
	}
}

func BenchmarkReadLines4GB(b *testing.B) {
	for i := 0; i < b.N; i++ {
		ReadLines("./4GB.txt")
	}
}

func BenchmarkReadLines16GB(b *testing.B) {
	for i := 0; i < b.N; i++ {
		ReadLines("./16GB.txt")
	}
}
```
Chunk reading, also called sliced reading, is easy to understand: the content is split into chunks, and a fixed-size chunk is read on each iteration. Here, the chunk size is set to 4KB.
```go
import (
	"bufio"
	"io"
	"os"
	"testing"
)

func ReadChunk(filename string) {
	f, err := os.Open(filename)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	buf := make([]byte, 4*1024) // 4KB chunk, reused across reads
	r := bufio.NewReader(f)
	for {
		_, err = r.Read(buf)
		if err != nil {
			if err == io.EOF {
				break
			}
			panic(err)
		}
	}
}

func BenchmarkReadChunk4KB(b *testing.B) {
	for i := 0; i < b.N; i++ {
		ReadChunk("./4KB.txt")
	}
}

func BenchmarkReadChunk4MB(b *testing.B) {
	for i := 0; i < b.N; i++ {
		ReadChunk("./4MB.txt")
	}
}

func BenchmarkReadChunk4GB(b *testing.B) {
	for i := 0; i < b.N; i++ {
		ReadChunk("./4GB.txt")
	}
}

func BenchmarkReadChunk16GB(b *testing.B) {
	for i := 0; i < b.N; i++ {
		ReadChunk("./16GB.txt")
	}
}
```
Aggregated benchmark results:
```
BenchmarkOsReadFile4KB-8     92877        12491 ns/op
BenchmarkOsReadFile4MB-8      1620       744460 ns/op
BenchmarkOsReadFile4GB-8         1   7518057733 ns/op
signal: killed
BenchmarkReadLines4KB-8      90846        13184 ns/op
BenchmarkReadLines4MB-8        493      2338170 ns/op
BenchmarkReadLines4GB-8          1   3072629047 ns/op
BenchmarkReadLines16GB-8         1  12472749187 ns/op
BenchmarkReadChunk4KB-8      99848        12262 ns/op
BenchmarkReadChunk4MB-8        913      1233216 ns/op
BenchmarkReadChunk4GB-8          1   2095515009 ns/op
BenchmarkReadChunk16GB-8         1   8547054349 ns/op
```
Under the test conditions of this article (1KB per line), the three approaches differ little for the small 4KB file; at the MB level, loading the whole file is fastest, though chunk reading is not much slower; at the GB level, chunk reading is the fastest.

One thing worth noting: with whole-file loading, the 16GB file exhausts memory on the test machine (8GB of RAM) and the benchmark is killed before it can run.
Whole-file loading is not recommended at any file size: its speed advantage on small files is modest, and it is not worth the memory risk.

Chunk reading is the preferred choice, especially for files without newlines, such as audio and video. With an appropriate chunk size, it balances speed and memory well. Reading is usually accompanied by content-processing logic, and each chunk can be handed to a worker goroutine for better concurrency.