Concurrency allows us to handle multiple tasks independently of one another, and goroutines are Go's lightweight way to do exactly that. In this post we progressively enhance an HTTP handler that accepts file uploads and explore various concurrency patterns in Go using channels and the sync package.
Before getting into concurrency patterns, let's set the stage. Imagine we have an HTTP handler that accepts multiple files via a multipart form and processes them in some way.
```go
func processFile(file multipart.File) {
	// do something with the file
	fmt.Println("Processing file...")
	time.Sleep(100 * time.Millisecond) // simulate file processing time
}

func UploadHandler(w http.ResponseWriter, r *http.Request) {
	// limit the form size to 10 MB
	if err := r.ParseMultipartForm(10 << 20); err != nil {
		http.Error(w, "Unable to parse form", http.StatusBadRequest)
		return
	}

	// iterate over all uploaded files and process them sequentially
	for _, file := range r.MultipartForm.File["files"] {
		f, err := file.Open()
		if err != nil {
			http.Error(w, "Unable to read file", http.StatusInternalServerError)
			return
		}
		processFile(f)
		f.Close()
	}
}
```
In the example above we receive files from a form and process them sequentially. If 10 files are uploaded, it takes about 1 second (10 × 100 ms) before the handler can send a response to the client.
When handling many files this becomes a bottleneck. With Go's concurrency support, however, we can easily solve this issue.
To solve this we can process files concurrently. To spawn a new goroutine we prefix a function call with the go keyword, e.g. go processFile(f). However, because goroutines are non-blocking, the handler might return before processing is finished, leaving files unprocessed or reporting an incorrect state. To wait for all files to be processed we can utilize sync.WaitGroup.
A WaitGroup waits for a number of goroutines to finish. For each goroutine we spawn, we increase the counter in the WaitGroup with the Add function. When a goroutine finishes, Done is called to decrease the counter by one. Before returning from the handler, Wait is called, which blocks until the counter of the WaitGroup is 0.
```go
func UploadHandler(w http.ResponseWriter, r *http.Request) {
	if err := r.ParseMultipartForm(10 << 20); err != nil {
		http.Error(w, "Unable to parse form", http.StatusBadRequest)
		return
	}

	// create WaitGroup
	var wg sync.WaitGroup

	for _, file := range r.MultipartForm.File["files"] {
		f, err := file.Open()
		if err != nil {
			http.Error(w, "Unable to read file", http.StatusInternalServerError)
			return
		}

		// increment the WaitGroup counter; this must happen
		// before the goroutine is started
		wg.Add(1)

		// process the file concurrently
		go func(file multipart.File) {
			defer wg.Done() // decrement the counter; defer guarantees Done is called
			defer file.Close()
			processFile(file)
		}(f)
	}

	// block until all goroutines have called Done
	wg.Wait()
	fmt.Fprintln(w, "All files processed successfully!")
}
```
Now for each uploaded file a new goroutine is spawned, which could overwhelm the system. One solution is to limit the number of spawned goroutines.
A semaphore is just a variable we can use to control access to a common resource by multiple threads, or in this case goroutines.
In Go we can utilize buffered channels to implement a semaphore.
Before getting into the implementation let's look at what channels are and the difference between buffered and unbuffered channels.
A channel is a pipe through which we can send and receive data to communicate safely between goroutines.
Channels must be created with the make function.
```go
ch := make(chan int)
```
Channels have a special operator, <-, which is used to send to or receive from a channel.
Having the operator point at the channel, ch <- 1, sends data to the channel; if the arrow points away from the channel, <-ch, a value is received. Send and receive operations are blocking by default: each operation waits until the other side is ready.
Picture a producer sending the value 1 through an unbuffered channel while a consumer reads from it: neither operation completes until both sides are ready.
If the producer can send events faster than the consumer can handle them, we have the option to utilize a buffered channel, which queues up multiple messages without blocking the producer until the buffer is full. At the same time the consumer can handle the messages at its own pace.
```go
ch := make(chan int, 2)
```
In this example the producer can send up to two items without blocking. When the capacity of the buffer is reached, the producer blocks until the consumer has handled at least one message.
Back to the initial problem: we want to limit the number of goroutines processing files concurrently. To do this we can utilize a buffered channel as a semaphore.
```go
func UploadHandler(w http.ResponseWriter, r *http.Request) {
	if err := r.ParseMultipartForm(10 << 20); err != nil {
		http.Error(w, "Unable to parse form", http.StatusBadRequest)
		return
	}

	var wg sync.WaitGroup
	// buffered channel used as a semaphore: at most 5 files
	// are processed at the same time
	semaphore := make(chan struct{}, 5)

	for _, file := range r.MultipartForm.File["files"] {
		f, err := file.Open()
		if err != nil {
			http.Error(w, "Unable to read file", http.StatusInternalServerError)
			return
		}

		wg.Add(1)
		go func(file multipart.File) {
			defer wg.Done()
			defer file.Close()

			semaphore <- struct{}{}        // acquire a slot; blocks if all 5 are taken
			defer func() { <-semaphore }() // release the slot

			processFile(file)
		}(f)
	}

	wg.Wait()
	fmt.Fprintln(w, "All files processed successfully!")
}
```
In this example we added a buffered channel with a capacity of 5, which allows us to process 5 files concurrently and limits the strain on the system.
But what if not all files are equal? We might be able to reliably predict that certain file types or file sizes require more resources to process. In this case we can utilize a weighted semaphore.
Simply put, with a weighted semaphore we can assign more resources to a single task. Go already provides an implementation of a weighted semaphore in the extended sync package (golang.org/x/sync/semaphore).
```go
func UploadHandler(w http.ResponseWriter, r *http.Request) {
	if err := r.ParseMultipartForm(10 << 20); err != nil {
		http.Error(w, "Unable to parse form", http.StatusBadRequest)
		return
	}

	var wg sync.WaitGroup
	sem := semaphore.NewWeighted(5) // 5 slots in total

	for _, file := range r.MultipartForm.File["files"] {
		f, err := file.Open()
		if err != nil {
			http.Error(w, "Unable to read file", http.StatusInternalServerError)
			return
		}

		// PDFs are more expensive to process, so they take 2 slots
		weight := int64(1)
		if file.Header.Get("Content-Type") == "application/pdf" {
			weight = 2
		}

		wg.Add(1)
		go func(file multipart.File, weight int64) {
			defer wg.Done()
			defer file.Close()

			// Acquire blocks until enough slots are free
			if err := sem.Acquire(r.Context(), weight); err != nil {
				return
			}
			defer sem.Release(weight)

			processFile(file)
		}(f, weight)
	}

	wg.Wait()
	fmt.Fprintln(w, "All files processed successfully!")
}
```
In this version we created a weighted semaphore with 5 slots. If only images are uploaded, for example, the handler processes 5 images concurrently; a PDF, however, acquires 2 slots, which reduces the number of files that can be handled concurrently.
We explored a few concurrency patterns in Go, utilizing sync.WaitGroup and semaphores to control the number of concurrent tasks. However, there are more tools available: we could use channels to create a worker pool, add timeouts, or use the fan-in/fan-out pattern.
Additionally error handling is an important aspect which was mostly left out for simplicity.
One way to handle errors would be to use channels to aggregate them and handle them after all goroutines are done.
Go also provides errgroup.Group, which is related to sync.WaitGroup but adds handling of tasks that return errors. It can be found in the extended sync package (golang.org/x/sync/errgroup).