Why does the performance of moving_avg_concurrent2 not improve as the degree of concurrency increases?
moving_avg_concurrent2 splits the list into smaller pieces and hands each piece to its own goroutine. Curiously (the reason is not obvious), with a single goroutine this function is faster than moving_avg_serial4, but with multiple goroutines it starts to perform worse than moving_avg_serial4.
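For orientation before the full listing further down, here is a minimal, hypothetical sketch of the chunk-per-goroutine pattern that moving_avg_concurrent2 follows (the sumChunks function and every name in it are illustrative, not taken from the original code): each goroutine processes one contiguous slice segment, writes only its own result slot, and the caller waits on a sync.WaitGroup before merging the partial results.

```go
package main

import (
	"fmt"
	"sync"
)

// sumChunks splits input into numGoroutines contiguous chunks, sums each chunk
// in its own goroutine, and combines the partial sums after the WaitGroup barrier.
func sumChunks(input []float64, numGoroutines int) float64 {
	partial := make([]float64, numGoroutines) // one slot per goroutine, no shared writes
	chunk := len(input) / numGoroutines

	var wg sync.WaitGroup
	for g := 0; g < numGoroutines; g++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			start := id * chunk
			stop := start + chunk
			if id == numGoroutines-1 { // last chunk picks up the remainder
				stop = len(input)
			}
			for _, v := range input[start:stop] {
				partial[id] += v
			}
		}(g)
	}
	wg.Wait()

	total := 0.0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	data := make([]float64, 1_000_000)
	for i := range data {
		data[i] = 1.0
	}
	fmt.Println(sumChunks(data, 4)) // 1e+06
}
```

The pattern itself is sound; the question is whether the work per chunk is large enough to pay for launching the goroutines and merging their results.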
Why is moving_avg_concurrent3 much slower than moving_avg_serial4?
With a single goroutine, moving_avg_concurrent3 performs worse than moving_avg_serial4. Increasing num_goroutines improves its performance, but it still never catches up to moving_avg_serial4.
Even though goroutines are lightweight, they are not completely free. Is it possible that the overhead they incur is so large that the concurrent versions end up slower than moving_avg_serial4?
Yes. Although goroutines are lightweight, they are not free: when many goroutines are used, the overhead of launching, managing, and scheduling them may outweigh the benefits of the added parallelism. This is especially true when each goroutine does only a tiny amount of work, as in moving_avg_concurrent3, where every single window average is computed by its own goroutine.
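To make that overhead concrete, here is a rough, hypothetical timing sketch (wall-clock timing rather than a proper testing.B benchmark): it compares one serial pass over a slice with spawning one goroutine per element, which is the extreme form of the one-goroutine-per-window approach.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	const n = 1_000_000
	input := make([]float64, n)
	for i := range input {
		input[i] = float64(i)
	}
	out := make([]float64, n)

	// Serial baseline: a single pass over the data.
	start := time.Now()
	for i, v := range input {
		out[i] = v * 0.5
	}
	fmt.Println("serial:                ", time.Since(start))

	// One goroutine per element: the work done by each goroutine is far too
	// small to amortize the cost of launching, scheduling, and joining it.
	var wg sync.WaitGroup
	start = time.Now()
	for i := range input {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			out[i] = input[i] * 0.5
		}(i)
	}
	wg.Wait()
	fmt.Println("goroutine per element: ", time.Since(start))
}
```

The goroutine-per-element variant is typically far slower, because each goroutine performs only a handful of floating-point operations while the runtime still pays full price to create, schedule, and synchronize it.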
Code
Function:
```go
package main

import (
	"fmt"
	"math"
	"sync"
)

// NaN_in_slice is not shown in the original excerpt; this reconstruction simply
// reports whether the slice still contains a NaN (the buffer starts out all-NaN).
func NaN_in_slice(s []float64) bool {
	for _, v := range s {
		if math.IsNaN(v) {
			return true
		}
	}
	return false
}

// Returns a list containing the moving average of the input (as provided, i.e. not optimized)
func moving_avg_serial(input []float64, window_size int) []float64 {
	first_time := true
	var output = make([]float64, len(input))
	if len(input) > 0 {
		var buffer = make([]float64, window_size)
		// initialize the buffer to NaN
		for i := range buffer {
			buffer[i] = math.NaN()
		}
		for i, val := range input {
			old_val := buffer[int((math.Mod(float64(i), float64(window_size))))]
			buffer[int((math.Mod(float64(i), float64(window_size))))] = val
			if !NaN_in_slice(buffer) && first_time {
				sum := 0.0
				for _, entry := range buffer {
					sum += entry
				}
				output[i] = sum / float64(window_size)
				first_time = false
			} else if i > 0 && !math.IsNaN(output[i-1]) && !NaN_in_slice(buffer) {
				output[i] = output[i-1] + (val-old_val)/float64(window_size) // loop-free solution
			} else {
				output[i] = math.NaN()
			}
		}
	} else { // empty input
		fmt.Println("moving_avg is panicking!")
		panic(fmt.Sprintf("%v", input))
	}
	return output
}

// Returns a list containing the moving average of the input
// Control structures reordered to take advantage of short-circuit evaluation
func moving_avg_serial4(input []float64, window_size int) []float64 {
	first_time := true
	var output = make([]float64, len(input))
	if len(input) > 0 {
		var buffer = make([]float64, window_size)
		// initialize the buffer to NaN
		for i := range buffer {
			buffer[i] = math.NaN()
		}
		for i := range input {
			// fmt.Printf("in mvg_avg4: i=%v\n", i)
			old_val := buffer[int((math.Mod(float64(i), float64(window_size))))]
			buffer[int((math.Mod(float64(i), float64(window_size))))] = input[i]
			if first_time && !NaN_in_slice(buffer) {
				sum := 0.0
				for j := range buffer {
					sum += buffer[j]
				}
				output[i] = sum / float64(window_size)
				first_time = false
			} else if i > 0 && !math.IsNaN(output[i-1]) /* && !NaN_in_slice(buffer)*/ {
				output[i] = output[i-1] + (input[i]-old_val)/float64(window_size) // loop-free solution
			} else {
				output[i] = math.NaN()
			}
		}
	} else { // empty input
		fmt.Println("moving_avg is panicking!")
		panic(fmt.Sprintf("%v", input))
	}
	return output
}

// Returns a list containing the moving average of the input
// Splits the list into smaller pieces for the goroutines, but does not use the serial version,
// i.e. only the beginning holds NaNs, in the hope of shaving off some overhead
// Still does not scale (performance degrades as the size and num_goroutines grow)
func moving_avg_concurrent2(input []float64, window_size, num_goroutines int) []float64 {
	var output = make([]float64, window_size-1, len(input))
	for i := 0; i < window_size-1; i++ {
		output[i] = math.NaN()
	}
	if len(input) > 0 {
		num_items := len(input) - (window_size - 1)
		var barrier_wg sync.WaitGroup
		n := num_items / num_goroutines

		go_avg := make([][]float64, num_goroutines)
		for i := 0; i < num_goroutines; i++ {
			go_avg[i] = make([]float64, 0, num_goroutines)
		}

		for i := 0; i < num_goroutines; i++ {
			barrier_wg.Add(1)
			go func(go_id int) {
				defer barrier_wg.Done()

				// compute the boundaries
				var start, stop int
				start = go_id*int(n) + (window_size - 1) // start index
				// stop index
				if go_id != (num_goroutines - 1) {
					stop = start + n // stop index
				} else {
					stop = num_items + (window_size - 1) // stop index
				}

				// NOTE: this result is discarded immediately below; the chunk's averages are recomputed from scratch
				loc_avg := moving_avg_serial4(input[start-(window_size-1):stop], window_size)
				loc_avg = make([]float64, stop-start)
				current_sum := 0.0
				for i := start - (window_size - 1); i < start+1; i++ {
					current_sum += input[i]
				}
				loc_avg[0] = current_sum / float64(window_size)

				idx := 1
				for i := start + 1; i < stop; i++ {
					loc_avg[idx] = loc_avg[idx-1] + (input[i]-input[i-(window_size)])/float64(window_size)
					idx++
				}

				go_avg[go_id] = append(go_avg[go_id], loc_avg...)
			}(i)
		}
		barrier_wg.Wait()

		for i := 0; i < num_goroutines; i++ {
			output = append(output, go_avg[i]...)
		}
	} else { // empty input
		fmt.Println("moving_avg is panicking!")
		panic(fmt.Sprintf("%v", input))
	}
	return output
}

// Returns a list containing the moving average of the input
// Paradigm change: we choose a master-worker pattern and spawn a goroutine for each window to be computed
// (compute_window_avg sums every window from scratch; there is no sliding-window update here)
func compute_window_avg(input, output []float64, start, end int) {
	sum := 0.0
	size := end - start
	for _, val := range input[start:end] {
		sum += val
	}
	output[end-1] = sum / float64(size)
}

func moving_avg_concurrent3(input []float64, window_size, num_goroutines int) []float64 {
	var output = make([]float64, window_size-1, len(input))
	for i := 0; i < window_size-1; i++ {
		output[i] = math.NaN()
	}
	if len(input) > 0 {
		num_windows := len(input) - (window_size - 1)
		var output = make([]float64, len(input))
		for i := 0; i < window_size-1; i++ {
			// (the rest of moving_avg_concurrent3 is cut off in the source of this article)
```
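One way to reproduce and compare the measurements is a standard testing.B benchmark. The sketch below is an assumption rather than the original author's setup: the file name (e.g. moving_avg_test.go in the same package), the input size of 1e6 samples, the window of 20, and the goroutine counts 1/2/4/8 are all illustrative, and it only covers moving_avg_serial4 and moving_avg_concurrent2 from the listing above.

```go
package main

import (
	"fmt"
	"math/rand"
	"testing"
)

// benchInput is generated once and reused by all benchmarks.
var benchInput = func() []float64 {
	data := make([]float64, 1_000_000)
	for i := range data {
		data[i] = rand.Float64()
	}
	return data
}()

func BenchmarkSerial4(b *testing.B) {
	for i := 0; i < b.N; i++ {
		moving_avg_serial4(benchInput, 20)
	}
}

func BenchmarkConcurrent2(b *testing.B) {
	// One sub-benchmark per goroutine count, to see how throughput changes with num_goroutines.
	for _, g := range []int{1, 2, 4, 8} {
		b.Run(fmt.Sprintf("goroutines=%d", g), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				moving_avg_concurrent2(benchInput, 20, g)
			}
		})
	}
}
```

Run it with `go test -bench=. -benchmem`; the per-goroutine sub-benchmarks make it easy to see whether throughput improves or degrades as num_goroutines grows.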