Why does the performance of moving_avg_concurrent2 not improve as the degree of concurrency increases?
moving_avg_concurrent2 splits the list into smaller pieces and hands each piece to its own goroutine. Curiously (the reason is not obvious), with a single goroutine this function is faster than moving_avg_serial4, but with multiple goroutines it starts to perform worse than moving_avg_serial4.
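For orientation before the full listing further down, here is a minimal, hypothetical sketch of the chunk-per-goroutine pattern that moving_avg_concurrent2 follows (the sumChunks function and every name in it are illustrative, not taken from the original code): each goroutine processes one contiguous slice segment, writes only its own result slot, and the caller waits on a sync.WaitGroup before merging the partial results.

```go
package main

import (
	"fmt"
	"sync"
)

// sumChunks splits input into numGoroutines contiguous chunks, sums each chunk
// in its own goroutine, and combines the partial sums after the WaitGroup barrier.
func sumChunks(input []float64, numGoroutines int) float64 {
	partial := make([]float64, numGoroutines) // one slot per goroutine, no shared writes
	chunk := len(input) / numGoroutines

	var wg sync.WaitGroup
	for g := 0; g < numGoroutines; g++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			start := id * chunk
			stop := start + chunk
			if id == numGoroutines-1 { // last chunk picks up the remainder
				stop = len(input)
			}
			for _, v := range input[start:stop] {
				partial[id] += v
			}
		}(g)
	}
	wg.Wait()

	total := 0.0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	data := make([]float64, 1_000_000)
	for i := range data {
		data[i] = 1.0
	}
	fmt.Println(sumChunks(data, 4)) // 1e+06
}
```

The pattern itself is sound; the question is whether the work per chunk is large enough to pay for launching the goroutines and merging their results.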
Why is moving_avg_concurrent3 much slower than moving_avg_serial4?
With a single goroutine, moving_avg_concurrent3 performs worse than moving_avg_serial4. Increasing num_goroutines improves its performance, but it still never catches up to moving_avg_serial4.
Even though goroutines are lightweight, they are not completely free. Is it possible that the overhead they incur is so large that the concurrent versions end up slower than moving_avg_serial4?
Yes. Although goroutines are lightweight, they are not free: when many goroutines are used, the overhead of launching, managing, and scheduling them may outweigh the benefits of the added parallelism. This is especially true when each goroutine does only a tiny amount of work, as in moving_avg_concurrent3, where every single window average is computed by its own goroutine.
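To make that overhead concrete, here is a rough, hypothetical timing sketch (wall-clock timing rather than a proper testing.B benchmark): it compares one serial pass over a slice with spawning one goroutine per element, which is the extreme form of the one-goroutine-per-window approach.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	const n = 1_000_000
	input := make([]float64, n)
	for i := range input {
		input[i] = float64(i)
	}
	out := make([]float64, n)

	// Serial baseline: a single pass over the data.
	start := time.Now()
	for i, v := range input {
		out[i] = v * 0.5
	}
	fmt.Println("serial:                ", time.Since(start))

	// One goroutine per element: the work done by each goroutine is far too
	// small to amortize the cost of launching, scheduling, and joining it.
	var wg sync.WaitGroup
	start = time.Now()
	for i := range input {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			out[i] = input[i] * 0.5
		}(i)
	}
	wg.Wait()
	fmt.Println("goroutine per element: ", time.Since(start))
}
```

The goroutine-per-element variant is typically far slower, because each goroutine performs only a handful of floating-point operations while the runtime still pays full price to create, schedule, and synchronize it.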
Code
Function:
```go
package main

import (
	"fmt"
	"math"
	"sync"
)

// NaN_in_slice is not shown in the original excerpt; this reconstruction simply
// reports whether the slice still contains a NaN (the buffer starts out all-NaN).
func NaN_in_slice(s []float64) bool {
	for _, v := range s {
		if math.IsNaN(v) {
			return true
		}
	}
	return false
}

// Returns a list containing the moving average of the input (as provided, i.e. not optimized)
func moving_avg_serial(input []float64, window_size int) []float64 {
	first_time := true
	var output = make([]float64, len(input))
	if len(input) > 0 {
		var buffer = make([]float64, window_size)
		// initialize the buffer to NaN
		for i := range buffer {
			buffer[i] = math.NaN()
		}
		for i, val := range input {
			old_val := buffer[int((math.Mod(float64(i), float64(window_size))))]
			buffer[int((math.Mod(float64(i), float64(window_size))))] = val
			if !NaN_in_slice(buffer) && first_time {
				sum := 0.0
				for _, entry := range buffer {
					sum += entry
				}
				output[i] = sum / float64(window_size)
				first_time = false
			} else if i > 0 && !math.IsNaN(output[i-1]) && !NaN_in_slice(buffer) {
				output[i] = output[i-1] + (val-old_val)/float64(window_size) // loop-free solution
			} else {
				output[i] = math.NaN()
			}
		}
	} else { // empty input
		fmt.Println("moving_avg is panicking!")
		panic(fmt.Sprintf("%v", input))
	}
	return output
}

// Returns a list containing the moving average of the input
// Control structures reordered to take advantage of short-circuit evaluation
func moving_avg_serial4(input []float64, window_size int) []float64 {
	first_time := true
	var output = make([]float64, len(input))
	if len(input) > 0 {
		var buffer = make([]float64, window_size)
		// initialize the buffer to NaN
		for i := range buffer {
			buffer[i] = math.NaN()
		}
		for i := range input {
			// fmt.Printf("in mvg_avg4: i=%v\n", i)
			old_val := buffer[int((math.Mod(float64(i), float64(window_size))))]
			buffer[int((math.Mod(float64(i), float64(window_size))))] = input[i]
			if first_time && !NaN_in_slice(buffer) {
				sum := 0.0
				for j := range buffer {
					sum += buffer[j]
				}
				output[i] = sum / float64(window_size)
				first_time = false
			} else if i > 0 && !math.IsNaN(output[i-1]) /* && !NaN_in_slice(buffer)*/ {
				output[i] = output[i-1] + (input[i]-old_val)/float64(window_size) // loop-free solution
			} else {
				output[i] = math.NaN()
			}
		}
	} else { // empty input
		fmt.Println("moving_avg is panicking!")
		panic(fmt.Sprintf("%v", input))
	}
	return output
}

// Returns a list containing the moving average of the input
// Splits the list into smaller pieces for the goroutines, but does not use the serial version,
// i.e. only the beginning holds NaNs, in the hope of shaving off some overhead
// Still does not scale (performance degrades as the size and num_goroutines grow)
func moving_avg_concurrent2(input []float64, window_size, num_goroutines int) []float64 {
	var output = make([]float64, window_size-1, len(input))
	for i := 0; i < window_size-1; i++ {
		output[i] = math.NaN()
	}
	if len(input) > 0 {
		num_items := len(input) - (window_size - 1)
		var barrier_wg sync.WaitGroup
		n := num_items / num_goroutines

		go_avg := make([][]float64, num_goroutines)
		for i := 0; i < num_goroutines; i++ {
			go_avg[i] = make([]float64, 0, num_goroutines)
		}

		for i := 0; i < num_goroutines; i++ {
			barrier_wg.Add(1)
			go func(go_id int) {
				defer barrier_wg.Done()

				// compute the boundaries
				var start, stop int
				start = go_id*int(n) + (window_size - 1) // start index
				// stop index
				if go_id != (num_goroutines - 1) {
					stop = start + n // stop index
				} else {
					stop = num_items + (window_size - 1) // stop index
				}

				// NOTE: this result is discarded immediately below; the chunk's averages are recomputed from scratch
				loc_avg := moving_avg_serial4(input[start-(window_size-1):stop], window_size)
				loc_avg = make([]float64, stop-start)
				current_sum := 0.0
				for i := start - (window_size - 1); i < start+1; i++ {
					current_sum += input[i]
				}
				loc_avg[0] = current_sum / float64(window_size)

				idx := 1
				for i := start + 1; i < stop; i++ {
					loc_avg[idx] = loc_avg[idx-1] + (input[i]-input[i-(window_size)])/float64(window_size)
					idx++
				}

				go_avg[go_id] = append(go_avg[go_id], loc_avg...)
			}(i)
		}
		barrier_wg.Wait()

		for i := 0; i < num_goroutines; i++ {
			output = append(output, go_avg[i]...)
		}
	} else { // empty input
		fmt.Println("moving_avg is panicking!")
		panic(fmt.Sprintf("%v", input))
	}
	return output
}

// Returns a list containing the moving average of the input
// Paradigm change: we choose a master-worker pattern and spawn a goroutine for each window to be computed
// (compute_window_avg sums every window from scratch; there is no sliding-window update here)
func compute_window_avg(input, output []float64, start, end int) {
	sum := 0.0
	size := end - start
	for _, val := range input[start:end] {
		sum += val
	}
	output[end-1] = sum / float64(size)
}

func moving_avg_concurrent3(input []float64, window_size, num_goroutines int) []float64 {
	var output = make([]float64, window_size-1, len(input))
	for i := 0; i < window_size-1; i++ {
		output[i] = math.NaN()
	}
	if len(input) > 0 {
		num_windows := len(input) - (window_size - 1)
		var output = make([]float64, len(input))
		for i := 0; i < window_size-1; i++ {
			// (the rest of moving_avg_concurrent3 is cut off in the source of this article)
```
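One way to reproduce and compare the measurements is a standard testing.B benchmark. The sketch below is an assumption rather than the original author's setup: the file name (e.g. moving_avg_test.go in the same package), the input size of 1e6 samples, the window of 20, and the goroutine counts 1/2/4/8 are all illustrative, and it only covers moving_avg_serial4 and moving_avg_concurrent2 from the listing above.

```go
package main

import (
	"fmt"
	"math/rand"
	"testing"
)

// benchInput is generated once and reused by all benchmarks.
var benchInput = func() []float64 {
	data := make([]float64, 1_000_000)
	for i := range data {
		data[i] = rand.Float64()
	}
	return data
}()

func BenchmarkSerial4(b *testing.B) {
	for i := 0; i < b.N; i++ {
		moving_avg_serial4(benchInput, 20)
	}
}

func BenchmarkConcurrent2(b *testing.B) {
	// One sub-benchmark per goroutine count, to see how throughput changes with num_goroutines.
	for _, g := range []int{1, 2, 4, 8} {
		b.Run(fmt.Sprintf("goroutines=%d", g), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				moving_avg_concurrent2(benchInput, 20, g)
			}
		})
	}
}
```

Run it with `go test -bench=. -benchmem`; the per-goroutine sub-benchmarks make it easy to see whether throughput improves or degrades as num_goroutines grows.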