How to add Kubernetes-powered leader election to your Go apps

Originally published on my blog.

The Kubernetes standard library is full of gems, hidden away in many of the different subpackages that make up the ecosystem. One such example I discovered recently is k8s.io/client-go/tools/leaderelection, which can be used to add a leader election protocol to any application running inside a Kubernetes cluster. This article will discuss what leader election is, how it is implemented in this Kubernetes package, and provide an example of how we can use this library in our own applications.

Leader Election

Leader election is a distributed systems concept that is a core building block of highly available software. It allows multiple concurrent processes to coordinate among themselves and elect a single "leader" process, which is then responsible for performing synchronized actions such as writing to a data store.

This is useful in systems such as distributed databases or caches, where multiple processes run to provide redundancy against hardware or network failures, but cannot all write to the store simultaneously without sacrificing data consistency. If the leader process becomes unresponsive at some point in the future, the remaining processes kick off a new leader election, eventually picking a new process to act as the leader.

Using this concept, we can create highly available software with a single leader and multiple standby replicas.

In Kubernetes, the controller-runtime package uses leader election to make controllers highly available. In a controller deployment, resource reconciliation only happens while a process is the leader, and the other replicas wait on standby. If the leader pod becomes unresponsive, the remaining replicas elect a new leader to perform subsequent reconciliations and resume normal operation.
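
As a point of reference, here is a minimal sketch of what enabling this looks like through controller-runtime's manager options; the lock name "my-controller-leader-lock" and the namespace are illustrative assumptions on my part, not values taken from this article.

package main

import (
    ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
    // Enable leader election on the manager. Only the replica that holds the
    // underlying Lease runs reconciliation; the other replicas stand by.
    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
        LeaderElection:          true,
        LeaderElectionID:        "my-controller-leader-lock", // illustrative Lease name
        LeaderElectionNamespace: "default",                   // illustrative namespace
    })
    if err != nil {
        panic(err)
    }

    // Start blocks until the manager stops (e.g. on SIGTERM).
    if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
        panic(err)
    }
}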

Kubernetes Leases

This library uses Kubernetes Leases, or distributed locks, which can be acquired by a process. A Lease is a native Kubernetes resource held by a single identity for a given duration, with an option to renew. Here's an example spec from the docs:

apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  labels:
    apiserver.kubernetes.io/identity: kube-apiserver
    kubernetes.io/hostname: master-1
  name: apiserver-07a5ea9b9b072c4a5f3d1c3702
  namespace: kube-system
spec:
  holderIdentity: apiserver-07a5ea9b9b072c4a5f3d1c3702_0c8914f7-0f35-440e-8676-7844977d3a05
  leaseDurationSeconds: 3600
  renewTime: "2023-07-04T21:58:48.065888Z"

The k8s ecosystem uses Leases in three ways:

  1. Node heartbeats: every Node has a corresponding Lease resource and continually updates its renewTime field. If a Lease's renewTime hasn't been updated for a while, the Node is tainted as unavailable and no more Pods will be scheduled onto it.
  2. Leader election: here, a Lease is used to coordinate multiple processes by having the leader update the Lease's holderIdentity. Standby replicas, with different identities, are stuck waiting for the Lease to expire. If the Lease does expire and is not renewed by the leader, a new election takes place in which the remaining replicas try to take ownership of the Lease by updating its holderIdentity with their own. Since the Kubernetes API server rejects updates to stale objects, only a single standby can successfully update the Lease, at which point it continues execution as the new leader (a sketch of this update, done directly with client-go, follows this list).
  3. API server identity: starting in v1.26, as a beta feature, each kube-apiserver replica publishes its identity by creating a dedicated Lease. Since this is a relatively slim new feature, not much else can be derived from the Lease object besides the number of API servers running, but it leaves room for more metadata to be added to these Leases in future k8s versions.
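
To make the second item concrete, here's a rough sketch of what taking over an expired Lease boils down to when done directly against the API with client-go. The function name tryAcquireLease and its parameters are hypothetical; the leaderelection package we use below performs this update (plus the expiry checks and retries) for you.

package leasedemo

import (
    "context"
    "time"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// tryAcquireLease writes our own identity into the Lease's holderIdentity and
// refreshes its timestamps. If another standby updated the Lease first, the
// API server rejects this write with a conflict error, because our copy of the
// object (and its resourceVersion) is stale.
func tryAcquireLease(ctx context.Context, cs kubernetes.Interface, ns, name, identity string) error {
    lease, err := cs.CoordinationV1().Leases(ns).Get(ctx, name, metav1.GetOptions{})
    if err != nil {
        return err
    }
    now := metav1.NewMicroTime(time.Now())
    lease.Spec.HolderIdentity = &identity
    lease.Spec.AcquireTime = &now
    lease.Spec.RenewTime = &now
    _, err = cs.CoordinationV1().Leases(ns).Update(ctx, lease, metav1.UpdateOptions{})
    return err
}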

Now, let's explore the second use case for Leases by writing a sample program that demonstrates how to use them in a leader election scenario.

Example Program

In this code example, we use the leaderelection package to handle the leader election specifics and Lease manipulation for us.

package main

import (
    "context"
    "fmt"
    "os"
    "time"

    "k8s.io/client-go/tools/leaderelection"
    rl "k8s.io/client-go/tools/leaderelection/resourcelock"
    ctrl "sigs.k8s.io/controller-runtime"
)

var (
    // lockName and lockNamespace need to be shared across all running instances
    lockName      = "my-lock"
    lockNamespace = "default"

    // identity is unique to the individual process. This will not work for anything
    // outside of a toy example, since processes running in different containers or
    // computers can share the same pid.
    identity      = fmt.Sprintf("%d", os.Getpid())
)

func main() {
    // Get the active kubernetes context
    cfg, err := ctrl.GetConfig()
    if err != nil {
        panic(err.Error())
    }

    // Create a new lock. This will be used to create a Lease resource in the cluster.
    l, err := rl.NewFromKubeconfig(
        rl.LeasesResourceLock,
        lockNamespace,
        lockName,
        rl.ResourceLockConfig{
            Identity: identity,
        },
        cfg,
        time.Second*10,
    )
    if err != nil {
        panic(err)
    }

    // Create a new leader election configuration with a 15 second lease duration.
    // Visit https://pkg.go.dev/k8s.io/client-go/tools/leaderelection#LeaderElectionConfig
    // for more information on the LeaderElectionConfig struct fields
    el, err := leaderelection.NewLeaderElector(leaderelection.LeaderElectionConfig{
        Lock:          l,
        LeaseDuration: time.Second * 15,
        RenewDeadline: time.Second * 10,
        RetryPeriod:   time.Second * 2,
        Name:          lockName,
        Callbacks: leaderelection.LeaderCallbacks{
            OnStartedLeading: func(ctx context.Context) { println("I am the leader!") },
            OnStoppedLeading: func() { println("I am not the leader anymore!") },
            OnNewLeader:      func(identity string) { fmt.Printf("the leader is %s\n", identity) },
        },
    })
    if err != nil {
        panic(err)
    }

    // Begin the leader election process. This will block.
    el.Run(context.Background())

}

The nice thing about the leaderelection package is that it provides a callback-based framework for handling leader election. This lets you act on specific state changes in a granular way and properly release resources when a new leader is elected. By running these callbacks in separate goroutines, the package takes advantage of Go's strong concurrency support to use machine resources efficiently.
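
One common pattern, shown here as a sketch under my own assumptions rather than as code from the original article, is to put the leader-only work loop inside OnStartedLeading and drive it off the context that the package passes in, so the work stops as soon as leadership is lost.

package leaderexample

import (
    "context"
    "fmt"
    "time"

    "k8s.io/client-go/tools/leaderelection"
)

// newCallbacks returns callbacks where the leader-only work runs in a loop
// driven by the context passed to OnStartedLeading. The context is cancelled
// when leadership is lost or the process shuts down, which stops the loop.
func newCallbacks() leaderelection.LeaderCallbacks {
    return leaderelection.LeaderCallbacks{
        OnStartedLeading: func(ctx context.Context) {
            ticker := time.NewTicker(5 * time.Second)
            defer ticker.Stop()
            for {
                select {
                case <-ctx.Done():
                    return // lost leadership; release any held resources here
                case <-ticker.C:
                    fmt.Println("doing leader-only work...") // placeholder work
                }
            }
        },
        OnStoppedLeading: func() { fmt.Println("no longer the leader") },
        OnNewLeader:      func(id string) { fmt.Printf("current leader: %s\n", id) },
    }
}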

Testing it out

To test this out, let's spin up a test cluster using kind.

$ kind create cluster

Copy the sample code into main.go, create a new module (go mod init leaderelectiontest), and tidy it (go mod tidy) to install its dependencies. Once you run go run main.go, you should see output like this:

$ go run main.go
I0716 11:43:50.337947     138 leaderelection.go:250] attempting to acquire leader lease default/my-lock...
I0716 11:43:50.351264     138 leaderelection.go:260] successfully acquired lease default/my-lock
the leader is 138
I am the leader!

The exact leader identity will be different from what's in the example (138), since this is just the PID of the process that was running on my computer at the time of writing.

And here's the Lease that was created in the test cluster:

$ kubectl describe lease/my-lock
Name:         my-lock
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  coordination.k8s.io/v1
Kind:         Lease
Metadata:
  Creation Timestamp:  2024-07-16T15:43:50Z
  Resource Version:    613
  UID:                 1d978362-69c5-43e9-af13-7b319dd452a6
Spec:
  Acquire Time:            2024-07-16T15:43:50.338049Z
  Holder Identity:         138
  Lease Duration Seconds:  15
  Lease Transitions:       0
  Renew Time:              2024-07-16T15:45:31.122956Z
Events:                    <none>

See that the "Holder Identity" is the same as the process's PID, 138.

Now, let's open up another terminal and run the same main.go file in a separate process:

$ go run main.go
I0716 11:48:34.489953     604 leaderelection.go:250] attempting to acquire leader lease default/my-lock...
the leader is 138

This second process will wait indefinitely until the first one becomes unresponsive. Let's kill the first process and wait around 15 seconds. Now that the first process is not renewing its claim on the Lease, the .spec.renewTime field won't be updated anymore. This will eventually cause the second process to trigger a new leader election, since the Lease's renew time is older than its duration. Because this process is the only one now running, it will elect itself as the new leader.

the leader is 604
I0716 11:48:51.904732     604 leaderelection.go:260] successfully acquired lease default/my-lock
I am the leader!

If there were multiple processes still running after the initial leader exited, the first process to acquire the Lease would be the new leader, and the rest would continue to be on standby.

No single-leader guarantees

This package is not foolproof, in that it "does not guarantee that only one client is acting as a leader (a.k.a. fencing)". For example, if a leader is paused and lets its Lease expire, another standby replica will acquire the Lease. Then, once the original leader resumes execution, it will think that it's still the leader and continue doing work alongside the newly-elected leader. In this way, you can end up with two leaders running simultaneously.

To fix this, a fencing token which references the Lease needs to be included in each request to the server. A fencing token is effectively an integer that increases by 1 every time a Lease changes hands. So a client with an old fencing token will have its requests rejected by the server. In this scenario, if an old leader wakes up from sleep and a new leader has already incremented the fencing token, all of the old leader's requests would be rejected because it is sending an older (smaller) token than what the server has seen from the newer leader.
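
As a rough illustration of the idea (this is not something the leaderelection package provides), a fenced data store could look like the sketch below: it remembers the highest token it has seen and rejects writes carrying an older one.

package fencing

import (
    "fmt"
    "sync"
)

// fencedStore rejects writes whose fencing token is older than the highest
// token it has already accepted, so a paused ex-leader that wakes up with a
// stale token cannot overwrite the current leader's writes.
type fencedStore struct {
    mu           sync.Mutex
    highestToken int64
    data         map[string]string
}

func (s *fencedStore) Write(token int64, key, value string) error {
    s.mu.Lock()
    defer s.mu.Unlock()
    if token < s.highestToken {
        return fmt.Errorf("write rejected: token %d is older than %d", token, s.highestToken)
    }
    s.highestToken = token
    if s.data == nil {
        s.data = make(map[string]string)
    }
    s.data[key] = value
    return nil
}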

Implementing fencing in Kubernetes would be difficult without modifying the core API server to account for corresponding fencing tokens for each Lease. However, the risk of having multiple leader controllers is somewhat mitigated by the k8s API server itself. Because updates to stale objects are rejected, only controllers with the most up-to-date version of an object can modify it. So while we could have multiple controller leaders running, a resource's state would never regress to older versions if a controller misses a change made by another leader. Instead, reconciliation time would increase as both leaders need to refresh their own internal states of resources to ensure that they are acting on the most recent versions.
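
For completeness, here's a sketch of the standard conflict-retry idiom that controllers rely on when the API server rejects a stale update; the Deployment scaling shown is an arbitrary example of my own, not something taken from the article.

package conflictretry

import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/util/retry"
)

// scaleDeployment re-reads the object and retries on conflict, which is how a
// controller that missed another leader's change catches up to the latest
// version instead of regressing the resource to an older state.
func scaleDeployment(ctx context.Context, cs kubernetes.Interface, ns, name string, replicas int32) error {
    return retry.RetryOnConflict(retry.DefaultRetry, func() error {
        dep, err := cs.AppsV1().Deployments(ns).Get(ctx, name, metav1.GetOptions{})
        if err != nil {
            return err
        }
        dep.Spec.Replicas = &replicas
        _, err = cs.AppsV1().Deployments(ns).Update(ctx, dep, metav1.UpdateOptions{})
        return err
    })
}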

Still, if you're using this package to implement leader election using a different data store, this is an important caveat to be aware of.

Conclusion

Leader election and distributed locking are critical building blocks of distributed systems. When trying to build fault-tolerant and highly-available applications, having tools like these at your disposal is critical. The Kubernetes standard library gives us a battle-tested wrapper around its primitives to allow application developers to easily build leader election into their own applications.

While use of this particular library does limit you to deploying your application on Kubernetes, that seems to be the way the world is going recently. If in fact that is a dealbreaker, you can of course fork the library and modify it to work against any ACID-compliant and highly-available datastore.

Stay tuned for more k8s source deep dives!
