Let's talk about several issues about MongoDB replica sets
Why use replica sets
1. Data backup
Backups can also be made with the built-in mongodump/mongorestore tools, but they are not as convenient as the automatic synchronized copies a replica set maintains.
2. Automatic failover
With a replica set deployed, when the primary node fails, the cluster automatically holds an election and promotes a new primary from the remaining nodes, so the service keeps running. All of this is automatic and transparent to operations and developers. Of course, when a failure does occur you still have to investigate and fix it promptly; do not rely too heavily on the replica set, because if all members fail you will not even have time to react.
3. Improve read performance in some specific scenarios
By default, both reads and writes can only be performed on the primary node.
MongoDB clients support five replica set read preference modes: primary (the default), primaryPreferred, secondary, secondaryPreferred, and nearest.
Source: http://docs.mongoing.com/manual-zh/core/re...
Reading from secondary nodes is generally not recommended, mainly because the data on a secondary may not be the latest.
The scenarios where reading from secondaries makes sense are very limited. The official manual lists the applicable scenarios and several reasons why reads on secondaries are not recommended: http://docs.mongoing.com/manual-zh/core/re...
Let me give my own opinion: replica sets do not exist to improve read performance. Except for a few scenarios, reading from secondaries is not recommended. If you want to improve read performance, use indexes and sharding. Incidentally, if the data volume is not large, there is no need for sharding: one of our production collections holds nearly 200 million documents and still performs quite well (of course, the machine is well configured, and it runs only one Redis and one MongoDB instance).
How to deploy a replica set
Please see the manual: http://docs.mongoing.com/manual-zh/tutoria...
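If you just want the shape of the procedure, it boils down to starting several mongod instances with the same replica set name and then initiating the set from one of them. A rough sketch follows; the IPs, ports, and data paths are placeholders chosen to match the example addresses used later in this article, so adjust them to your environment:

```shell
# Start three mongod instances that all belong to replica set "rs0"
# (data paths and log paths are placeholders)
mongod --replSet rs0 --port 27018 --dbpath /data/rs0-0 --fork --logpath /data/rs0-0.log
mongod --replSet rs0 --port 27019 --dbpath /data/rs0-1 --fork --logpath /data/rs0-1.log
mongod --replSet rs0 --port 27020 --dbpath /data/rs0-2 --fork --logpath /data/rs0-2.log

# Connect to one member and initiate the set with all three members
mongo --port 27018 --eval 'rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "192.168.1.2:27018" },
    { _id: 1, host: "192.168.1.3:27019" },
    { _id: 2, host: "192.168.1.4:27020" }
  ]
})'

# Verify that one member became PRIMARY and the others SECONDARY
mongo --port 27018 --eval 'rs.status()'
```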
How to use the automatic failover feature of MongoDB replication set in a program
Take the PHP mongo driver as an example.
$client = new MongoClient('mongodb://192.168.1.2:27018,192.168.1.3:27019,192.168.1.4:27020', array('replicaSet' => 'rs0'));
With this configuration, if one of the MongoDB nodes goes down, the remaining nodes automatically elect a new primary and the program keeps working. During the election the driver will still throw exceptions; although elections are fast, for robustness your program must handle those exceptions. Of course, if no new primary can be elected, the whole replica set becomes unavailable. (As noted above, if the read preference is configured as primaryPreferred, reads are routed to a secondary while there is no primary, so the replica set can still serve read operations.)
In fact, as long as the replica set name is specified ('replicaSet' => 'rs0'), you do not need to list every node address; given even a single reachable node, the mongo driver will automatically discover all valid nodes. The $client->getHosts() method shows the addresses of all discovered nodes.
But if you list only one node address and that node happens to be down, you will not be able to connect at all, so I recommend configuring the complete list of node addresses.
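Putting the pieces above together, a defensive connection for the legacy PHP mongo driver might look like the following sketch. The database and collection names are made up for illustration, and the retry policy is only hinted at in a comment; this is not the only way to structure it:

```php
<?php
// Sketch: connect to a replica set and survive failover elections.
// Addresses match the earlier example; 'test'/'users' are placeholders.
try {
    $client = new MongoClient(
        'mongodb://192.168.1.2:27018,192.168.1.3:27019,192.168.1.4:27020',
        array(
            'replicaSet'     => 'rs0',
            // With primaryPreferred, reads fall back to a secondary
            // while no primary is available.
            'readPreference' => MongoClient::RP_PRIMARY_PREFERRED,
        )
    );

    // Inspect the node list the driver discovered from the set.
    var_dump($client->getHosts());

    $collection = $client->selectCollection('test', 'users');
    $doc = $collection->findOne();
} catch (MongoConnectionException $e) {
    // Thrown while an election is in progress or no member is reachable;
    // log it and retry with backoff instead of crashing.
    error_log('MongoDB connection failed: ' . $e->getMessage());
}
```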
What is the principle of synchronization?
Once a replica set is enabled, a collection named oplog.rs is created in the local database. It is a capped collection, meaning its size is fixed. Every write operation to the database is recorded in this collection, and the members of the replica set achieve data synchronization by reading the oplog from other members.
For example: if a client inserts 100 documents on the primary, the oplog will also contain those 100 insert records. A secondary fetches the primary's oplog and replays those 100 operations, thereby copying the primary's data and achieving synchronization.
Note that a secondary does not have to fetch the oplog from the primary only. To make replication more efficient, all members of a replica set send heartbeats (pings) to one another, and each member can fetch the oplog from any other member.
Also, deleting 50 documents with a single statement does not produce a single oplog entry; instead, 50 individual delete records are written to the oplog.
Each operation in the oplog has the same effect on the data set whether it is applied once or many times; in other words, every oplog operation is idempotent.
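This idempotency can be illustrated with a toy model. To be clear, this is a simplified simulation in Python, not MongoDB's actual oplog format; the point is that entries record absolute outcomes ("set age to 31", "delete _id 2") rather than relative ones ("increment age"), so replaying them is harmless:

```python
# Toy model of oplog idempotency (a simplified simulation, not the real
# oplog format). Entries describe absolute state changes, so applying the
# same entry more than once leaves the data in the same final state.

def apply_oplog(data, oplog):
    """Apply a list of oplog-style entries to a dict of documents."""
    for entry in oplog:
        if entry["op"] == "insert":
            data[entry["_id"]] = dict(entry["doc"])
        elif entry["op"] == "set":
            data[entry["_id"]][entry["field"]] = entry["value"]
        elif entry["op"] == "delete":
            data.pop(entry["_id"], None)  # deleting twice is harmless

    return data

oplog = [
    {"op": "insert", "_id": 1, "doc": {"name": "alice", "age": 30}},
    {"op": "set", "_id": 1, "field": "age", "value": 31},
    {"op": "insert", "_id": 2, "doc": {"name": "bob"}},
    {"op": "delete", "_id": 2},
]

once = apply_oplog({}, oplog)
twice = apply_oplog(apply_oplog({}, oplog), oplog)
print(once == twice)  # prints True: replaying the oplog changes nothing
```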
When is a resync needed?
From the previous section we know that the oplog has a fixed size, and the number of records in it is not necessarily proportional to the amount of data on the node. So new records will inevitably overwrite the oldest ones.
Now suppose a secondary goes down one day while the other members keep running and keep accepting writes, so the oplog keeps growing. The failed node cannot fetch the latest oplog records from the other members, and eventually the entries it still needs get overwritten on the others' oplogs. Even if that node later comes back up, it can no longer be made consistent with the other members, because what was overwritten is gone for good.
At that point, a full resync is required. Yep, what is gone is gone forever; just find a fresh copy and start over. :)
How to resync
See: Resync a Member of a Replica Set
When should you use an arbiter node?
When a replica set has an even number of members, you should add an arbiter node to break election ties.
For example: I have three servers in production. One is the web server; the other two are DB servers, each running one MongoDB node, which together form a two-member replica set, and I have no spare machines. In this situation, if MongoDB on either DB server dies, the MongoDB on the other server inevitably drops to SECONDARY, because a single remaining member cannot win a majority election, which means MongoDB becomes unavailable. To avoid this and improve availability, you can deploy an arbiter node on the web server. An arbiter stores no data and therefore can never be promoted to PRIMARY; its hardware requirements are very low, so it has little impact on the other programs running on the web server. With the arbiter in place, if either DB server dies, the MongoDB on the other server becomes PRIMARY and the service keeps running. Use that window to track down the cause of the failure and restore the broken server as soon as possible.
To let the arbiter consume even fewer resources, you can add the following options to its configuration file:
journal = false
smallfiles = true
noprealloc = true
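With those options in place, bringing the arbiter up and adding it to the set takes two commands. The web server's address (192.168.1.5 here) and the data path are placeholders for this scenario:

```shell
# Start the arbiter on the web server; the data path stores almost
# nothing, since an arbiter holds no data
mongod --replSet rs0 --port 27021 --dbpath /data/arb --fork --logpath /data/arb.log

# From a mongo shell connected to the current PRIMARY, add the arbiter
mongo --eval 'rs.addArb("192.168.1.5:27021")'
```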
Master-slave replication
The master-slave replication architecture is no longer recommended; use the replica set architecture instead.
See: http://docs.mongoing.com/manual-zh/core/ma...