不久前,我对 pt-online-schema-change 的插件接口进行了一些更改,它允许编写自定义复制检查。在添加此功能时,我还向 pt-table-checksum 添加了 --plugin 选项。这是在 Percona Toolkit 2.2.8 中发布的。
通过这些添加,我花了一些时间编写了一个插件,允许 Percona Toolkit 工具使用 Tungsten Replicator 来检查从机延迟,您可以在 https://github.com/grypyrg/percona- 找到代码toolkit-plugin-tungsten-replicator
该插件使用 perl JSON::XS 模块 ( perl-JSON-XS rpm 包,http://search.cpan.org/dist/ JSON-XS/XS.pm ),确保它可用,否则插件将无法工作。
我们需要使用 --recursion-method=dsns 因为 Percona Toolkit 工具无法自动找到连接到主数据库的 tungsten 复制器从属。 (我确实在启动板上添加了一个蓝图,以使其成为可能 https://blueprints.launchpad.net/percona-toolkit/spec/plugin-custom-recursion-method )
dsns递归方法从您指定的数据库表中获取从属列表:
CREATE TABLE `percona`.`dsns` (`id` int(11) NOT NULL AUTO_INCREMENT,`parent_id` int(11) DEFAULT NULL,`dsn` varchar(255) NOT NULL,PRIMARY KEY (`id`));
` id ` int ( 11 ) NOT NULL AUTO_INCRMENT , `parent_id ` int ( 11 ) 默认为 NULL ,node1 mysql> select * from percona.dsns;+----+-----------+---------+| id | parent_id | dsn |+----+-----------+---------+|2 |NULL | h=node3 |+----+-----------+---------+ 登录后复制
主键 ( ` id ` ) ) ; ## CONFIGURATION# trepctl command to runmy $trepctl="/opt/tungsten/installs/cookbook/tungsten/tungsten-replicator/bin/trepctl";# what tungsten replicator service to checkmy $service="bravo";# what user does tungsten replicator use to perform the writes?# See Binlog Format for more informationmy $tungstenusername = 'tungsten'; 登录后复制 |
## CONFIGURATION # trepctl command to run my $trepctl = "/opt/tungsten/installs/cookbook/tungsten/tungsten-replicator/bin/trepctl" ; # what tungsten replicator service to check my $service = "bravo" ; # what user does tungsten replicator use to perform the writes? # See Binlog Format for more information my $tungstenusername = 'tungsten' ; |
# pt-table-checksum - u checksum -- no - check - binlog - format -- recursion - method = dsn = D = percona , t = dsns -- plugin = / vagrant / pt - plugin - tungsten_replicator .pl -- databases app -- check - interval = 5 -- max - lag = 10 Created plugin from / vagrant / pt - plugin - tungsten_replicator .pl . PLUGIN get_slave_lag : Using Tungsten Replicator to check replication lag Tungsten Replicator status of host node3 is OFFLINE : NORMAL , waiting Tungsten Replicator status of host node3 is OFFLINE : NORMAL , waiting Replica node3 is stopped . Waiting . Tungsten Replicator status of host node3 is OFFLINE : NORMAL , waiting Replica lag is 125 seconds on node3 . Waiting . Replica lag is 119 seconds on node3 . Waiting . Checksumming app .large_table : 22 % 00 : 12 remain TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE 07 - 03T10 : 49 : 54 0 0 2097152 7 0 213.238 app .large_table
目前,使用 Percona Toolkit 无法为插件指定额外选项,因此仍然需要手动编辑 perl 文件来配置它。因此,在运行校验和之前,我们需要配置插件: node1 mysql>;从 percona .dsns 选择 * ;
# pt-table-checksum-u checksum--no-check-binlog-format--recursion-method=dsn=D=percona,t=dsns--plugin=/vagrant/pt-plugin-tungsten_replicator.pl--databases app--check-interval=5--max-lag=10Created plugin from /vagrant/pt-plugin-tungsten_replicator.pl.PLUGIN get_slave_lag: Using Tungsten Replicator to check replication lagTungsten Replicator status of host node3 is OFFLINE:NORMAL, waitingTungsten Replicator status of host node3 is OFFLINE:NORMAL, waitingReplica node3 is stopped.Waiting.Tungsten Replicator status of host node3 is OFFLINE:NORMAL, waitingReplica lag is 125 seconds on node3.Waiting.Replica lag is 119 seconds on node3.Waiting.Checksumming app.large_table:22% 00:12 remainTS ERRORSDIFFS ROWSCHUNKS SKIPPEDTIME TABLE07-03T10:49:54002097152 7 0 213.238 app.large_table
|编号 |父 ID | DSN | -- -- -- -- -- -- -- -- - -- -- -- -- -- - | 2 | 空| h = 节点3 | -- -- -- -- -- -- -- -- - -- -- -- -- -- -
运行校验和 在这里,我使用 pt-table-checksum 对表进行了校验和。在校验和过程中,我使从节点离线并再次将其重新上线: ## 配置 # trepctl 命令运行 my $trepctl = "/opt/tungsten/installs /cookbook/tungsten/tungsten-replicator/bin/trepctl" ; # 要检查什么钨复制器服务 my $service = "bravo" ; # tungstenreplicator 使用什么用户来执行写入? # 有关更多信息,请参阅 Binlog 格式 my $tungstenusername = 'tungsten' ;
# pt-table-checksum - u checksum -- 无 - 检查 - binlog - 格式 -- 递归 - method = dsn = D = percona , t = dsns -- plugin = / vagrant / pt - 插件 - tungsten_replicator .pl -- 数据库应用 -- 检查 - 间隔 = 5 -- max - lag = 10 已创建来自 /vagrant/pt-plugin-tungsten_replicator.pl 的插件。 PLUGIN get_slave_lag :使用 Tungsten Replicator 检查复制延迟 主机节点 3 的 Tungsten Replicator 状态为离线:正常,等待 主机节点 3 的 Tungsten Replicator 状态为离线:正常,等待 副本节点 3 已停止。 等待 。 主机节点 3 的 Tungsten Replicator 状态为 OFFLINE:NORMAL,正在等待 节点 3 上的副本延迟为 125 秒。 等待 。 Node3 上的副本延迟为 119 秒。 等待 。 校验和应用程序 .large_table : 22 % 00 : 12 剩余 TS 错误 差异 行 跳过块 时间表 07 - 03T10 : 49 : 54 0 0 2097152 7 0 213.238应用程序 .large_table
I recommend to change the check-interval higher than the default 1 second as running trepctl takes a while. This could slow down the process quite a lot.
The plugin also works with pt-online-schema-change :
# pt-online-schema-change-u schemachange--recursion-method=dsn=D=percona,t=dsns--plugin=/vagrant/pt-plugin-tungsten_replicator.pl--check-interval=5--max-lag=10--alter "add index (column1) "--execute D=app,t=large_tableCreated plugin from /vagrant/pt-plugin-tungsten_replicator.pl.Found 1 slaves:node3Will check slave lag on:node3PLUGIN get_slave_lag: Using Tungsten Replicator to check replication lagOperation, tries, wait:copy_rows, 10, 0.25create_triggers, 10, 1drop_triggers, 10, 1swap_tables, 10, 1update_foreign_keys, 10, 1Altering `app`.`large_table`...Creating new table...Created new table app._large_table_new OK.Waiting forever for new table `app`.`_large_table_new` to replicate to node3...Altering new table...Altered `app`.`_large_table_new` OK.2014-07-03T13:02:33 Creating triggers...2014-07-03T13:02:33 Created triggers OK.2014-07-03T13:02:33 Copying approximately 8774670 rows...Copying `app`.`large_table`:26% 01:21 remainCopying `app`.`large_table`:50% 00:59 remainReplica lag is 12 seconds on node3.Waiting.Replica lag is 12 seconds on node3.Waiting.Copying `app`.`large_table`:53% 02:22 remainCopying `app`.`large_table`:82% 00:39 remain2014-07-03T13:06:06 Copied rows OK.2014-07-03T13:06:06 Swapping tables...2014-07-03T13:06:06 Swapped original and new tables OK.2014-07-03T13:06:06 Dropping old table...2014-07-03T13:06:06 Dropped old table `app`.`_large_table_old` OK.2014-07-03T13:06:06 Dropping triggers...2014-07-03T13:06:06 Dropped triggers OK.Successfully altered `app`.`large_table`.
# pt-online-schema-change - u schemachange -- recursion - method = dsn = D = percona , t = dsns -- plugin = / vagrant / pt - plugin - tungsten_replicator .pl -- check - interval = 5 -- max - lag = 10 -- alter "add index (column1) " -- execute D = app , t = large_table Created plugin from / vagrant / pt - plugin - tungsten_replicator .pl . Found 1 slaves : node3 Will check slave lag on : node3 PLUGIN get_slave_lag : Using Tungsten Replicator to check replication lag Operation , tries , wait : copy_rows , 10 , 0.25 create_triggers , 10 , 1 drop_triggers , 10 , 1 swap_tables , 10 , 1 update_foreign_keys , 10 , 1 Altering ` app ` . ` large_table ` . . . Creating new table . . . Created new table app ._large_table_new OK . Waiting forever for new table ` app ` . ` _large_table_new ` to replicate to node3 . . . Altering new table . . . Altered ` app ` . ` _large_table_new ` OK . 2014 - 07 - 03T13 : 02 : 33 Creating triggers . . . 2014 - 07 - 03T13 : 02 : 33 Created triggers OK . 2014 - 07 - 03T13 : 02 : 33 Copying approximately 8774670 rows . . . Copying ` app ` . ` large_table ` : 26 % 01 : 21 remain Copying ` app ` . ` large_table ` : 50 % 00 : 59 remain Replica lag is 12 seconds on node3 . Waiting . Replica lag is 12 seconds on node3 . Waiting . Copying ` app ` . ` large_table ` : 53 % 02 : 22 remain Copying ` app ` . ` large_table ` : 82 % 00 : 39 remain 2014 - 07 - 03T13 : 06 : 06 Copied rows OK . 2014 - 07 - 03T13 : 06 : 06 Swapping tables . . . 2014 - 07 - 03T13 : 06 : 06 Swapped original and new tables OK . 2014 - 07 - 03T13 : 06 : 06 Dropping old table . . . 2014 - 07 - 03T13 : 06 : 06 Dropped old table ` app ` . ` _large_table_old ` OK . 2014 - 07 - 03T13 : 06 : 06 Dropping triggers . . . 2014 - 07 - 03T13 : 06 : 06 Dropped triggers OK . Successfully altered ` app ` . ` large_table ` . |
As you can see, there was some slave lag during the schema changes.
pt-online-schema-change uses triggers in order to do the schema changes. Tungsten Replicator has some limitations with different binary log formats and triggers ( https://code.google.com/p/tungsten-replicator/wiki/TRCAdministration#Triggers_and_Row_Replication ).
In Tungsten Replicator, ROW based binlog events will be converted to SQL statements, which causes triggers to be executed on the slave as well, this does not happen with traditional replication.
Different settings:
Error creating --plugin: The master it's binlog_format=MIXED,pt-online-schema change does not work well withTungsten Replicator and binlog_format=MIXED.
Error creating -- plugin : The master it ' s binlog_format = MIXED , pt - online - schema change does not work well with Tungsten Replicator and binlog_format = MIXED . |
binlog_format 可以在每个会话的基础上覆盖,确保使用 pt-online-schema-change 时不会发生这种情况。
Continent 网站上的文档已经提到了如何将数据与 pt-table-checksum 进行比较。
我相信这个插件是一个很好的补充。 Percona Toolkit 中监控复制延迟的功能现在可以与 Tungsten Replicator 一起使用,因此您可以控制使用这些工具时可以容忍的复制延迟程度。