Taobao's Cao Wei Analyzes a Low-Cost, High-Performance MySQL Cloud Data Architecture

Jun 01, 2016, 01:39 PM

bitsCN.com

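Cao Wei is a member of Taobao's database R&D team. He recently gave an internal talk analyzing and exploring a low-cost, high-performance MySQL cloud data architecture, covering how the architecture evolved and the roles and components in the system. The article was reposted by Chu Ba on the blog "Erlang非业余研究".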

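Cao Wei begins by pointing out:

Although NoSQL has developed rapidly over the past two years and new products keep appearing, using NoSQL in a business setting places fairly high demands on developers. MySQL, by contrast, has mature middleware and operations tools and has formed a healthy ecosystem. For now, therefore, MySQL plays the leading role and NoSQL a supporting one.

He then describes the team's work:

We (the Alibaba Group core systems database team) ... designed and implemented UMP (Unified MySQL Platform), a system that provides low-cost, high-performance MySQL cloud data services. Developers request MySQL instance resources from the platform and access their data through the single entry point the platform provides. Internally, UMP maintains and manages a resource pool and provides, transparently to users, a range of services including master-slave hot standby, data backup, migration, disaster recovery, read/write splitting, and database and table sharding. The platform lowers cost by running multiple MySQL instances on one physical machine, and it implements resource isolation, allocating and limiting CPU, memory, and IO on demand. It also supports dynamic scale-up and scale-down as a user's business grows, without interrupting the data service.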

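Cao Wei then traces how the system's architecture evolved:

The first version was based on mysql-proxy 0.8. We fixed a number of bugs and modified the state machine flow in the proxy plugin that manages user connections and database connections. We also wrote Lua scripts that fetch user authentication information and backend database addresses from a central database, authenticate users, establish connections to the backend databases, and forward packets.

Figure: Architecture of the first version of UMP

He lists several problems with the first version: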

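  1. Multithreading support in mysql-proxy 0.8 is crude, which has several bad consequences:
    • It causes a "thundering herd": multiple threads are woken up, but only one of them actually gets a task to process.
    • Tasks have poor CPU affinity: events triggered on the same state machine bounce back and forth between processors.
    • mysql-proxy also uses a global Lua lock that lets only one worker thread execute Lua scripts at a time. As a result, its multithreaded performance is far from scaling linearly with the number of CPU cores; performance on 16 cores is even worse than on 4.
    • Because of the above, each physical machine needs several single-process-mode proxies deployed on it to make effective use of its processing power, which complicates deployment, monitoring, and service upgrades.
  2. Second, the mysql-proxy framework makes functionality hard to extend. Features such as per-user connection limits, QPS limits, master-slave switchover, read/write splitting, and sharding are difficult to implement.
  3. Finally, the mysql-proxy community has not been very active in recent years, and C demands a lot of skill from developers. It is hard to expect every member of the team to collaborate on code that is both elegant and correct.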
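
The "thundering herd" effect from item 1 can be reproduced in a few lines. The sketch below is not mysql-proxy code; it is a minimal Python illustration, with the hypothetical function `wasted_wakeups`, that queues a single task for a pool of sleeping threads and counts how many of them wake up only to find nothing to do.

```python
import threading
import time


def wasted_wakeups(workers: int, wake_all: bool) -> int:
    """Queue one task for `workers` sleeping threads; count futile wakeups."""
    cond = threading.Condition()
    state = {"waiting": 0, "tasks": 0, "shutdown": False, "wasted": 0}

    def worker() -> None:
        with cond:
            state["waiting"] += 1
            cond.wait()                    # sleep until a notify reaches us
            if state["tasks"] > 0:
                state["tasks"] -= 1        # this thread gets the task
            elif not state["shutdown"]:
                state["wasted"] += 1       # woken up, but nothing left to do

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()

    # Wait until every worker is asleep inside cond.wait().
    while True:
        with cond:
            if state["waiting"] == workers:
                break
        time.sleep(0.001)

    with cond:
        state["tasks"] = 1
        if wake_all:
            cond.notify_all()              # herd: everyone wakes for one task
        else:
            cond.notify(1)                 # wake exactly one sleeper

    if not wake_all:
        # Release the remaining sleepers so the demo can terminate.
        with cond:
            state["shutdown"] = True
            cond.notify_all()

    for t in threads:
        t.join()
    return state["wasted"]


if __name__ == "__main__":
    print("notify_all:", wasted_wakeups(8, wake_all=True))
    print("notify(1): ", wasted_wakeups(8, wake_all=False))
```

With 8 workers, waking everyone produces 7 futile wakeups for one task, while waking a single sleeper produces none. Each futile wakeup costs a context switch and a round of lock contention, which is the overhead the article attributes to the proxy's threading model.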

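They therefore decided to rewrite it in Erlang, for these reasons:

  • Like an operating system process or thread, an Erlang process is a unit of concurrent execution, but it is extremely lightweight: it is a "green process", that is, a user-space process, managed and scheduled inside the Erlang virtual machine.
  • Erlang/OTP provides good abstractions for the elements needed to build a distributed, highly fault-tolerant application, including a network programming framework, serialization and deserialization, fault tolerance, and hot deployment.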

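In designing the current UMP architecture, the team followed these principles:

  • Expose a single entry point externally and maintain a single resource pool internally.
  • Keep the service highly available and eliminate single points of failure.
  • Keep the system elastically scalable, so that compute and storage nodes can be added and removed dynamically.
  • Keep the resources allocated to users elastically scalable as well, and isolated from one another.

Figure: Current architecture of UMP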

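The UMP system has the following roles:

  • controller server: provides management services to the UMP cluster, including metadata storage, cluster membership management, MySQL instance management, failure recovery, backup, migration, and scale-out.
  • proxy server: gives users access to MySQL databases and fully implements the MySQL protocol. Beyond basic data routing, the proxy server also implements resource limiting, masking of MySQL instance failures, read/write splitting, sharding, and logging of user access.
  • agent server: deployed on the machines that run MySQL processes. It manages the MySQL instances on each physical machine, performs operations such as create, delete, backup, migrate, and master-slave switchover, and collects and analyzes MySQL process statistics, the bin log, and the slow query log.
  • API/Web server: provides users with the system management interface. It is built on the open-source projects Mochiweb and Chicago Boss, with Mochiweb providing the HTTP/HTTPS service.
  • log analysis server: stores and analyzes the user access logs sent by the proxy servers, and builds a real-time index so that users can query slow logs and statistical reports for a given period.
  • statistics server: periodically aggregates the collected per-user connection counts and QPS values, along with the process status of MySQL instances, using RRDtool. The results can be plotted in the Web interface and can also serve as a basis for elastic resource allocation and automated MySQL instance migration in the future.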
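
The article does not show how the proxy server routes statements, so the following is only a sketch of the two routing ideas it names, read/write splitting and sharding. All names here (`Shard`, `shard_for`, `route`, the host strings) are hypothetical, and the rule "SELECT and SHOW go to a slave, everything else to the master" is a simplifying assumption; a real proxy would also have to consider transactions and replication lag.

```python
import itertools
import zlib
from dataclasses import dataclass, field

# Assumption: only these statement types are safe to send to a slave.
READ_VERBS = {"SELECT", "SHOW"}


@dataclass
class Shard:
    master: str
    slaves: list
    _rr: object = field(init=False, repr=False, compare=False)

    def __post_init__(self) -> None:
        # Round-robin over the slaves; fall back to the master if none exist.
        self._rr = itertools.cycle(self.slaves or [self.master])

    def pick(self, sql: str) -> str:
        """Read/write splitting: reads go to a slave, the rest to the master."""
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb in READ_VERBS:
            return next(self._rr)
        return self.master


def shard_for(key: str, shards: list) -> Shard:
    """Sharding: map a key to a shard with a stable hash (CRC32 modulo N)."""
    return shards[zlib.crc32(key.encode("utf-8")) % len(shards)]


def route(key: str, sql: str, shards: list) -> str:
    """Pick the shard for `key`, then the host within it for `sql`."""
    return shard_for(key, shards).pick(sql)


if __name__ == "__main__":
    shards = [
        Shard("db0-master", ["db0-slave-a", "db0-slave-b"]),
        Shard("db1-master", ["db1-slave-a"]),
    ]
    print(route("user42", "SELECT * FROM orders", shards))
    print(route("user42", "UPDATE orders SET paid = 1", shards))
```

Hashing the key with CRC32 rather than Python's built-in `hash` keeps the mapping stable across processes, which matters when several proxies must agree on where a user's data lives.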

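It relies on the following open-source components:

  • Mnesia: the distributed database that ships with OTP. It supports transactions and transparent data fragmentation, implements distributed transactions with two-phase locking, and can scale linearly to at least 50 nodes. Mnesia tends to sacrifice availability for strong consistency, but it also offers dirty read and dirty write operations that bypass transaction management.
  • LVS: provides load balancing; when a user application reconnects, LVS directs it to another proxy.
  • RabbitMQ: carries communication between the nodes of the UMP system (excluding large data flows such as SQL queries and logs, which still go directly over TCP).
  • ZooKeeper: serves mainly as the configuration server, provides distributed locks, and monitors all MySQL instances.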

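Cao Wei sums up what the system does:

With these components working together, the whole system provides disaster recovery, read/write splitting, and sharding transparently to users. Internally, the system virtualizes resources and lowers overall cost by having multiple small users share one MySQL instance, giving medium-sized users a dedicated MySQL instance, and having multiple MySQL instances share one physical machine. For resource isolation, it combines Cgroup limits on MySQL process resources with QPS limits enforced at the proxy server, which lets UMP virtualize resources while still guaranteeing users' quality of service. In addition, UMP protects user data with a combination of SSL database connections, IP whitelists for data access, user operation logging, and SQL interception.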
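
The article does not say which algorithm the proxy uses to limit QPS. One common choice is a token bucket, sketched below under that assumption; the class name `QpsLimiter` and its parameters are hypothetical. The clock is injectable so that the behavior can be checked without sleeping.

```python
import time
from typing import Callable, Optional


class QpsLimiter:
    """Token bucket: refills at `qps` tokens per second, holds at most `burst`."""

    def __init__(
        self,
        qps: float,
        burst: Optional[float] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = float(qps)
        self.capacity = float(burst if burst is not None else qps)
        self.tokens = self.capacity        # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        """Return True if one more query may pass right now."""
        now = self.clock()
        # Refill in proportion to the elapsed time, capped at the capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    fake_now = [0.0]
    limiter = QpsLimiter(qps=2, clock=lambda: fake_now[0])
    print([limiter.allow() for _ in range(3)])   # third query in the same instant is rejected
    fake_now[0] = 0.5                            # half a second later one token is back
    print(limiter.allow())
```

A proxy would typically keep one such limiter per user, for example in a dictionary keyed by user name, and reject or delay a query when `allow()` returns False.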

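On where the system is used, Cao Wei notes:

Some UMP components, such as the proxy server and the log analysis server, are already in use in Tmall's Jushita (聚石塔) platform, providing secure cloud data services to e-commerce merchants and ISVs. UMP is also used in Taobao's shop decoration platform, where it provides data services to developers. In the next phase, we hope UMP can help further reduce the cost of data storage within the group.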
