Home Database Mysql Tutorial 浅谈SQL Server查询优化器中的JOIN算法

浅谈SQL Server查询优化器中的JOIN算法

Jun 07, 2016 pm 03:04 PM
join server sql optimization Inquire algorithm

查询优化器 都是支持 JOIN 操作的,而 SQL Server 中主要有以下三类JOIN算法:Nested Loop、Sort-Merge以及Hash Join。尽管每种算法都并不是很复杂,但考虑到性能优化,在产品级的优化器实现时往往使用的是改进过的变种算法。譬如 SQL Server 支持block nest

  查询优化器都是支持JOIN操作的,而SQL Server 中主要有以下三类JOIN算法:Nested Loop、Sort-Merge以及Hash Join。尽管每种算法都并不是很复杂,但考虑到性能优化,在产品级的优化器实现时往往使用的是改进过的变种算法。譬如SQL Server 支持block nested loops、index nexted loops、sort-merge、hash join以及hash team。我们在这里只对上述三种基本算法的原型做一个简单的介绍。

  【假设】有两张表R和S,R共占有M页,S共占有N页。r 和 s 分别代表元组,而 i 和 j 分别代表第i或者第 j 个字段,也就是后文提到的JOIN字段。

  1. Nested Loop Join(嵌套循环联结)

  算法:

  其思路相当的简单和直接:对于关系R的每个元组 r 将其与关系S的每个元组 s 在JOIN条件的字段上直接比较并筛选出符合条件的元组。写成伪代码就是:

  foreach tuple r Î R do
  foreach tuple s Î S do
  if ri == sj then add to result

  代价:

  被联结的表所处内层或外层的顺序对磁盘I/O开销有着非常重要的影响。而CPU开销相对来说影响较小,主要是元组读入内存以后(in-memory)的开销,是 O (n * m)

  对于I/O开销,根据 page-at-a-time 的前提条件,I/O cost = M + M * N,翻译一下就是 I/O的开销 = 读取M页的I/O开销 + M次读取N页的I/O开销。

  使用小结:

  • 适用于一个集合大而另一个集合小的情况(将小集合做为外循环),I/O性能不错。

  • 当外循环输入相当小而内循环非常大且有索引建立在JOIN字段上时,I/O性能相当不错。

  • 当两个集合中只有一个在JOIN字段上建立索引时,一定要将该集合作为内循环。

  • 对于一对一的匹配关系(两个具有唯一约束字段的联结),可以在找到匹配元组后跳过该次内循环的剩余部分(类似于编程语言循环语句中的continue)。

  2. Sort-Merge Join (排序合并联结)

  Nested Loop一般在两个集合都很大的情况下效率就相当差了,而Sort-Merge在这种情况下就比它要高效不少,尤其是当两个集合的JOIN字段上都有聚集索引(clustered index)存在时,Sort-Merge性能将达到最好。

  算法:

  基本思路也很简单(复习一下数据结构中的合并排序吧),主要有两个步骤:

  (1) 按JOIN字段进行排序

  (2) 对两组已排序集合进行合并排序,从来源端各自取得数据列后加以比较(需要根据是否在JOIN字段有重复值做特殊的“分区”处理)

  代价:(主要是I/O开销)

  有两个因素左右Sort-Merge的开销:JOIN字段是否已排序 以及 JOIN字段上的重复值有多少。

  • 最好情况下(两列都已排序且至少有一列没有重复值):O (n + m) 只需要对两个集合各扫描一遍

  • 最差情况下(两列都未排序且两列上的所有值都相同):O (n * log n + m * log m + n * m) 两次排序以及一次全部元组间的笛卡尔乘积

使用小结:

  如前所述,可以考虑在两个结果集都很大情况下使用,最好能有聚集索引保证已经排序完毕。而在实际应用中,我们经常会与遇到的主键-外键关系就是Sort-Merge的一个很好的应用。这种情况下,一般两列都会有聚集索引(已排序)而且一对多的关系保证了至少有一列没有重复值,这种情况下,Sort-Merge的性能是三种里面最好的。

  另外,如果要求查询的SQL语法本身就要求GROUP BY、ORDER BY、CUBE等运行,则查询语法整体本来就要做排序,因此可以重用排序结果,此时Merge也是不错的选择。

  3. Hash Join (哈希联结)

  Hash Join在本质上类似于两列都有重复值时的Sort-Merge的处理思想――分区(patitioning)。但它们也有区别:Hash Join通过哈希来分区(每一个桶就是一个分区)而Sort-Merge通过排序来分区(每一个重复值就是一个分区)。

  值得注意的是,Hash Join与上述两种算法之间的较大区别同时也是一个较大限制是它只能应用于等值联结(equality join),这主要是由于哈希函数及其桶的确定性及无序性所导致的。

  算法:

  基本的Hash Join算法由以下两步组成:

  (1) Build Input Phase: 基于JOIN字段,使用哈希函数h2为较小的S集合构建内存中(in-memory)的哈希表,相同键值的以linked list组成一个桶(bucket)

  (2) Probe Input Phase: 在较大的R集合上对哈希表进行核对以完成联结。其中核对操作包括:

  foreach tuple r Î R do
  hash on the joining attribute using the hash function of step 1 to find a bucket in the hash table
  if the bucket is nonempty
  foreach tuple s in the found bucket
  if ri == sj then add to result

  代价:

  值得注意的是对于大集合R的每个元组 r ,hash bucket中对应 r 的那个bucket中的每个元组都需要与 r 进行比较,这也是算法最耗时的地方所在。

  CPU开销是O (m + n * b) b是每个bucket的平均元组数量。

  使用小结:

  一般来说,查询优化器会首先考虑Nested Loop和Sort-Merge,但如果两个集合量都不小且没有合适的索引时,才会考虑使用Hash Join。

  Hash Join也用于许多集合比较操作,inner join、left/right/full outer join、intersect、difference等等,当然了,需要保证都是等值联结。

  另外,Hash Join的变种能够移除重复和进行分组,它只使用一个输入,兼做Build和Probe的角色。

   其实产品级的优化器一般都改进了这些基本算法,而改进过的版本的确有较大的性能提升。在这里只是给需要判断执行计划优劣或者研究查询优化器实现的兄弟提供原理方面的介绍,在实际应用中我们还得结合丰富的statistics作出准确的判断。


Tech?Ed 2007 微软技术大会

点击查看 Tech?Ed 2007 微软技术大会 专题

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
2 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Hello Kitty Island Adventure: How To Get Giant Seeds
1 months ago By 尊渡假赌尊渡假赌尊渡假赌
Two Point Museum: All Exhibits And Where To Find Them
1 months ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

What is the difference between HQL and SQL in Hibernate framework? What is the difference between HQL and SQL in Hibernate framework? Apr 17, 2024 pm 02:57 PM

HQL and SQL are compared in the Hibernate framework: HQL (1. Object-oriented syntax, 2. Database-independent queries, 3. Type safety), while SQL directly operates the database (1. Database-independent standards, 2. Complex executable queries and data manipulation).

How to check your academic qualifications on Xuexin.com How to check your academic qualifications on Xuexin.com Mar 28, 2024 pm 04:31 PM

How to check my academic qualifications on Xuexin.com? You can check your academic qualifications on Xuexin.com, but many users don’t know how to check their academic qualifications on Xuexin.com. Next, the editor brings you a graphic tutorial on how to check your academic qualifications on Xuexin.com. Interested users come and take a look! Xuexin.com usage tutorial: How to check your academic qualifications on Xuexin.com 1. Xuexin.com entrance: https://www.chsi.com.cn/ 2. Website query: Step 1: Click on the Xuexin.com address above to enter the homepage Click [Education Query]; Step 2: On the latest webpage, click [Query] as shown by the arrow in the figure below; Step 3: Then click [Login Academic Credit File] on the new page; Step 4: On the login page Enter the information and click [Login];

12306 How to check historical ticket purchase records How to check historical ticket purchase records 12306 How to check historical ticket purchase records How to check historical ticket purchase records Mar 28, 2024 pm 03:11 PM

Download the latest version of 12306 ticket booking app. It is a travel ticket purchasing software that everyone is very satisfied with. It is very convenient to go wherever you want. There are many ticket sources provided in the software. You only need to pass real-name authentication to purchase tickets online. All users You can easily buy travel tickets and air tickets and enjoy different discounts. You can also start booking reservations in advance to grab tickets. You can book hotels or special car transfers. With it, you can go where you want to go and buy tickets with one click. Traveling is simpler and more convenient, making everyone's travel experience more comfortable. Now the editor details it online Provides 12306 users with a way to view historical ticket purchase records. 1. Open Railway 12306, click My in the lower right corner, and click My Order 2. Click Paid on the order page. 3. On the paid page

Implementing Machine Learning Algorithms in C++: Common Challenges and Solutions Implementing Machine Learning Algorithms in C++: Common Challenges and Solutions Jun 03, 2024 pm 01:25 PM

Common challenges faced by machine learning algorithms in C++ include memory management, multi-threading, performance optimization, and maintainability. Solutions include using smart pointers, modern threading libraries, SIMD instructions and third-party libraries, as well as following coding style guidelines and using automation tools. Practical cases show how to use the Eigen library to implement linear regression algorithms, effectively manage memory and use high-performance matrix operations.

Explore the underlying principles and algorithm selection of the C++sort function Explore the underlying principles and algorithm selection of the C++sort function Apr 02, 2024 pm 05:36 PM

The bottom layer of the C++sort function uses merge sort, its complexity is O(nlogn), and provides different sorting algorithm choices, including quick sort, heap sort and stable sort.

Improved detection algorithm: for target detection in high-resolution optical remote sensing images Improved detection algorithm: for target detection in high-resolution optical remote sensing images Jun 06, 2024 pm 12:33 PM

01 Outlook Summary Currently, it is difficult to achieve an appropriate balance between detection efficiency and detection results. We have developed an enhanced YOLOv5 algorithm for target detection in high-resolution optical remote sensing images, using multi-layer feature pyramids, multi-detection head strategies and hybrid attention modules to improve the effect of the target detection network in optical remote sensing images. According to the SIMD data set, the mAP of the new algorithm is 2.2% better than YOLOv5 and 8.48% better than YOLOX, achieving a better balance between detection results and speed. 02 Background & Motivation With the rapid development of remote sensing technology, high-resolution optical remote sensing images have been used to describe many objects on the earth’s surface, including aircraft, cars, buildings, etc. Object detection in the interpretation of remote sensing images

C++ program optimization: time complexity reduction techniques C++ program optimization: time complexity reduction techniques Jun 01, 2024 am 11:19 AM

Time complexity measures the execution time of an algorithm relative to the size of the input. Tips for reducing the time complexity of C++ programs include: choosing appropriate containers (such as vector, list) to optimize data storage and management. Utilize efficient algorithms such as quick sort to reduce computation time. Eliminate multiple operations to reduce double counting. Use conditional branches to avoid unnecessary calculations. Optimize linear search by using faster algorithms such as binary search.

Application of algorithms in the construction of 58 portrait platform Application of algorithms in the construction of 58 portrait platform May 09, 2024 am 09:01 AM

1. Background of the Construction of 58 Portraits Platform First of all, I would like to share with you the background of the construction of the 58 Portrait Platform. 1. The traditional thinking of the traditional profiling platform is no longer enough. Building a user profiling platform relies on data warehouse modeling capabilities to integrate data from multiple business lines to build accurate user portraits; it also requires data mining to understand user behavior, interests and needs, and provide algorithms. side capabilities; finally, it also needs to have data platform capabilities to efficiently store, query and share user profile data and provide profile services. The main difference between a self-built business profiling platform and a middle-office profiling platform is that the self-built profiling platform serves a single business line and can be customized on demand; the mid-office platform serves multiple business lines, has complex modeling, and provides more general capabilities. 2.58 User portraits of the background of Zhongtai portrait construction

See all articles