Data structures and algorithms in JavaScript (5): Classic KMP algorithm

Home

Web Front-end

JS Tutorial

Data structures and algorithms in JavaScript (5): Classic KMP algorithm_javascript skills

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

May 16, 2016 pm 03:54 PM

javascript kmp data structure algorithm classic

KMP algorithm and BM algorithm

KMP is a classic algorithm for prefix matching and BM suffix matching. It can be seen that the difference between prefix matching and suffix matching is only in the order of comparison

Prefix matching means: the comparison of the pattern string and the parent string is from left to right, and the movement of the pattern string is also from left to right

Suffix matching means: the comparison of the pattern string and the parent string is from right to left, and the movement of the pattern string is from left to right.

Through the previous chapter it is obvious that the BF algorithm is also a prefix algorithm, but the efficiency of one-by-one matching is very arrogant. Naturally, it is not necessary to mention O(mn). KMP, which is annoying online, explains a lot. They are basically taking the high-level route and you may be confused. I tried to use my own understanding to describe it in the most down-to-earth way

KMP

KMP is also an optimized version of the prefix algorithm. The reason why it is called KMP is the abbreviation of the three names of Knuth, Morris, and Pratt. Compared with BF, the optimization point of the KMP algorithm is "the distance of each backward movement" It will dynamically adjust the moving distance of each pattern string. BF is 1 every time,

Not necessarily for KMP

As shown in the figure, the difference between BF and KMP pre-algorithm is compared

I compared the pictures and found out:

Search for the pattern string P in the text string T. When it naturally matches the sixth letter c, it is found that the second level is inconsistent. Then the BF method is to move the entire pattern string P by one place, and KMP is to move it by two places. .

We know the matching method of BF, but why does KMP move two digits instead of one or three or four digits?

Let’s explain the previous picture. The pattern string P is correct when it matches ababa, and it is wrong when it matches c. Then the idea of the KMP algorithm is: ababa is correctly matched. Information, can we use this information to not move the "search position" back to the position that has been compared, but continue to move it backward, which improves efficiency.

Then the question is, how do I know how many positions to move?

The authors of this offset algorithm KMP have summarized it for us:

Copy code The code is as follows:

Shifting digits = Number of matched characters - Corresponding partial matching value

The offset algorithm is only related to substrings, not text strings, so special attention needs to be paid here

So how do we understand the number of matched characters in the substring and the corresponding partial matching value?

Matched characters:

Copy code The code is as follows:

T : abababaabab

p : ababacb

The red mark in p is the matched character, which is easy to understand

Partial match value:

This is the core algorithm, and it is also difficult to understand

If:

Copy code The code is as follows:

T：aaronaabbcc

P：aaronaac

We can observe this text. If we make an error when matching c, where will our next move be based on the previous structure? Where is the most reasonable move?

Copy code The code is as follows:

aaronaabbcc

aaronaac

That is to say: within the pattern text, if the beginning and end of a certain paragraph of characters are the same, then this paragraph of content can be skipped during natural filtering. This idea is also reasonable

Knowing this rule, the partial matching table algorithm given is as follows:

First of all, you need to understand two concepts: "prefix" and "suffix". "Prefix" refers to all the head combinations of a string except the last character; "suffix" refers to all the tail combinations of a string except the first character.

"Partial matching value" is the length of the longest common element between "prefix" and "suffix""

Let’s take a look at aaronaac’s division if it is a BF match.

Displacement of BF: a,aa,aar,aaro,aaron,aarona,aaronaa,aaronaac

So what about the divisions of KMP? Here we need to introduce prefixes and suffixes

Let’s first look at the results of the KMP partial matching table:

Copy code The code is as follows:

a a r o n a a c

[0, 1, 0, 0, 0, 1, 2, 0]

I am definitely confused, so don’t worry, let’s break it down, prefixes and suffixes

Copy code The code is as follows:

Match string: "Aaron"

Prefix: A, Aa, Aar, Aaro

Suffix: aron,ron,on,n

Moving position: In fact, it is to compare the prefix and suffix of each matched character to see if they are equal, and then calculate the total length

Decomposition of partial matching table

The matching table algorithm in KMP, where p represents the prefix, n represents the suffix, and r represents the result

Copy code The code is as follows:

a, p=>0, n=>0 r = 0

aa, p=>[a], n=>[a], r = a.length => 1

aar, p=>[a,aa], n=>[r,ar] ,r = 0

aaro, p=>[a,aa,aar], n=>[o,ra,aro] ,r = 0

aaron p=>[a,aa,aar,aaro], n=>[n,on,ron,aron] ,r = 0

aarona, p=>[a,aa,aar,aaro,aaron], n=>[a,na,ona,rona,arona] ,r = a.lenght = 1

aaronaa, p=>[a,aa,aar,aaro,aaron,aarona], n=>[a,aa,naa,onaa,ronaa,aronaa] , r = Math.max(a.length ,aa.length) = 2

aaronaac p=>[a,aa,aar,aaro,aaron,aarona], n=>[c,ac,aac,naac,onaac,ronaac] r = 0

Similar to the BF algorithm, first decompose the position of each possible matching subscript and cache it. When matching, use this "partial matching table" to locate the number of digits that need to be moved

So the final result of aaronaac’s matching table is 0,1,0,0,0,1,2,0.

The JS version of KMP will be implemented below, there are 2 types

KMP implementation (1): KMP caching matching table

KMP implementation (2): dynamically calculate next KMP

KMP implementation (1)

Matching table

The most important thing in the KMP algorithm is the matching table. If the matching table is not needed, then it is the implementation of BF. Adding the matching table is KMP

The matching table determines the next displacement count

Based on the rules of the matching table above, we design a kmpGetStrPartMatchValue method

function kmpGetStrPartMatchValue(str) {
   var prefix = [];
   var suffix = [];
   var partMatch = [];
   for (var i = 0, j = str.length; i < j; i++) {
    var newStr = str.substring(0, i + 1);
    if (newStr.length == 1) {
     partMatch[i] = 0;
    } else {
     for (var k = 0; k < i; k++) {
      //前缀
      prefix[k] = newStr.slice(0, k + 1);
      //后缀
      suffix[k] = newStr.slice(-k - 1);
      //如果相等就计算大小,并放入结果集中
      if (prefix[k] == suffix[k]) {
       partMatch[i] = prefix[k].length;
      }
     }
     if (!partMatch[i]) {
      partMatch[i] = 0;
     }
    }
   }
   return partMatch;
  }

Copy after login

Completely in accordance with the implementation of the matching table algorithm in KMP, a->aa->aar->aaro->aaron->aarona-> is decomposed through str.substring(0, i 1) aaronaa-aaronaac

Then calculate the length of the common elements through prefix and suffix in each decomposition

Backoff Algorithm

KMP is also a front-end algorithm. You can completely transfer the BF one. The only modification is that BF directly adds 1 when backtracking. When KMP backtracks, we can calculate the next value through the matching table

//子循环
for (var j = 0; j < searchLength; j++) {
  //如果与主串匹配
  if (searchStr.charAt(j) == sourceStr.charAt(i)) {
    //如果是匹配完成
    if (j == searchLength - 1) {
     result = i - j;
     break;
    } else {
     //如果匹配到了，就继续循环，i++是用来增加主串的下标位
     i++;
    }
  } else {
   //在子串的匹配中i是被叠加了
   if (j > 1 && part[j - 1] > 0) {
    i += (i - j - part[j - 1]);
   } else {
    //移动一位
    i = (i - j)
   }
   break;
  }
}

Copy after login

The red mark is the core point of KMP. The value of next = the number of matched characters - the corresponding partial matching value

Complete KMP algorithm

<!doctype html><div id="test2"><div><script type="text/javascript">
 

  function kmpGetStrPartMatchValue(str) {
   var prefix = [];
   var suffix = [];
   var partMatch = [];
   for (var i = 0, j = str.length; i < j; i++) {
    var newStr = str.substring(0, i + 1);
    if (newStr.length == 1) {
     partMatch[i] = 0;
    } else {
     for (var k = 0; k < i; k++) {
      //取前缀
      prefix[k] = newStr.slice(0, k + 1);
      suffix[k] = newStr.slice(-k - 1);
      if (prefix[k] == suffix[k]) {
       partMatch[i] = prefix[k].length;
      }
     }
     if (!partMatch[i]) {
      partMatch[i] = 0;
     }
    }
   }
   return partMatch;
  }



function KMP(sourceStr, searchStr) {
  //生成匹配表
  var part     = kmpGetStrPartMatchValue(searchStr);
  var sourceLength = sourceStr.length;
  var searchLength = searchStr.length;
  var result;
  var i = 0;
  var j = 0;

  for (; i < sourceStr.length; i++) { //最外层循环，主串

    //子循环
    for (var j = 0; j < searchLength; j++) {
      //如果与主串匹配
      if (searchStr.charAt(j) == sourceStr.charAt(i)) {
        //如果是匹配完成
        if (j == searchLength - 1) {
         result = i - j;
         break;
        } else {
         //如果匹配到了，就继续循环，i++是用来增加主串的下标位
         i++;
        }
      } else {
       //在子串的匹配中i是被叠加了
       if (j > 1 && part[j - 1] > 0) {
        i += (i - j - part[j - 1]);
       } else {
        //移动一位
        i = (i - j)
       }
       break;
      }
    }

    if (result || result == 0) {
     break;
    }
  }


  if (result || result == 0) {
   return result
  } else {
   return -1;
  }
}

 var s = "BBC ABCDAB ABCDABCDABDE";
 var t = "ABCDABD";


 show('indexOf',function() {
  return s.indexOf(t)
 })

 show('KMP',function() {
  return KMP(s,t)
 })

 function show(bf_name,fn) {
  var myDate = +new Date()
  var r = fn();
  var div = document.createElement('div')
  div.innerHTML = bf_name +'算法,搜索位置:' + r + ",耗时" + (+new Date() - myDate) + "ms";
   document.getElementById("test2").appendChild(div);
 }


</script></div></div>

Copy after login

KMP（二）

第一种kmp的算法很明显，是通过缓存查找匹配表也就是常见的空间换时间了。那么另一种就是时时查找的算法，通过传递一个具体的完成字符串，算出这个匹配值出来，原理都一样

生成缓存表的时候是整体全部算出来的，我们现在等于只要挑其中的一条就可以了，那么只要算法定位到当然的匹配即可

next算法

function next(str) {
  var prefix = [];
  var suffix = [];
  var partMatch;
  var i = str.length
  var newStr = str.substring(0, i + 1);
  for (var k = 0; k < i; k++) {
   //取前缀
   prefix[k] = newStr.slice(0, k + 1);
   suffix[k] = newStr.slice(-k - 1);
   if (prefix[k] == suffix[k]) {
    partMatch = prefix[k].length;
   }
  }
  if (!partMatch) {
   partMatch = 0;
  }
  return partMatch;
}

Copy after login

其实跟匹配表是一样的，去掉了循环直接定位到当前已成功匹配的串了

完整的KMP.next算法

<!doctype html><div id="testnext"><div><script type="text/javascript">
 
  function next(str) {
    var prefix = [];
    var suffix = [];
    var partMatch;
    var i = str.length
    var newStr = str.substring(0, i + 1);
    for (var k = 0; k < i; k++) {
     //取前缀
     prefix[k] = newStr.slice(0, k + 1);
     suffix[k] = newStr.slice(-k - 1);
     if (prefix[k] == suffix[k]) {
      partMatch = prefix[k].length;
     }
    }
    if (!partMatch) {
     partMatch = 0;
    }
    return partMatch;
  }

  function KMP(sourceStr, searchStr) {
    var sourceLength = sourceStr.length;
    var searchLength = searchStr.length;
    var result;
    var i = 0;
    var j = 0;

    for (; i < sourceStr.length; i++) { //最外层循环，主串

      //子循环
      for (var j = 0; j < searchLength; j++) {
        //如果与主串匹配
        if (searchStr.charAt(j) == sourceStr.charAt(i)) {
          //如果是匹配完成
          if (j == searchLength - 1) {
           result = i - j;
           break;
          } else {
           //如果匹配到了，就继续循环，i++是用来增加主串的下标位
           i++;
          }
        } else {
         if (j > 1) {
          i += i - next(searchStr.slice(0,j));
         } else {
          //移动一位
          i = (i - j)
         }
         break;
        }
      }

      if (result || result == 0) {
       break;
      }
    }


    if (result || result == 0) {
     return result
    } else {
     return -1;
    }
  }

 var s = "BBC ABCDAB ABCDABCDABDE";
 var t = "ABCDAB";


  show('indexOf',function() {
   return s.indexOf(t)
  })

  show('KMP.next',function() {
   return KMP(s,t)
  })

  function show(bf_name,fn) {
   var myDate = +new Date()
   var r = fn();
   var div = document.createElement('div')
   div.innerHTML = bf_name +'算法,搜索位置:' + r + ",耗时" + (+new Date() - myDate) + "ms";
    document.getElementById("testnext").appendChild(div);
  }

</script></div></div>

Copy after login

git代码下载: https://github.com/JsAaron/data_structure

Statement of this Website

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)

3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

R.E.P.O. Best Graphic Settings

3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Assassin's Creed Shadows: Seashell Riddle Solution

1 weeks ago By DDD

R.E.P.O. How to Fix Audio if You Can't Hear Anyone

3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Where to find the Crane Control Keycard in Atomfall

1 weeks ago By DDD

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Where is the login entrance for gmail email?

7415

CakePHP Tutorial

1359

What is the format of the account name of steam

win11 activation key permanent

Related knowledge

Implementing Machine Learning Algorithms in C++: Common Challenges and Solutions Jun 03, 2024 pm 01:25 PM

Common challenges faced by machine learning algorithms in C++ include memory management, multi-threading, performance optimization, and maintainability. Solutions include using smart pointers, modern threading libraries, SIMD instructions and third-party libraries, as well as following coding style guidelines and using automation tools. Practical cases show how to use the Eigen library to implement linear regression algorithms, effectively manage memory and use high-performance matrix operations.

Explore the underlying principles and algorithm selection of the C++sort function Apr 02, 2024 pm 05:36 PM

The bottom layer of the C++sort function uses merge sort, its complexity is O(nlogn), and provides different sorting algorithm choices, including quick sort, heap sort and stable sort.

Compare complex data structures using Java function comparison Apr 19, 2024 pm 10:24 PM

When using complex data structures in Java, Comparator is used to provide a flexible comparison mechanism. Specific steps include: defining the comparator class, rewriting the compare method to define the comparison logic. Create a comparator instance. Use the Collections.sort method, passing in the collection and comparator instances.

Improved detection algorithm: for target detection in high-resolution optical remote sensing images Jun 06, 2024 pm 12:33 PM

01 Outlook Summary Currently, it is difficult to achieve an appropriate balance between detection efficiency and detection results. We have developed an enhanced YOLOv5 algorithm for target detection in high-resolution optical remote sensing images, using multi-layer feature pyramids, multi-detection head strategies and hybrid attention modules to improve the effect of the target detection network in optical remote sensing images. According to the SIMD data set, the mAP of the new algorithm is 2.2% better than YOLOv5 and 8.48% better than YOLOX, achieving a better balance between detection results and speed. 02 Background & Motivation With the rapid development of remote sensing technology, high-resolution optical remote sensing images have been used to describe many objects on the earth’s surface, including aircraft, cars, buildings, etc. Object detection in the interpretation of remote sensing images

Application of algorithms in the construction of 58 portrait platform May 09, 2024 am 09:01 AM

1. Background of the Construction of 58 Portraits Platform First of all, I would like to share with you the background of the construction of the 58 Portrait Platform. 1. The traditional thinking of the traditional profiling platform is no longer enough. Building a user profiling platform relies on data warehouse modeling capabilities to integrate data from multiple business lines to build accurate user portraits; it also requires data mining to understand user behavior, interests and needs, and provide algorithms. side capabilities; finally, it also needs to have data platform capabilities to efficiently store, query and share user profile data and provide profile services. The main difference between a self-built business profiling platform and a middle-office profiling platform is that the self-built profiling platform serves a single business line and can be customized on demand; the mid-office platform serves multiple business lines, has complex modeling, and provides more general capabilities. 2.58 User portraits of the background of Zhongtai portrait construction

Java data structures and algorithms: in-depth explanation May 08, 2024 pm 10:12 PM

Data structures and algorithms are the basis of Java development. This article deeply explores the key data structures (such as arrays, linked lists, trees, etc.) and algorithms (such as sorting, search, graph algorithms, etc.) in Java. These structures are illustrated through practical examples, including using arrays to store scores, linked lists to manage shopping lists, stacks to implement recursion, queues to synchronize threads, and trees and hash tables for fast search and authentication. Understanding these concepts allows you to write efficient and maintainable Java code.

PHP data structure: The balance of AVL trees, maintaining an efficient and orderly data structure Jun 03, 2024 am 09:58 AM

AVL tree is a balanced binary search tree that ensures fast and efficient data operations. To achieve balance, it performs left- and right-turn operations, adjusting subtrees that violate balance. AVL trees utilize height balancing to ensure that the height of the tree is always small relative to the number of nodes, thereby achieving logarithmic time complexity (O(logn)) search operations and maintaining the efficiency of the data structure even on large data sets.

Groundbreaking CVM algorithm solves more than 40 years of counting problems! Computer scientist flips coin to figure out unique word for 'Hamlet' Jun 07, 2024 pm 03:44 PM

Counting sounds simple, but in practice it is very difficult. Imagine you are transported to a pristine rainforest to conduct a wildlife census. Whenever you see an animal, take a photo. Digital cameras only record the total number of animals tracked, but you are interested in the number of unique animals, but there is no statistics. So what's the best way to access this unique animal population? At this point, you must be saying, start counting now and finally compare each new species from the photo to the list. However, this common counting method is sometimes not suitable for information amounts up to billions of entries. Computer scientists from the Indian Statistical Institute, UNL, and the National University of Singapore have proposed a new algorithm - CVM. It can approximate the calculation of different items in a long list.

See all articles