Home > Java > javaTutorial > body text

Detailed explanation of edit distance examples of strings

零下一度
Release: 2017-06-30 09:34:25
Original
2078 people have browsed it

Dynamic programming algorithm questions are often frequent questions in written examinations of major companies. In many algorithmic WeChat public accounts, articles about "dynamic programming" are common. They all try to describe and explain dynamic programming in the most understandable words, and some even use comics to explain it. Read every public article carefully. In fact, all the articles pushed by the number can be read and understood, and everyone can have a general understanding of dynamic programming.

 What is dynamic programming? In layman’s terms, the solution to a problem can be known at a glance (exhaustive list), but you can’t count them one by one. You have to find the optimal solution. In other words, there will be questions like “the most” and “the most” in the question. "At least", "how many kinds are there in total" and other formulations, these problems can theoretically be solved using the idea of ​​dynamic programming. Dynamic programming is similar to the divide-and-conquer method. It solves the original problem by combining the solutions of sub-problems, but it only solves each sub-problem once and saves it in a table without recalculation. It is usually used to solve the most optimal problem. Optimization Problem——"Introduction to Algorithms".

Edit distance (Edit Distance), in this article refers to Levenshtein distance, which is the string S1The minimum number of times that can be converted into a string S2 through the three operations of inserting , modifying, and deleting. For example: S1 = abc, S2 = abf, edit distance d = 1 (only Need to change c to f). In this article, the algorithm idea of ​​dynamic programming will be used to solve the edit distance of strings.

Definition: S1, S2 represents two strings , S1(i) represents S The first character of 1, d[i, j] represents the th character of S1 #i prefix to the jth prefix of S2 (for example :S1 = ” abc”,S2 = ”def”,Solve the edit distance from S1 to S2 as d[ 3, 3]).

  1. If

    S1 = “abc”, S2 = “dec”, then their edit distance is d[3, 3] = 2 , observe that the last character of the two strings is the same, that is to say, S1(3) = S2(3) does not require any transformation, so S1 = "abc", S2 = "dec" <= > S1' = ”ab”, S2' = ”de”, that is, when S1[i] = S[j], , d [i, j] = d[i-1,j -1]. Get the formula: d[i, j] = d[i - 1, j - 1] (S1[i] = S2[j])

  2. The above article yields the calculation formula when S1[i] = S2[j]. Obviously there is another situation where S1[i] ≠ S2[j], if S1 = “abc”, S2 = “def”. The process of converting S1 to S2 can be "modified", But you can also delete # by "INSERT"" ##” causes S1 to be transformed into S2.

  1) Insert the

character “f”# at the end of the S1 string ##, at this time S1 = ”abcf”, S2 = ”def”,At this time that is, S1[i] = S2[j] In the case of , the edit distance when S1 is transformed into S2 is d[4, 3] = d[ 3, 2]. So we getd[i, j]=d[i, j - 1] + 1. (+1 is because S1 has added ”f”)  2 )Insert the character

“c” at the end of the S2 string, then S1 = “abc” , S2 = ”defc”, this is the situation where S1[i] = S[j],# The edit distance transformed from ##S1 to S2 is d[3, 4] = d[2, 3] . So we getd[i, j]=d[i - 1, j] + 1. In fact, this is done to S1 delete. (+1 is because S2 has added ”c”) 3 )Modify S1

at the end of the string to ”f”, at this time S1 = ”abf”S2 = ”def”, at this time S1[i] = S[j] In the case of , the edit distance to transform S1 into S2 is d[3, 3] = d[2, 2]. So we getd[i, j] = d[i – 1, j - 1] + 1. (+1 is because S1 modified “c”) In summary, we get the recursion formula:

##=>

## You might as well use a table to express the dynamic programming The solution process of

S1=

”abc”, S2=”def”.

It can be seen that the red square is the final edit distance required, and the entire solution process is to fill up this table

— -

Two-dimensional array. The following are the dynamic programming solutions to the string edit distance of Java and Python respectively.

 

Java

  1 package com.algorithm.dynamicprogramming;  2   3 
  /**  4  * 动态规划——字符串的编辑距离  5  * s1 = "abc", s2 = "def"  6  
  * 计算公式:  7  *          | 0                                          
   i = 0, j = 0  8  *          | j                                          
    i = 0, j > 0  9  * d[i,j] = | i                                          
     i > 0, j = 0 10  *          | min(d[i,j-1]+1, d[i-1,j]+1, d[i-1,j-1])    s1(i) = s2(j) 11  
     *          | min(d[i,j-1]+1, d[i-1,j]+1, d[i-1,j-1]+1)  s1(i) ≠ s2(j) 12  * 定义二维数组[4][4]: 13 
      *      d e f            d e f 14  *   |x|x|x|x|        |0|1|2|3| 15  
      * a |x|x|x|x|  =>  a |1|1|2|3|  => 编辑距离d = [3][3] = 3 16  * b |x|x|x|x|      
      b |2|2|2|3| 17  * c |x|x|x|x|      c |3|3|3|3| 18  * 19  * Created by yulinfeng on 6/29/17. 20  
      */ 21 public class Levenshtein { 22  23     public static void main(String[] args) { 24         
      String s1 = "abc"; 25         String s2 = "def"; 26         int editDistance = levenshtein(s1, s2); 27         
      System.out.println("s1=" + s1 + "与s2=" + s2 + "的编辑距离为:" + editDistance); 28     } 29  30     /** 31      
      * 编辑距离求解 32      * @param s1 字符串s1 33      * @param s2 字符串s2 34      * @return 编辑距离 35      
      */ 36     private static int levenshtein(String s1, String s2) { 37         int i = 0;  //s1字符串中的字符下标 38      
         int j = 0;  //s2字符串中的字符下标 39         char s1i = 0;   //s1字符串第i个字符 40         
         char s2j = 0;   //s2字符串第j个字符 41         int m = s1.length();    //s1字符串长度 42         
         int n = s2.length();    //s2字符串长度 43         if (m == 0) {   
         //s1字符串长度为0,此时的编辑距离就是s2字符串长度 44             return n; 45         } 46         
         if (n == 0) { 47             return m;   //s2字符串长度为0,此时的编辑距离就是s1字符串长度 48         }          
         int[][] solutionMatrix = new int[m + 1][n + 1];     //求解矩阵 50         
         /** 51          *      
         d e f 52         *   |0|x|x|x| 53         
          * a |1|x|x|x| 54         
           * b |2|x|x|x| 55          
         * c |3|x|x|x| 56    */ 57         for (i = 0; i < m + 1; i++) { 58             solutionMatrix[i][0] = i; 59         } 60         
         /** 61          *      d e f 62         
          *   |0|1|2|3| 63          
          * a |x|x|x|x| 64          
          * b |x|x|x|x| 65          
         * c |x|x|x|x| 66          */ 67         for (j = 0; j < n + 1; j++) { 68             solutionMatrix[0][j] = j; 69         } 70         
         /** 71          * 上面两个操作后,求解矩阵变为 72         
          *      d e f 73         
           *   |0|1|2|3| 74          
         * a |1|x|x|x| 75         
          * b |2|x|x|x| 76         
          * c |3|x|x|x| 77          
          * 接下来就是填充剩余表格 78          
         */ 79         for (i = 1; i < m + 1; i++) {   //i = 1,j = 1, 2, 3,以行开始填充 80             s1i = s1.charAt(i - 1); 81             
         for (j = 1; j < n + 1; j++) { 82                 s2j = s2.charAt(j - 1); 83                 int flag = (s1i == s2j) ? 0 : 1;    
         //根据公式,如果s1[i] = s2[j],则d[i,j]=d[i-1,j-1],如果s1[i] ≠ s2[j],则其中一个公式为d[i,j]=d[i-1,j-1]+1 84                 
         solutionMatrix[i][j] = min(solutionMatrix[i][j-1] + 1, solutionMatrix[i-1][j] + 1, solutionMatrix[i-1][j-1] + flag); 85             
         } 86         } 87         return solutionMatrix[m][n]; 88     } 89  90     /** 91      * 根据公式求解编辑距离 92      
         * @param insert s1插入操作 93      * @param delete s1删除操作 94      * @param edit s1修改操作 95      * @return 编辑距离 96      
         */ 97     private static int min(int insert, int delete, int edit) { 98         int tmp = insert < delete ? insert : delete; 99         
         return tmp < edit ? tmp : edit;100     }101 }
Copy after login

## Python3

 1 &#39;&#39;&#39; 2     动态规划——字符串的编辑距离 3     s1 = "abc", s2 = "def" 4     
 计算公式: 5              | 0                                          
  i = 0, j = 0 6              | j                                           
  i = 0, j > 0 7     d[i,j] = | i                                          
   i > 0, j = 0 8              | min(d[i,j-1]+1, d[i-1,j]+1, d[i-1,j-1])    s1(i) = s2(j) 9              
   | min(d[i,j-1]+1, d[i-1,j]+1, d[i-1,j-1]+1)  s1(i) ≠ s2(j)10     
   定义二维数组[4][4]:11         d e f            d e f12     |x|x|x|x|        
   |0|1|2|3|13     a |x|x|x|x|  =>  a |1|1|2|3|  => 编辑距离d = [4][4] = 314     b |x|x|x|x|      
   b |2|2|2|3|15     c |x|x|x|x|      c |3|3|3|3|16 '''17 def levenshtein(s1, s2):18     i = 0  
    #s1字符串中的字符下标19     j = 0   #s2字符串中的字符下标20     s1i = ""    #s1字符串第i个字符21     
    s2j = ""    #s2字符串第j个字符22     m = len(s1) #s1字符串长度23     n = len(s2) #s2字符串长度24    
     if m == 0:25         return n    #s1字符串长度为0,此时的编辑距离就是s2字符串长度26     if n == 0:27        
      return m    #s2字符串长度为0,此时的编辑距离就是s1字符串长度28     
      solutionMatrix = [[0 for col in range(n + 1)] for row in range(m + 1)]  #长为m+1,宽为n+1的矩阵29     '''30       
      d e f31           |0|x|x|x|32         a |1|x|x|x|33         b |2|x|x|x|34        
       c |3|x|x|x|35     '''36     for i in range(m + 1):37         solutionMatrix[i][0] = i38     '''39       
      d e f40           |0|1|2|3|41         a |x|x|x|x|42         b |x|x|x|x|43         
       c |x|x|x|x|44         45     '''46     for j in range(n + 1):47         solutionMatrix[0][j] = j48     '''49         
       上面两个操作后,求解矩阵变为50              d e f51           |0|1|2|3|52         a |1|x|x|x|53        
        b |2|x|x|x|54         c |3|x|x|x|55         接下来就是填充剩余表格56     '''57     for x in range(1, m + 1):58         
        s1i = s1[x - 1]59         for y in range(1, n + 1):60             s2j = s2[y - 1]61             flag = 0 if s1i == s2j  else 162             
        solutionMatrix[x][y] = min(solutionMatrix[x][y-1] + 1, solutionMatrix[x-1][y] + 1, solutionMatrix[x-1][y-1] + flag)63 64     
        return solutionMatrix[m][n]65 66 def min(insert, delete, edit):67     tmp = insert if insert < delete else delete68     
        return tmp if tmp < edit else edit69 70 s1 = "abc"71 s2 = "def"72 distance = levenshtein(s1, s2)73 print(distance)
Copy after login

The above is the detailed content of Detailed explanation of edit distance examples of strings. For more information, please follow other related articles on the PHP Chinese website!

Related labels:
source:php.cn
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Popular Tutorials
More>
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template