In C programming, string matching problems are very common. Simply put, the string matching problem is the process of finding a specific pattern string in a text string. In practical applications, string matching algorithms are mainly used in fields such as text search, image recognition, and natural language processing. This article will focus on commonly used string matching algorithms and their implementation in C.
The naive string matching algorithm is also called a brute force search matching algorithm. The idea is to successively try to match the pattern string P at each position of the text string T until a matching position is found or P does not exist in the entire T. The time complexity of this algorithm is high, O(n*m), where n and m are the lengths of the text string T and pattern string P respectively.
C code implementation is as follows:
void naive_match(string T, string P) { int n = T.length(); int m = P.length(); for(int i = 0; i <= n-m; i++) { int j; for(j = 0; j < m; j++) { if(T[i+j] != P[j]) break; } if(j == m) { cout << "Pattern occurs with shift " << i << endl; } } }
KMP string matching algorithm is a classic string matching algorithm. Its core idea is to avoid repeated matching of already matched characters in the text string T by matching the longest common prefix and suffix of the pattern string P. The time complexity of this algorithm is O(n), where n is the length of the text string.
C code implementation is as follows:
void get_next(string P, vector<int>& next) { int m = P.length(); next[0] = -1; int i = 0; int j = -1; while(i < m) { if(j == -1 || P[i] == P[j]) { i++; j++; next[i] = j; } else { j = next[j]; } } } void kmp_match(string T, string P) { int n = T.length(); int m = P.length(); vector<int> next(m+1); get_next(P, next); int i = 0; int j = 0; while(i < n) { if(j == -1 || T[i] == P[j]) { i++; j++; } else { j = next[j]; } if(j == m) { cout << "Pattern occurs with shift " << i-m << endl; j = next[j]; } } }
BM algorithm is a string based on bad characters and good suffix rules matching algorithm. Its core idea is to skip the matched characters by matching the last character of the pattern string P and preprocessing the unmatched characters in the text string T. The time complexity of this algorithm is O(n).
The C code is implemented as follows:
const int MAXCHAR = 256; void bm_match(string T, string P) { int n = T.length(); int m = P.length(); vector<int> badchar(MAXCHAR, -1); for(int i = 0; i < m; i++) { badchar[int(P[i])] = i; } vector<int> suffix(m+1); vector<bool> prefix(m+1); get_suffix_prefix(P, suffix, prefix); int i = 0; while(i <= n-m) { int j = m-1; while(j >= 0 && P[j] == T[i+j]) j--; if(j < 0) { cout << "Pattern occurs with shift " << i << endl; i += (i+m < n) ? m-badchar[int(T[i+m])] : 1; } else { i += max(suffix[j+1], j-badchar[int(T[i+j])]); } } }
This article mainly introduces the commonly used string matching algorithms in C and their implementation. Although the naive string matching algorithm is simple, its time complexity is high, while the KMP and BM algorithms can find the matching position more quickly. Among them, the KMP algorithm is suitable for short pattern strings, and the BM algorithm is suitable for long pattern strings. In actual applications, we need to choose the appropriate algorithm for string matching according to different situations.
The above is the detailed content of String matching algorithm and its implementation in C++. For more information, please follow other related articles on the PHP Chinese website!