
Computing the similarity of two strings in Java

(*-*)浩
Published: 2019-10-30 15:28:35


The following Java code computes the similarity of two strings from their Levenshtein (edit) distance:
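The method fills a matrix using the standard Levenshtein dynamic-programming recurrence. Here is that recurrence written in the code's own names (`dif`, `temp`, `len1`, `len2`), with $s_1$ and $s_2$ standing for `str1` and `str2`. The cell $\mathrm{dif}[i][j]$ holds the edit distance between the first $i$ characters of $s_1$ and the first $j$ characters of $s_2$:

```latex
\mathrm{dif}[i][0] = i, \qquad \mathrm{dif}[0][j] = j

\mathrm{dif}[i][j] = \min\bigl(\mathrm{dif}[i-1][j-1] + \mathrm{temp},\ \mathrm{dif}[i][j-1] + 1,\ \mathrm{dif}[i-1][j] + 1\bigr),
\quad
\mathrm{temp} = \begin{cases} 0 & s_1[i-1] = s_2[j-1] \\ 1 & \text{otherwise} \end{cases}

\text{similarity} = 1 - \frac{\mathrm{dif}[\mathit{len}_1][\mathit{len}_2]}{\max(\mathit{len}_1, \mathit{len}_2)}
```

The three terms inside the minimum correspond to a substitution (or a free match), an insertion, and a deletion.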

// Class name is illustrative; the original snippet shows only the methods.
public class TextSimilarity {

	public static void main(String[] args) {
		// The two strings to compare
		String str1 = "汗1滴禾下土";
		String str2 = "汗滴禾下土";
		levenshtein(str1, str2);

		str1 = "汗滴禾下土";
		str2 = "汗滴禾下土";
		levenshtein(str1, str2);

		str1 = "锄禾日当午";
		str2 = "汗滴禾下土";
		levenshtein(str1, str2);

		str1 = "我觉得锄禾日当午";
		str2 = "锄禾日是sag";
		levenshtein(str1, str2);

		str1 = "我最帅asdasd";
		str2 = "最帅asdasdqeqwe";
		levenshtein(str1, str2);
	}

	/**
	 * Computes the Levenshtein (edit) distance between str1 and str2 and prints
	 * the resulting similarity. Typical uses: DNA analysis, spell checking,
	 * speech recognition, plagiarism detection.
	 *
	 * @createTime 2012-1-12
	 */
	public static void levenshtein(String str1, String str2) {
		// Lengths of the two strings
		int len1 = str1.length();
		int len2 = str2.length();
		// DP matrix, one row and one column larger than the string lengths:
		// dif[i][j] = edit distance between the first i chars of str1 and the first j chars of str2
		int[][] dif = new int[len1 + 1][len2 + 1];
		// Initial values: turning a prefix into the empty string costs one edit per character
		for (int a = 0; a <= len1; a++) {
			dif[a][0] = a;
		}
		for (int a = 0; a <= len2; a++) {
			dif[0][a] = a;
		}
		// Compare the two characters, then build on the upper-left, left and upper cells
		int temp;
		for (int i = 1; i <= len1; i++) {
			for (int j = 1; j <= len2; j++) {
				// Substitution costs 0 if the characters match, 1 otherwise
				temp = (str1.charAt(i - 1) == str2.charAt(j - 1)) ? 0 : 1;
				// Take the smallest of substitution, insertion and deletion
				dif[i][j] = min(dif[i - 1][j - 1] + temp, dif[i][j - 1] + 1,
						dif[i - 1][j] + 1);
			}
		}
		// The bottom-right cell is the edit distance between the full strings.
		// Similarity = 1 - distance / length of the longer string;
		// two empty strings count as identical (this avoids 0/0 = NaN).
		int maxLen = Math.max(len1, len2);
		float similarity = (maxLen == 0) ? 1.0f : 1 - (float) dif[len1][len2] / maxLen;
		// Prints "the similarity of string [str1] and [str2] is: <similarity>"
		System.out.println("字符串【" + str1 + "】与【" + str2 + "】的相似度是:" + similarity);
		System.out.println();
	}

	// Returns the smallest of the given values
	private static int min(int... is) {
		int min = Integer.MAX_VALUE;
		for (int i : is) {
			if (min > i) {
				min = i;
			}
		}
		return min;
	}
}
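The `levenshtein` method prints its result and indexes with `charAt`, which counts UTF-16 code units. As a result, a character outside the Basic Multilingual Plane (an emoji, or some rare CJK ideographs) is counted as two characters. Below is a hedged variant, not the author's code. It returns the similarity instead of printing it and compares Unicode code points. It also keeps only two rows of the matrix, because each row depends only on the row before it. The class name `SimilaritySketch` and the helpers `distance`, `similarity` and `mostSimilar` are hypothetical. `mostSimilar` shows one way to pick the candidate with the maximum similarity from a list. The sketch assumes Java 9+ for `List.of`.

```java
import java.util.List;

public class SimilaritySketch {

    // Levenshtein distance over Unicode code points (hypothetical helper),
    // keeping only the previous and current rows of the DP matrix.
    static int distance(String a, String b) {
        int[] s = a.codePoints().toArray();
        int[] t = b.codePoints().toArray();
        int[] prev = new int[t.length + 1];
        int[] curr = new int[t.length + 1];
        for (int j = 0; j <= t.length; j++) {
            prev[j] = j; // empty prefix of a -> first j code points of b
        }
        for (int i = 1; i <= s.length; i++) {
            curr[0] = i; // first i code points of a -> empty prefix of b
            for (int j = 1; j <= t.length; j++) {
                int cost = (s[i - 1] == t[j - 1]) ? 0 : 1;
                curr[j] = Math.min(prev[j - 1] + cost,
                        Math.min(curr[j - 1] + 1, prev[j] + 1));
            }
            int[] tmp = prev; // the row just filled becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[t.length];
    }

    // Similarity in [0, 1]: 1 - distance / length of the longer string (in code points).
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.codePointCount(0, a.length()), b.codePointCount(0, b.length()));
        return maxLen == 0 ? 1.0 : 1.0 - (double) distance(a, b) / maxLen;
    }

    // Returns the candidate most similar to the query, or null if the list is empty.
    // Ties keep the earliest candidate.
    static String mostSimilar(String query, List<String> candidates) {
        String best = null;
        double bestScore = -1.0;
        for (String c : candidates) {
            double score = similarity(query, c);
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(distance("汗1滴禾下土", "汗滴禾下土")); // 1
        System.out.println(mostSimilar("汗滴禾下土", List.of("锄禾日当午", "汗1滴禾下土"))); // 汗1滴禾下土
    }
}
```

For the BMP-only strings in the article this gives the same distances as `levenshtein`. The two-row layout reduces memory from O(len1 × len2) to O(len2).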

Output (each line reads "the similarity of string 【A】 and 【B】 is: value"):

字符串【汗1滴禾下土】与【汗滴禾下土】的相似度是:0.8333333
 
字符串【汗滴禾下土】与【汗滴禾下土】的相似度是:1.0
 
字符串【锄禾日当午】与【汗滴禾下土】的相似度是:0.0
 
字符串【我觉得锄禾日当午】与【锄禾日是sag】的相似度是:0.125
 
字符串【我最帅asdasd】与【最帅asdasdqeqwe】的相似度是:0.53846157
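Each printed value is 1 minus the edit distance divided by the length of the longer string. Working backward from the output:

- 汗1滴禾下土 vs 汗滴禾下土: one deletion (the "1"), and the longer string has 6 characters.
- 锄禾日当午 vs 汗滴禾下土: no alignment beats substituting all 5 characters.
- 我觉得锄禾日当午 → 锄禾日是sag: 3 deletions, 2 substitutions and 2 insertions, which is 7 edits over 8 characters.
- 我最帅asdasd → 最帅asdasdqeqwe: 1 deletion and 5 insertions, which is 6 edits over 13 characters.

```latex
1 - \frac{1}{6} \approx 0.8333, \qquad
1 - \frac{5}{5} = 0, \qquad
1 - \frac{7}{8} = 0.125, \qquad
1 - \frac{6}{13} \approx 0.5385
```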

Source: csdn.net