This article mainly introduces the cosine similarity calculation algorithm of PHP data analysis engine, and analyzes the operation steps and related implementation techniques of PHP calculation of cosine similarity in the form of specific examples. Friends who need it can refer to it
The example in this article describes the cosine similarity algorithm calculated by the PHP data analysis engine. Share it with everyone for your reference, the details are as follows:
For relevant introduction to cosine similarity, please refer to Baidu Encyclopedia: Cosine Similarity
<?php /** * 数据分析引擎 * 分析向量的元素 必须和基准向量的元素一致,取最大个数,分析向量不足元素以0填补。 * 求出分析向量与基准向量的余弦值 * @author yu.guo@okhqb.com */ /** * 获得向量的模 * @param unknown_type $array 传入分析数据的基准点的N维向量。|eg:array(1,1,1,1,1); */ function getMarkMod($arrParam){ $strModDouble = 0; foreach($arrParam as $val){ $strModDouble += $val * $val; } $strMod = sqrt($strModDouble); //是否需要保留小数点后几位 return $strMod; } /** * 获取标杆的元素个数 * @param unknown_type $arrParam * @return number */ function getMarkLenth($arrParam){ $intLenth = count($arrParam); return $intLenth; } /** * 对传入数组进行索引分配,基准点的索引必须为k,求夹角的向量索引必须为 'j'. * @param unknown_type $arrParam * @param unknown_type $index * @ruturn $arrBack */ function handIndex($arrParam, $index = 'k'){ foreach($arrParam as $key => $val){ $in = $index.$key; $arrBack[$in] = $val; } return $arrBack; } /** * * @param unknown_type $arrMark标杆向量数组(索引被处理过) * @param unknown_type $arrAnaly 分析向量数组 (索引被处理过) |array('j0'=>1,'j1'=>2....) * @param unknown_type $strMarkMod标杆向量的模 * @param unknown_type $intLenth 向量的长度 */ function getCosine($arrMark, $arrAnaly, $strMarkMod ,$intLenth){ $strVector = 0; $strCosine = 0; for($i = 0; $i < $intLenth; $i++){ $strMarkVal = $arrMark['k'.$i]; $strAnalyVal = $arrAnaly['j'.$i]; $strVector += $strMarkVal * $strAnalyVal; } $arrAnalyMod = getMarkMod($arrAnaly); //求分析向量的模 $strFenzi = $strVector; $strFenMu = $arrAnalyMod * $strMarkMod; $strCosine = $strFenzi / $strFenMu; if(0 !== (int)$strFenMu){ $strCosine = $strFenzi / $strFenMu; } return $strCosine; } ?>
The above is the detailed content of PHP calculation cosine similarity algorithm. For more information, please follow other related articles on the PHP Chinese website!