PHPAnalysis is a widely used Chinese word segmentation class. It segments text using reverse matching, which makes it compatible with a wider range of encodings. Its member variables and commonly used functions are explained below:
1. Important member variables
$resultType = 1 Data type of the generated segmentation result (1 is everything; 2 is dictionary words plus single Chinese, Japanese, and Korean (simplified and traditional) characters and English; 3 is dictionary words and English)
This variable is normally set through the SetResultType($rstype) method.
$notSplitLen = 5 Minimum sentence length for splitting; shorter sentences are not split
$toLower = false Whether to convert all English words to lowercase
$differMax = false Whether to use maximum-split mode to disambiguate bigram words
$unitWord = true Whether to try merging single characters into words (that is, new word recognition)
$differFreq = false Whether to use popular-word-priority mode for disambiguation
2. List of main member functions
1. public function __construct($source_charset='utf-8', $target_charset='utf-8', $load_all=true, $source='')
Function description: Constructor
Parameter list:
$source_charset source string encoding
$target_charset target string encoding
$load_all Whether to load the dictionary completely (this parameter is no longer used)
$source source string
If both input and output are utf-8, the object can be initialized without any parameters, and the text to process can be set afterwards with the SetSource method.
2. public function SetSource( $source, $source_charset='utf-8', $target_charset='utf-8' )
Function description: Set source string
Parameter list:
$source source string
$source_charset source string encoding
$target_charset target string encoding
Return value: bool
3. public function StartAnalysis($optimize=true)
Function description: Start performing word segmentation operation
Parameter list:
$optimize Whether to try to optimize the results after word segmentation
Return value: void
A basic word segmentation process:
////////////////////////////////////////
$pa = new PhpAnalysis();
$pa->SetSource('String that needs to be segmented');
//Set the segmentation attributes
$pa->resultType = 2;
$pa->differMax = true;
$pa->StartAnalysis();
//Get the results you want
$index = $pa->GetFinallyIndex();
////////////////////////////////////////
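The flow above can be wrapped in a small helper. This is only a sketch: segmentText is a hypothetical name that is not part of PHPAnalysis, and $pa is assumed to be an already constructed PhpAnalysis instance (loading the class file is left to the caller, since its path depends on the installation). It uses only SetSource, StartAnalysis and GetFinallyResult as documented in this article.

```php
<?php
/**
 * Hypothetical helper (not part of PHPAnalysis): segment $text and return
 * the entries joined by $separator, or null if the source could not be set.
 *
 * $pa is assumed to be a PhpAnalysis instance created by the caller.
 */
function segmentText($pa, $text, $sourceCharset = 'utf-8', $targetCharset = 'utf-8', $separator = ' ')
{
    // SetSource is documented as returning bool; treat false as a failure.
    if (!$pa->SetSource($text, $sourceCharset, $targetCharset)) {
        return null;
    }
    // true: try to optimize the results after segmentation.
    $pa->StartAnalysis(true);
    return $pa->GetFinallyResult($separator);
}
```

A call might look like segmentText(new PhpAnalysis(), $text, 'utf-8', 'utf-8', '/'), which would return the entries separated by "/".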
4. public function SetResultType( $rstype )
Function description: Set the type of the returned result; this is actually an operation on the member variable $resultType
The value of parameter $rstype is:
1 is everything, 2 is dictionary words plus single Chinese, Japanese, and Korean (simplified and traditional) characters and English, 3 is dictionary words and English
Return value: void
5. public function GetFinallyKeywords( $num = 10 )
Function description: Get the specified number of entries with the highest frequency (usually used to extract document keywords)
Parameter list:
$num = 10 Return number of entries
Return value: Keyword list separated by ","
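Because the keywords come back as a single ","-separated string, callers usually split it before further use. A minimal sketch follows; keywordListToArray is a hypothetical helper name, and the trimming and empty-item filtering are defensive assumptions rather than documented behavior of the class.

```php
<?php
/**
 * Hypothetical helper: turn the ","-separated keyword list returned by
 * GetFinallyKeywords() into an array, dropping empty items.
 */
function keywordListToArray($keywordList)
{
    // trim() only strips ASCII whitespace, so multi-byte UTF-8 words are left intact.
    $items = array_map('trim', explode(',', $keywordList));
    // array_filter keeps the original keys, so reindex with array_values.
    return array_values(array_filter($items, 'strlen'));
}
```

With a real instance this would be called as keywordListToArray($pa->GetFinallyKeywords(5)) after StartAnalysis() has run.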
6. public function GetFinallyResult($spword=' ')
Function description: Get the final word segmentation result
Parameter list:
$spword separator between entries
Return value: string
7. public function GetSimpleResult()
Function description: Get the rough segmentation result
Return value: array
8. public function GetSimpleResultAll()
Function description: Get the rough segmentation result containing attribute information
Attributes: 1 Chinese words and sentences; 2 ANSI vocabulary (including full-width); 3 ANSI punctuation marks (including full-width); 4 numbers (including full-width); 5 Chinese punctuation or unrecognizable characters
Return value: array
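The attribute codes can be used to keep only certain kinds of fragments. The sketch below is illustrative: filterByAttribute and ATTRIBUTE_LABELS are hypothetical names, and the entry shape ['w' => text, 't' => attribute code] is an assumption that this article does not confirm, so inspect the real return value (for example with print_r) before relying on it.

```php
<?php
// Labels for the attribute codes listed above.
const ATTRIBUTE_LABELS = [
    1 => 'Chinese words and sentences',
    2 => 'ANSI vocabulary (including full-width)',
    3 => 'ANSI punctuation marks (including full-width)',
    4 => 'numbers (including full-width)',
    5 => 'Chinese punctuation or unrecognizable characters',
];

/**
 * Hypothetical helper: return the text of every fragment whose attribute
 * code is listed in $wanted.
 * Assumption: each entry looks like ['w' => text, 't' => attribute code].
 */
function filterByAttribute(array $fragments, array $wanted)
{
    $kept = [];
    foreach ($fragments as $fragment) {
        // Cast to int so that a code stored as "1" still matches 1.
        if (in_array((int) $fragment['t'], $wanted, true)) {
            $kept[] = $fragment['w'];
        }
    }
    return $kept;
}
```

For example, filterByAttribute($pa->GetSimpleResultAll(), [1, 2]) would keep Chinese text and ANSI vocabulary while dropping punctuation and numbers, provided the assumed entry shape holds.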
9. public function GetFinallyIndex()
Function description: Get hash index array
Return value: array('word'=>count,...), sorted by frequency of occurrence
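An array of this shape is easy to trim down to the most frequent entries. The following sketch uses a hypothetical helper name, topEntries, and works on any array('word'=>count,...); it sorts again so that it does not depend on the order of its input.

```php
<?php
/**
 * Hypothetical helper: return the $limit most frequent entries from an
 * array('word' => count, ...) such as the one GetFinallyIndex() returns.
 */
function topEntries(array $index, $limit)
{
    // Sort by count, descending, keeping the word => count association.
    arsort($index);
    // preserve_keys = true keeps numeric words (e.g. "2024") as keys
    // instead of renumbering them from 0.
    return array_slice($index, 0, $limit, true);
}
```

With a real instance, topEntries($pa->GetFinallyIndex(), 20) would give the 20 most frequent entries together with their counts.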
10. public function MakeDict($source_file, $target_file='')
Function description: Compile a text-file dictionary into a dictionary file
Parameter list:
$source_file Source text file
$target_file Target file (if not specified, it is the current dictionary)
Return value: void
11. public function ExportDict($targetfile)
Function description: Export all entries of the current dictionary as text files
Parameter list:
$targetfile target file
Return value: void
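ExportDict and MakeDict can be combined to edit a dictionary: export it to text, change the text, then compile it again. The sketch below only shows the order of the calls. rebuildDictionary is a hypothetical helper name, $pa is assumed to be a PhpAnalysis instance, and the layout of the exported text file is not described in this article, so the editing step is left to a callback supplied by the caller.

```php
<?php
/**
 * Hypothetical helper: export the current dictionary as text, let the
 * caller edit it, then compile the edited text into $dictFile.
 *
 * $edit is any callable that receives the path of the exported text file.
 */
function rebuildDictionary($pa, $textFile, $dictFile, callable $edit)
{
    $pa->ExportDict($textFile);           // write all current entries as text
    $edit($textFile);                     // caller adds or removes entries
    $pa->MakeDict($textFile, $dictFile);  // compile the edited text file
}
```

Passing an explicit $dictFile keeps the current dictionary untouched, since MakeDict writes to the current dictionary only when no target file is specified.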