Introduction: This article has two goals: 1. learn to use the 11 major Java open source Chinese word segmenters; 2. compare the segmentation results of those 11 segmenters. It gives the usage of each segmenter and the code used to compare their results. As for which one performs better, readers should judge for themselves based on their own application scenarios. Different word segmenters have different usages and define different interfaces, so let's first define a unified interface: /** Get all word segmentation results of the text, to compare the results of different word segmenters. @author Yang Shangchuan */ public interface WordSegmenter { /** Get all word segmentation results of the text ...
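The interface snippet above is cut off before its method signature, so the following is only a sketch of what such a unified interface could look like. The method name `segment`, its return type, and the `SingleCharSegmenter` and `SegmenterComparison` classes are assumptions made for illustration; they are not the author's code. The types are package-private so that the whole sketch fits in one file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical unified interface; the signature is an assumption, not the article's. */
interface WordSegmenter {
    /** Returns the words found in the text, in order of appearance. */
    List<String> segment(String text);
}

/** Toy implementation: emits every non-whitespace code point as its own "word". */
class SingleCharSegmenter implements WordSegmenter {
    @Override
    public List<String> segment(String text) {
        List<String> words = new ArrayList<>();
        text.codePoints()
            .filter(cp -> !Character.isWhitespace(cp))
            .forEach(cp -> words.add(new String(Character.toChars(cp))));
        return words;
    }
}

/** Runs several segmenters on the same text so their output can be compared. */
class SegmenterComparison {
    static Map<String, List<String>> compare(Map<String, WordSegmenter> segmenters, String text) {
        // LinkedHashMap keeps the results in the order the segmenters were registered.
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (Map.Entry<String, WordSegmenter> entry : segmenters.entrySet()) {
            results.put(entry.getKey(), entry.getValue().segment(text));
        }
        return results;
    }
}
```

With an interface of this shape, each real segmenter would be wrapped in a small adapter class, and the comparison code would only ever depend on `WordSegmenter`.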
2. Write a simple Chinese word segmenter in Python
Introduction: After unzipping, take out the following files: Training data: icwb2-data/training/pku_training.utf8; Test data: icwb2-data/testing/pku_test.utf8; Correct word segmentation result: icw...
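The teaser is truncated before it shows the article's algorithm, so the approach below is a swapped-in baseline, not the author's method. It sketches forward maximum matching against a dictionary built from pre-segmented training lines, assuming the training file holds one sentence per line with words separated by whitespace. The article itself uses Python; Java is used here to match the interface example on this page. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Forward maximum matching: at each position, take the longest known word. */
class MaxMatchSegmenter {
    private final Set<String> dictionary = new HashSet<>();
    private int maxWordLength = 1;

    /** Adds every whitespace-separated token of one pre-segmented training line. */
    void train(String segmentedLine) {
        for (String word : segmentedLine.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                dictionary.add(word);
                maxWordLength = Math.max(maxWordLength, word.length());
            }
        }
    }

    /** Splits unsegmented text; unknown characters come out as single-character words. */
    List<String> segment(String text) {
        List<String> words = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(text.length(), start + maxWordLength);
            // Shrink the candidate until it is a known word or a single character.
            // Lengths are counted in UTF-16 chars, which is fine for BMP text.
            while (end > start + 1 && !dictionary.contains(text.substring(start, end))) {
                end--;
            }
            words.add(text.substring(start, end));
            start = end;
        }
        return words;
    }
}
```

For example, after training on the line "北京 大学 北京大学 的 学生", segmenting "北京大学的学生" yields 北京大学 / 的 / 学生, because the longest match at the first position wins over the shorter word 北京. Accuracy could then be estimated by segmenting the test file and comparing against the reference segmentation.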
3. Integrating carrot2 with solr4.4.0 to support Chinese, and how to add your own Chinese word segmenter
Introduction: carrot2 supports Chinese by default, but you must specify the parameter carrot.lang=CHINESE_SIMPLIFIED. For the languages supported by carrot2, see http://doc.carrot2.org/#div.attribute.lingo.MultilingualClustering.defaultLanguage. However, the word segmentation class that carrot2 uses by default is org.apache.luc...
Introduction: Robbe is a high-performance PHP Chinese word segmentation extension built on the Friso Chinese word segmenter. It supports both UTF-8 and GBK encodings. Robbe-1.6.0: 1. Changed the interface to work with Friso-1.6.0. 2. Modified the UTF-8 test program, added several configuration test options, and added a GBK test program. 3. Changed rb_split; you can customize the return...