solr4.4.0 集成 carrot2 支持中文和添加自己的中文分词器的方法
默认 carrot2中是支持中文的,但是需要一个参数进行指定 carrot.lang= CHINESE_SIMPLIFIED carrot2支持的语言可以参考http://doc.carrot2.org/#div.attribute.lingo.MultilingualClustering.defaultLanguage 但是默认, carrot2使用的分词类是org.apache.luc
默认 carrot2中是支持中文的,但是需要一个参数进行指定
carrot.lang=CHINESE_SIMPLIFIED
carrot2支持的语言可以参考http://doc.carrot2.org/#div.attribute.lingo.MultilingualClustering.defaultLanguage
但是默认,carrot2使用的分词类是 org.apache.lucene.analysis.cn.smart.SentenceTokenizer,这是看 carrot源代码找到的源码如下(在org.apache.solr.handler.clustering.carrot2.LuceneCarrot2TokenizerFactory类中)
private ChineseTokenizer() throws Exception {
this.tempCharSequence = new MutableCharArray(new char[0]);
// As Smart Chinese is not available during compile time,
// we need to resort to reflection.
final Class> tokenizerClass = ReflectionUtils.classForName(
"org.apache.lucene.analysis.cn.smart.SentenceTokenizer", false);
this.sentenceTokenizer = (Tokenizer) tokenizerClass.getConstructor(
Reader.class).newInstance((Reader) null);
this.tokenFilterClass = ReflectionUtils.classForName(
"org.apache.lucene.analysis.cn.smart.WordTokenFilter", false);
}
如果,没有这个类,carrot2默认就会使用一个 ExtendedWhitespaceTokenizer 使用空格进行切词,所以如果要使用carrot2自己的中文切词,需要加入 lucene-analyzers-smartcn-4.4.0.jar
当然也可以使用自己的分词包,比如IK等等,把上述源码替换成相应的类即可。

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



Call of Duty Warzone is a newly launched mobile game. Many players are very curious about how to set the language of this game to Chinese. In fact, it is very simple. Players only need to download the Chinese language pack, and then You can modify it after using it. The detailed content can be learned in this Chinese setting method introduction. Let us take a look together. How to set the Chinese language for the mobile game Call of Duty: Warzone 1. First enter the game and click the settings icon in the upper right corner of the interface. 2. In the menu bar that appears, find the [Download] option and click it. 3. Select [SIMPLIFIEDCHINESE] (Simplified Chinese) on this page to download the Simplified Chinese installation package. 4. Return to the settings

VSCode Setup in Chinese: A Complete Guide In software development, Visual Studio Code (VSCode for short) is a commonly used integrated development environment. For developers who use Chinese, setting VSCode to the Chinese interface can improve work efficiency. This article will provide you with a complete guide, detailing how to set VSCode to a Chinese interface and providing specific code examples. Step 1: Download and install the language pack. After opening VSCode, click on the left

Excel spreadsheet is one of the office software that many people are using now. Some users, because their computer is Win11 system, so the English interface is displayed. They want to switch to the Chinese interface, but they don’t know how to operate it. To solve this problem, this issue The editor is here to answer the questions for all users. Let’s take a look at the content shared in today’s software tutorial. Tutorial for switching Excel to Chinese: 1. Enter the software and click the "File" option on the left side of the toolbar at the top of the page. 2. Select "options" from the options given below. 3. After entering the new interface, click the “language” option on the left

How to display Chinese characters correctly in PHPDompdf When using PHPDompdf to generate PDF files, it is a common challenge to encounter the problem of garbled Chinese characters. This is because the font library used by Dompdf by default does not contain Chinese character sets. In order to display Chinese characters correctly, we need to manually set the font of Dompdf and make sure to select a font that supports Chinese characters. Here are some specific steps and code examples to solve this problem: Step 1: Download the Chinese font file First, we need

Title: An effective way to repair Chinese garbled characters in PHPDompdf. When using PHPDompdf to generate PDF documents, garbled Chinese characters are a common problem. This problem usually stems from the fact that Dompdf does not support Chinese character sets by default, resulting in Chinese content not being displayed correctly. In order to solve this problem, we need to take some effective ways to fix the Chinese garbled problem of PHPDompdf. 1. Use custom font files. An effective way to solve the problem of Chinese garbled characters in Dompdf is to use

Many users are increasingly favoring the electronic ecosystem of Xiaomi smart home interconnection in modern life. After connecting to the Mijia APP, you can easily control the connected devices with your mobile phone. However, many users still don’t know how to add Mijia to their homes. app, then this tutorial guide will bring you the specific connection methods and steps, hoping to help everyone in need. 1. After downloading Mijia APP, create or log in to Xiaomi account. 2. Adding method: After the new device is powered on, bring the phone close to the device and turn on the Xiaomi TV. Under normal circumstances, a connection prompt will pop up. Select "OK" to enter the device connection process. If no prompt pops up, you can also add the device manually. The method is: after entering the smart home APP, click the 1st button on the lower left

"WWE2K24" is a racing sports game created by Visual Concepts and was officially released on March 9, 2024. This game has been highly praised, and many players are eagerly interested in whether it will have a Chinese version. Unfortunately, so far, "WWE2K24" has not yet launched a Chinese language version. Will wwe2k24 be in Chinese? Answer: Chinese is not currently supported. The standard version of WWE2K24 in the Steam Chinese region is priced at 199 yuan, the deluxe version is 329 yuan, and the commemorative edition is 395 yuan. The game has relatively high configuration requirements, and there are certain standards in terms of processor, graphics card, or running memory. Official recommended configuration and minimum configuration introduction:

Tips for solving Chinese garbled characters written by PHP into txt files. With the rapid development of the Internet, PHP, as a widely used programming language, is used by more and more developers. In PHP development, it is often necessary to read and write text files, including txt files that write Chinese content. However, due to encoding format problems, sometimes the written Chinese will appear garbled. This article will introduce some techniques to solve the problem of Chinese garbled characters written into txt files by PHP, and provide specific code examples. Problem analysis in PHP, text
