solr4.4.0 集成 carrot2 支持中文和添加自己的中文分词器的方法-Mysql Tutorial-php.cn

Home

Database

Mysql Tutorial

solr4.4.0 集成 carrot2 支持中文和添加自己的中文分词器的方法

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jun 07, 2016 pm 03:37 PM

Chinese support Add to integrated

默认 carrot2中是支持中文的，但是需要一个参数进行指定 carrot.lang= CHINESE_SIMPLIFIED carrot2支持的语言可以参考http://doc.carrot2.org/#div.attribute.lingo.MultilingualClustering.defaultLanguage 但是默认， carrot2使用的分词类是org.apache.luc

默认 carrot2中是支持中文的，但是需要一个参数进行指定

carrot.lang=CHINESE_SIMPLIFIED

carrot2支持的语言可以参考http://doc.carrot2.org/#div.attribute.lingo.MultilingualClustering.defaultLanguage

但是默认，carrot2使用的分词类是 org.apache.lucene.analysis.cn.smart.SentenceTokenizer，这是看 carrot源代码找到的源码如下(在org.apache.solr.handler.clustering.carrot2.LuceneCarrot2TokenizerFactory类中)

private ChineseTokenizer() throws Exception {
this.tempCharSequence = new MutableCharArray(new char[0]);

// As Smart Chinese is not available during compile time,
// we need to resort to reflection.
final Class> tokenizerClass = ReflectionUtils.classForName(
"org.apache.lucene.analysis.cn.smart.SentenceTokenizer", false);
this.sentenceTokenizer = (Tokenizer) tokenizerClass.getConstructor(
Reader.class).newInstance((Reader) null);
this.tokenFilterClass = ReflectionUtils.classForName(
"org.apache.lucene.analysis.cn.smart.WordTokenFilter", false);
}

如果，没有这个类，carrot2默认就会使用一个 ExtendedWhitespaceTokenizer 使用空格进行切词，所以如果要使用carrot2自己的中文切词，需要加入 lucene-analyzers-smartcn-4.4.0.jar

当然也可以使用自己的分词包，比如IK等等，把上述源码替换成相应的类即可。

Statement of this Website

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Assassin's Creed Shadows: Seashell Riddle Solution

3 weeks ago By DDD

What's New in Windows 11 KB5054979 & How to Fix Update Issues

2 weeks ago By DDD

Where to find the Crane Control Keycard in Atomfall

3 weeks ago By DDD

Assassin's Creed Shadows - How To Find The Blacksmith And Unlock Weapon And Armour Customisation

4 weeks ago By DDD

Roblox: Dead Rails - How To Complete Every Challenge

3 weeks ago By DDD

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Where is the login entrance for gmail email?

7583

CakePHP Tutorial

1386

What is the format of the account name of steam

win11 activation key permanent

nyt connections hints and answers

122

Related knowledge

How to set Chinese in Call of Duty: Warzone mobile game Mar 22, 2024 am 08:41 AM

Call of Duty Warzone is a newly launched mobile game. Many players are very curious about how to set the language of this game to Chinese. In fact, it is very simple. Players only need to download the Chinese language pack, and then You can modify it after using it. The detailed content can be learned in this Chinese setting method introduction. Let us take a look together. How to set the Chinese language for the mobile game Call of Duty: Warzone 1. First enter the game and click the settings icon in the upper right corner of the interface. 2. In the menu bar that appears, find the [Download] option and click it. 3. Select [SIMPLIFIEDCHINESE] (Simplified Chinese) on this page to download the Simplified Chinese installation package. 4. Return to the settings

Setting up Chinese with VSCode: The Complete Guide Mar 25, 2024 am 11:18 AM

VSCode Setup in Chinese: A Complete Guide In software development, Visual Studio Code (VSCode for short) is a commonly used integrated development environment. For developers who use Chinese, setting VSCode to the Chinese interface can improve work efficiency. This article will provide you with a complete guide, detailing how to set VSCode to a Chinese interface and providing specific code examples. Step 1: Download and install the language pack. After opening VSCode, click on the left

How to set Excel table to display Chinese? Excel switching Chinese operation tutorial Mar 14, 2024 pm 03:28 PM

Excel spreadsheet is one of the office software that many people are using now. Some users, because their computer is Win11 system, so the English interface is displayed. They want to switch to the Chinese interface, but they don’t know how to operate it. To solve this problem, this issue The editor is here to answer the questions for all users. Let’s take a look at the content shared in today’s software tutorial. Tutorial for switching Excel to Chinese: 1. Enter the software and click the "File" option on the left side of the toolbar at the top of the page. 2. Select "options" from the options given below. 3. After entering the new interface, click the “language” option on the left

How to display Chinese characters correctly in PHP Dompdf Mar 05, 2024 pm 01:03 PM

How to display Chinese characters correctly in PHPDompdf When using PHPDompdf to generate PDF files, it is a common challenge to encounter the problem of garbled Chinese characters. This is because the font library used by Dompdf by default does not contain Chinese character sets. In order to display Chinese characters correctly, we need to manually set the font of Dompdf and make sure to select a font that supports Chinese characters. Here are some specific steps and code examples to solve this problem: Step 1: Download the Chinese font file First, we need

An effective way to fix Chinese garbled characters in PHP Dompdf Mar 05, 2024 pm 04:45 PM

Title: An effective way to repair Chinese garbled characters in PHPDompdf. When using PHPDompdf to generate PDF documents, garbled Chinese characters are a common problem. This problem usually stems from the fact that Dompdf does not support Chinese character sets by default, resulting in Chinese content not being displayed correctly. In order to solve this problem, we need to take some effective ways to fix the Chinese garbled problem of PHPDompdf. 1. Use custom font files. An effective way to solve the problem of Chinese garbled characters in Dompdf is to use

How to add a TV to Mijia Mar 25, 2024 pm 05:00 PM

Many users are increasingly favoring the electronic ecosystem of Xiaomi smart home interconnection in modern life. After connecting to the Mijia APP, you can easily control the connected devices with your mobile phone. However, many users still don’t know how to add Mijia to their homes. app, then this tutorial guide will bring you the specific connection methods and steps, hoping to help everyone in need. 1. After downloading Mijia APP, create or log in to Xiaomi account. 2. Adding method: After the new device is powered on, bring the phone close to the device and turn on the Xiaomi TV. Under normal circumstances, a connection prompt will pop up. Select "OK" to enter the device connection process. If no prompt pops up, you can also add the device manually. The method is: after entering the smart home APP, click the 1st button on the lower left

Will wwe2k24 have Chinese? Mar 13, 2024 pm 04:40 PM

"WWE2K24" is a racing sports game created by Visual Concepts and was officially released on March 9, 2024. This game has been highly praised, and many players are eagerly interested in whether it will have a Chinese version. Unfortunately, so far, "WWE2K24" has not yet launched a Chinese language version. Will wwe2k24 be in Chinese? Answer: Chinese is not currently supported. The standard version of WWE2K24 in the Steam Chinese region is priced at 199 yuan, the deluxe version is 329 yuan, and the commemorative edition is 395 yuan. The game has relatively high configuration requirements, and there are certain standards in terms of processor, graphics card, or running memory. Official recommended configuration and minimum configuration introduction:

Tips for solving Chinese garbled characters when writing txt files with PHP Mar 27, 2024 pm 01:18 PM

Tips for solving Chinese garbled characters written by PHP into txt files. With the rapid development of the Internet, PHP, as a widely used programming language, is used by more and more developers. In PHP development, it is often necessary to read and write text files, including txt files that write Chinese content. However, due to encoding format problems, sometimes the written Chinese will appear garbled. This article will introduce some techniques to solve the problem of Chinese garbled characters written into txt files by PHP, and provide specific code examples. Problem analysis in PHP, text

See all articles