How to install and configure tesseract-ocr 4.00 under Windows?

零下一度
Release: 2023-03-10 17:18:02

I recently needed to do text recognition and was not allowed to call other people's interfaces directly, so the only option was an open source library. tesseract-ocr is an open source text recognition project that originated at HP. It lets you put together an image text recognition (OCR) system fairly quickly. Because I develop on Windows, I had to install it on Windows.

Step 1: Download the installation package

According to this, I found an unofficial installation package. It seems that only a 64-bit package is available: http://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-4.00.00dev.exe. You can install it directly after downloading, but remember your installation directory, because we will configure the environment variables later.

If you need to recognize text in languages other than English, you also need to download the recognition packages for those languages.

Simplified Chinese character recognition package:

Traditional Chinese character recognition package:

Step 2: Install

Run the downloaded tesseract-ocr-setup-4.00.00dev.exe and click Next through each step to install.

Step 3: Configure environment variables

Note: My system is Windows 7. Other versions should be similar; the process is the same as configuring Java environment variables.

Find your installation path. Mine is C:\Program Files (x86)\Tesseract-OCR.

Copy the installation path "C:\Program Files (x86)\Tesseract-OCR", open "Control Panel\System and Security\System", and click "System Protection" to open the next dialog.

Click "Environment Variables" to enter the configuration interface.

Add the installation path "C:\Program Files (x86)\Tesseract-OCR" to both PATH and Path (underlined in red in the screenshot). When adding it, use ";" to separate it from the preceding entries, and end it with ";". The following is a sample of my configuration:

C:\Users\Administrator\AppData\Roaming\Composer\vendor\bin;C:\Users\Administrator\AppData\Roaming\npm;C:\Program Files (x86)\Tesseract-OCR;

After configuring, click Save.
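
The separator rule can be illustrated with a small Python sketch. It only computes the new string and does not touch the real system settings; the helper name add_to_path is hypothetical and not part of any Tesseract or Windows API.

```python
def add_to_path(path_value: str, new_dir: str, sep: str = ";") -> str:
    """Return path_value with new_dir appended, entries joined by sep.

    Hypothetical helper for illustration: it builds the string you would
    paste into the Environment Variables dialog, nothing more.
    """
    # Split on the separator and drop empty pieces left by a trailing ";".
    entries = [e.strip() for e in path_value.split(sep) if e.strip()]

    def key(p: str) -> str:
        # Windows paths are case-insensitive; also ignore a trailing backslash.
        return p.rstrip("\\").lower()

    # Append only if the directory is not already listed.
    if key(new_dir) not in {key(e) for e in entries}:
        entries.append(new_dir)

    # Join with ";" and end with ";" as the article recommends.
    return sep.join(entries) + sep


if __name__ == "__main__":
    current = r"C:\Users\Administrator\AppData\Roaming\npm;"
    print(add_to_path(current, r"C:\Program Files (x86)\Tesseract-OCR"))
```

Calling the function a second time with the same directory leaves the value unchanged, which avoids duplicate entries if you repeat the step.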

Open a command prompt and enter tesseract -v; you should see the version information.

If an error occurs, the environment variable is probably not configured properly.
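
The same check can be scripted. The following is a minimal Python sketch, assuming Python 3.7 or later is installed; the function name tesseract_version is hypothetical. It looks the executable up on PATH first, so a missing PATH entry shows up as None rather than as a crash.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(exe: str = "tesseract") -> Optional[str]:
    """Return the first line printed by `<exe> -v`, or None if exe is not on PATH."""
    # shutil.which searches the directories listed in the PATH variable.
    full_path = shutil.which(exe)
    if full_path is None:
        return None

    # Merge stderr into stdout in case the version banner is written there.
    result = subprocess.run(
        [full_path, "-v"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = tesseract_version()
    print(version or "tesseract not found; check the PATH entry")
```

Note that a command prompt opened before the variable was changed keeps the old PATH, so run the check from a newly opened window.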

At this point the installation is complete, but the system still cannot recognize Chinese. We need to download the Simplified Chinese and Traditional Chinese language packs (the addresses are given above). After downloading, just put them in the tessdata directory under the installation directory.
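
To confirm that the language packs ended up in the right place, a short Python sketch can list what is installed. It assumes the packs are files ending in .traineddata inside a tessdata folder under the installation directory, and that the Chinese packs use the codes chi_sim and chi_tra; the function names are hypothetical.

```python
from pathlib import Path
from typing import List


def installed_languages(install_dir: str) -> List[str]:
    """List language codes that have a .traineddata file in <install_dir>/tessdata."""
    tessdata = Path(install_dir) / "tessdata"
    if not tessdata.is_dir():
        return []
    # "chi_sim.traineddata" -> "chi_sim"
    return sorted(p.stem for p in tessdata.glob("*.traineddata"))


def missing_languages(install_dir: str, wanted: List[str]) -> List[str]:
    """Return the codes from wanted that are not installed yet."""
    have = set(installed_languages(install_dir))
    return [lang for lang in wanted if lang not in have]


if __name__ == "__main__":
    # Assumed installation path, taken from the article.
    install_dir = r"C:\Program Files (x86)\Tesseract-OCR"
    print("installed:", installed_languages(install_dir))
    print("missing:", missing_languages(install_dir, ["chi_sim", "chi_tra"]))
```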

Additional: Because no global variable points to the data directory, recognition cannot be run from a different drive. To fix this, add one more entry to the environment variables.

System variables -> New:

Add a variable named TESSDATA_PREFIX; its value is again the installation path, C:\Program Files (x86)\Tesseract-OCR.
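
If you call tesseract from a script, the variable can also be set for that one process instead of system-wide. The sketch below is a minimal Python illustration under a few assumptions: it follows the article in using the installation path as the value (some Tesseract versions may expect the tessdata folder itself), the function names and the file names scan.png and out are hypothetical, and chi_sim is assumed to be the Simplified Chinese language code.

```python
import os
from typing import Dict, List, Optional


def tesseract_env(install_dir: str,
                  base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with TESSDATA_PREFIX set to install_dir."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption from the article: the value is the installation path.
    env["TESSDATA_PREFIX"] = install_dir
    return env


def ocr_command(image: str, output_base: str, lang: str = "eng") -> List[str]:
    """Build the argument list: tesseract <image> <output_base> -l <lang>."""
    return ["tesseract", image, output_base, "-l", lang]


if __name__ == "__main__":
    env = tesseract_env(r"C:\Program Files (x86)\Tesseract-OCR")
    cmd = ocr_command("scan.png", "out", "chi_sim")
    # Only print here; to actually run it, pass both to
    # subprocess.run(cmd, env=env, check=True).
    print(cmd, env["TESSDATA_PREFIX"])
```

Passing the environment explicitly keeps the setting local to the script, which is handy for testing before you change the system variables.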

source:php.cn