Table of Contents
Screenshots
Text Recognition
Accessing the Clipboard
Summary

Implementing a Simple Image Text Recognition Script in Python

Apr 04, 2018 pm 01:59 PM

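I'm about to graduate, and apart from preparing for my thesis defense I've mostly been reading and wandering around. While reading over the past couple of days, I ran into a problem: some e-books are distributed as scanned images, so you cannot select any text while reading. For anyone in the habit of taking notes, not being able to copy and paste is really frustrating! To solve this, I wanted a script with the following features:

1. Take a screenshot of any chosen region
2. Recognize the text in that screenshot
3. Copy the recognized text to the clipboard

The requirements are clear enough, so let's tackle them one at a time.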

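Screenshots

Screenshots are such a practical feature that there are countless ways to implement them. Since I wanted to do this in Python, the natural first step was a Google search, and sure enough there was a mountain of material.
As expected, Python has good basic support for taking screenshots. (This article uses Python 2 on Windows; installing some of the libraries under Python 3 was a real pain.)
(1) Full-screen screenshot
Let's start with the easy part (the screenshot step is a little tricky; everything else is super simple) and take a full-screen screenshot in Python.
The code is as follows: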

from PIL import ImageGrab

file_path = 'screenshot.png'  # full path where the screenshot will be saved
im = ImageGrab.grab()  # capture the full screen
im.save(file_path)

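A few lines of code and it's done (a round of applause for those who came before us _(:з)∠)_). The path passed to im.save() is the full path where the screenshot file will be stored.
One thing to watch out for when installing the library: use

pip install pillow  (not PIL)

otherwise pip reports that it cannot find a matching package.
(PS: There is actually a problem here. After the code above runs, it does not capture the whole screen; the saved image contains only part of it. I looked through guides online but did not find a good solution. Too bad...)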
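A likely cause of the partial capture (an assumption, not something the original author confirmed) is Windows display scaling: when the scale is above 100%, a process that is not DPI-aware works in scaled-down logical coordinates, so ImageGrab.grab() covers only part of the physical screen. A minimal Python 3 sketch of a workaround is to declare the process DPI-aware through ctypes before grabbing. It calls the Win32 functions SetProcessDpiAwareness (shcore.dll, Windows 8.1 and later) and SetProcessDPIAware (user32.dll, Vista and later); the helper name `enable_dpi_awareness` is hypothetical.

```python
import ctypes
import sys


def enable_dpi_awareness():
    """Hypothetical helper: make this process DPI-aware before taking screenshots.

    Returns True if one of the Win32 calls reported success, False otherwise.
    On non-Windows platforms it does nothing and returns False.
    """
    if sys.platform != "win32":
        return False
    try:
        # Windows 8.1+: 2 == PROCESS_PER_MONITOR_DPI_AWARE; S_OK (0) means success
        if ctypes.windll.shcore.SetProcessDpiAwareness(2) == 0:
            return True
    except (AttributeError, OSError):
        pass  # shcore.dll is missing on older Windows versions
    try:
        # Vista+ fallback: system DPI awareness; a nonzero return means success
        return bool(ctypes.windll.user32.SetProcessDPIAware())
    except (AttributeError, OSError):
        return False
```

Call enable_dpi_awareness() once at startup, before the first ImageGrab.grab(). If the problem is a second monitor rather than scaling, newer Pillow versions also accept ImageGrab.grab(all_screens=True) on Windows.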

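(2) Free-form screenshot
With full-screen capture done, what we actually want is to capture only the part we need. So how do we do that?
Among the approaches I found online, the most common is to listen for mouse actions to select the capture region. The most popular tools for this are tkinter and pyHook (the tkinter version is somewhat more complex). I personally prefer pyHook because it is very simple to implement, haha.
Here is part of the code: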

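# coding:utf-8
import win32api
import pyHook
import pythoncom
from PIL import ImageGrab

# Coordinate list (x1, y1, x2, y2): press point and release point of the drag
coordinate = [1, 1, 1, 1]
file_path = 'xx//xx//read.jpg'  # where to save the captured region


# Mouse event callback
def on_mouse_event(event):
    if event.MessageName == 'mouse left down':
        coordinate[0:2] = event.Position
    elif event.MessageName == 'mouse left up':
        coordinate[2:4] = event.Position
        win32api.PostQuitMessage()  # exit the message loop
        # Grab the selected region; sort the corners so dragging in any direction works
        x1, y1, x2, y2 = coordinate
        pic = ImageGrab.grab((min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)))
        pic.save(file_path)
    return True  # pyHook callbacks must return True to pass the event on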

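The only real hassle is installing the various libraries, and pywin32 in particular deserves a special mention = =, it is a real pain.
Here is a link to help with problems you may run into during installation:
Fixing "module not found" errors after installing pywin32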

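Text Recognition

Once the screenshot part works, the rest is relatively simple. Python's pytesseract library provides good support for text recognition, and the whole thing needs only one key line of code: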

from PIL import Image
import pytesseract

# file_path is the screenshot saved in the previous step; 'chi_sim' selects Simplified Chinese
text = pytesseract.image_to_string(Image.open(file_path), lang='chi_sim')
print(text)
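pytesseract passes the `lang` argument through to tesseract's `-l` option, and tesseract can combine several language packs with `+` (for example `chi_sim+eng` for pages that mix Chinese and English). Each language corresponds to a `<name>.traineddata` file in tesseract's `tessdata` directory, so a missing pack can be caught before running OCR. This Python 3 sketch is an illustration only: the helper names `available_languages` and `build_lang` are hypothetical and not part of pytesseract's API.

```python
import os


def available_languages(tessdata_dir):
    """Hypothetical helper: language codes with a .traineddata file in tessdata_dir."""
    if not os.path.isdir(tessdata_dir):
        return set()
    suffix = ".traineddata"
    return {name[:-len(suffix)] for name in os.listdir(tessdata_dir) if name.endswith(suffix)}


def build_lang(languages, available):
    """Hypothetical helper: join codes with '+', raising if any pack is not installed."""
    missing = [lang for lang in languages if lang not in available]
    if missing:
        raise ValueError("missing tesseract language data: " + ", ".join(missing))
    return "+".join(languages)
```

For example, `build_lang(["chi_sim", "eng"], available_languages(r"F:\Program_File\Tesseract-OCR\tessdata"))` returns `"chi_sim+eng"` if both packs are installed, and that string can be passed as `lang=` to `image_to_string`. The tessdata location shown is an assumption based on the install path used in this article.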

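Before using this library, you must install the tesseract-ocr recognition engine. The download link is below (it downloads as an .exe installer):
tesseract-ocr engine download
Here is a tutorial on installing it and configuring the environment variables (from Baidu Baike):
Image text OCR recognition: installing and using tesseract-ocr 4.00.00
Finally, configure the pytesseract library itself. Go to F:\XX\XX\XX\<your Python installation path>\Lib\site-packages\pytesseract, open the pytesseract.py file in that directory, and find this line:

tesseract_cmd = 'tesseract'

Replace the string 'tesseract' with the path to your tesseract-ocr executable, written as a raw string, e.g. r'F:\Program_File\Tesseract-OCR\tesseract.exe'. The r prefix matters: without it, the \t in \tesseract.exe is read as a tab character.

At this point, the text recognition engine is fully configured.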
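Editing pytesseract.py inside site-packages works, but the change is lost whenever the library is upgraded. pytesseract also reads the module-level variable `pytesseract.pytesseract.tesseract_cmd`, so it can be set from your own script instead. The Python 3 sketch below looks for the executable on PATH first and then in a list of fallback locations; the helper name `find_tesseract` and the candidate paths are assumptions to adapt to your own install.

```python
import os
import shutil

# Assumed fallback locations; edit these to match your own Tesseract install.
DEFAULT_CANDIDATES = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"F:\Program_File\Tesseract-OCR\tesseract.exe",
)


def find_tesseract(candidates=DEFAULT_CANDIDATES):
    """Hypothetical helper: return a tesseract executable path, or None if none is found."""
    # Prefer an executable already on PATH (e.g. after setting the environment variable)
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to the explicit install locations
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

In your script, call `cmd = find_tesseract()` and, if it is not None, assign `pytesseract.pytesseract.tesseract_cmd = cmd` before calling `image_to_string`.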

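Accessing the Clipboard

Finally, copy the recognized text to the clipboard.
Two steps:
(1) Install the pyperclip library with pip.
(2) Again, just one line of code (plus the import):

import pyperclip
pyperclip.copy(text)  # copy the recognized text to the system clipboard

And we're done!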

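Summary

The whole implementation is very concise, a few dozen lines of code at most, thanks to Python's powerful libraries.
Unfortunately, the screenshot feature is rather crude; tkinter can be used to build something closer to QQ's screenshot tool (though the code is somewhat more complex).
With this script, I no longer have to type out notes by hand when reading PDF e-books made of scanned images. Hooray! :)
Finally, here is the complete code: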

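# coding:utf-8
import inspect
import os

import win32api
import pyHook  # Windows hook library
import pythoncom
import pytesseract  # OCR wrapper around tesseract-ocr
import pyperclip
from PIL import ImageGrab, Image

# Coordinate list (x1, y1, x2, y2)
coordinate = [1, 1, 1, 1]

# Save the screenshot as read.jpg next to this script
file_ = inspect.getfile(inspect.currentframe())
dir_path = os.path.abspath(os.path.dirname(file_))
file_path = os.path.join(dir_path, 'read.jpg')


# Mouse event callback
def on_mouse_event(event):
    if event.MessageName == 'mouse left down':
        coordinate[0:2] = event.Position
    elif event.MessageName == 'mouse left up':
        coordinate[2:4] = event.Position
        x1, y1, x2, y2 = coordinate
        # Sort the corners so the selection works whichever way you drag
        bbox = (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))
        if bbox[0] == bbox[2] or bbox[1] == bbox[3]:
            return True  # a plain click selects no area: keep listening
        win32api.PostQuitMessage()  # exit the message loop
        # Grab the selected region
        pic = ImageGrab.grab(bbox)
        pic.save(file_path)
        text = pytesseract.image_to_string(Image.open(file_path), lang='chi_sim')  # run OCR
        # Remove the spaces tesseract tends to insert between Chinese characters,
        # then copy the result to the system clipboard
        pyperclip.copy(text.replace(' ', ''))
    return True  # pass the event on to other applications


if __name__ == '__main__':
    hm = pyHook.HookManager()  # create a hook manager
    hm.MouseAll = on_mouse_event  # handle all mouse events
    hm.HookMouse()  # install the mouse hook
    pythoncom.PumpMessages()  # enter the message loop and keep listening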
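The full script removes every space from the OCR result with `text.replace(' ', '')`. That cleans up Chinese text but also glues English words together (`Hello world` becomes `Helloworld`). A gentler alternative, sketched below for Python 3, removes spaces only when they sit between two CJK characters. The helper name `remove_cjk_spaces` and the character ranges (CJK Unified Ideographs U+4E00 to U+9FFF, CJK punctuation U+3000 to U+303F, fullwidth forms U+FF00 to U+FFEF) are assumptions; extend the ranges if your books use rarer characters.

```python
import re

# Assumed CJK ranges: ideographs, CJK symbols/punctuation, fullwidth forms
_CJK = "\u4e00-\u9fff\u3000-\u303f\uff00-\uffef"
# Runs of spaces/tabs (not newlines) with a CJK character on both sides
_SPACE_BETWEEN_CJK = re.compile("(?<=[%s])[ \t]+(?=[%s])" % (_CJK, _CJK))


def remove_cjk_spaces(text):
    """Hypothetical helper: drop spaces between CJK characters, keep them between Latin words."""
    return _SPACE_BETWEEN_CJK.sub("", text)
```

In the script, `pyperclip.copy(remove_cjk_spaces(text))` would replace `pyperclip.copy(text.replace(' ', ''))`, so `使用 Python 实现` keeps its spaces around the English word while `我 们 都 知 道` collapses to `我们都知道`.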
