Table of Contents
1. Requirements description
2. Logical sorting
   3. Overall implementation steps
3. Code implementation

A batch document translation tool written in Python: better results than paid software?

Aug 09, 2023 pm 05:37 PM
python translate


This article shares a practical Python office automation script: "Use Python to batch translate English Word documents and preserve the format". The final result is even better than some paid software! Let's first look at the task itself.

1. Requirements description

I have a large number of foreign-language documents on hand (this example uses 5, named test1.docx, test2.docx, and so on). One of them looks like this: [Image: one of the original English documents]

Basic requirement: "Batch translate the full content of these documents into Chinese and write it to new files". The result looks like this: [Image: translated document]

Advanced requirement: on top of the basic requirement, "keep the format of the original document". The result looks like this: [Image: translated document with the original formatting]

2. Logical sorting

1. Translation API

The core of this requirement is translation. The strategy is to use an online translation API, and the Baidu Translation Open Platform is recommended here. If concurrency is not a concern, the standard version is enough: it is free to use with no character limit!

Baidu Translation Open Platform: http://api.fanyi.baidu.com/api/trans/product/index

Before using Baidu’s universal translation API, you need to complete the following tasks:

  1. Use a Baidu account to log in to the Baidu Translation Open Platform (http://api.fanyi.baidu.com);
  2. Register as a developer and obtain APPID;
  3. Conduct developer certification (if you only need the standard version, you can skip it);
  4. Open the universal translation API service (activation link);
  5. Refer to the technical documentation and the demo to write the code.

Once this is done, you can see your APPID and key on your personal page. These are very important! Below is the demo for the universal translation API with its output lightly modified; the code can be used as is. [Image: modified demo code and its output]
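The demo code did not survive as text here. As a rough sketch of what such a call can look like, assuming the endpoint path `/api/trans/vip/translate`, the form fields `q`, `from`, `to`, `appid`, `salt` and `sign`, and the signing rule MD5(appid + q + salt + key) described in Baidu's technical documentation. The helper names `build_payload` and `translate` and the placeholder credentials are hypothetical:

```python
import random
from hashlib import md5

# Placeholders: replace with the APPID and key shown on your personal page.
APPID = "INPUT_YOUR_APPID"
APPKEY = "INPUT_YOUR_APPKEY"
# Assumed endpoint of the universal translation API; check the platform docs.
URL = "http://api.fanyi.baidu.com/api/trans/vip/translate"


def make_md5(text, encoding="utf-8"):
    """Return the hex MD5 digest of a string."""
    return md5(text.encode(encoding)).hexdigest()


def build_payload(query, appid=APPID, appkey=APPKEY,
                  from_lang="en", to_lang="zh", salt=None):
    """Build the signed form fields for one translation request."""
    if salt is None:
        salt = random.randint(32768, 65536)
    sign = make_md5(appid + query + str(salt) + appkey)
    return {"appid": appid, "q": query, "from": from_lang,
            "to": to_lang, "salt": salt, "sign": sign}


def translate(query, **kwargs):
    """Send one request and return the translated text, one line per segment."""
    import requests  # third-party; imported here so signing works without it

    response = requests.post(URL, data=build_payload(query, **kwargs), timeout=10)
    result = response.json()
    if "error_code" in result:
        raise RuntimeError(f"API error {result['error_code']}: {result.get('error_msg')}")
    return "\n".join(item["dst"] for item in result["trans_result"])
```

Keeping the signing step in its own function makes it easy to check the signature locally before spending any API quota.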

You can see that the test content is translated accurately. Note that if you need to call the API many times, the free version has concurrency and rate limits, so you can use the time module to sleep for one second between calls.
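The one-second pause can be wrapped around whatever function performs the request. A minimal sketch, assuming a hypothetical `translate_fn` callable that takes a string and returns its translation; the wrapper name `throttled` and its parameters are illustrative, not part of the original code:

```python
import time


def throttled(translate_fn, interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Wrap translate_fn so consecutive calls start at least `interval` seconds apart."""
    last_call = None  # start time of the previous call, None before the first one

    def wrapper(text):
        nonlocal last_call
        if last_call is not None:
            remaining = interval - (clock() - last_call)
            if remaining > 0:
                sleep(remaining)  # wait out the rest of the interval
        last_call = clock()
        return translate_fn(text)

    return wrapper
```

Compared with an unconditional `time.sleep(1)` after every request, this only waits for the part of the second that has not already elapsed.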

2. Format modification

The difficulty with the advanced requirement is retaining the format. Put simply: whatever page format and paragraph format the original document has, the corresponding parts of the translated document should have the same.

Based on this relationship, you only need to read the corresponding settings from the original document and assign them to the newly translated document. (For now this only covers page settings and paragraph settings. Carrying over the formatting of specific words inside a paragraph accurately would require natural language processing (NLP), which is not covered in this article.)

2.1 Page style

The page style only needs to cover margins, orientation, height, width, and so on. As the original document shows, it uses Narrow margins. But we don't need to know the four values behind narrow margins; we only need to pass the values from the old document to the new one in code, as follows: [Image: page style code]
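A minimal sketch of this kind of value passing, assuming both documents are python-docx `Document` objects and using the page-layout attributes of its `Section` object. The helper names `copy_page_style` and `copy_first_section` are illustrative, and only the first section is handled:

```python
# Section attributes exposed by python-docx that describe the page layout.
PAGE_ATTRS = (
    "orientation", "page_width", "page_height",
    "left_margin", "right_margin", "top_margin", "bottom_margin",
    "header_distance", "footer_distance", "gutter",
)


def copy_page_style(src_section, dst_section):
    """Assign every page-layout value of src_section to dst_section."""
    for attr in PAGE_ATTRS:
        setattr(dst_section, attr, getattr(src_section, attr))


def copy_first_section(src_doc, dst_doc):
    """Copy the page layout of the first section between two Documents."""
    copy_page_style(src_doc.sections[0], dst_doc.sections[0])
```

Because the values are copied as they are, the code never needs to know what "Narrow" means in centimeters.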

2.2 Paragraph style

Paragraph styles include alignment, indentation, spacing, etc. In the original document, post-paragraph indentation is used and the title is centered. These settings carry over cleanly through variable passing. Any setting that is not explicitly set in the original document has the value None. [Image: paragraph style code]
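The same idea applied to paragraphs can be sketched as follows, assuming python-docx `Paragraph` objects and the attributes of their `paragraph_format`. The attribute selection and the helper name `copy_paragraph_format` are assumptions for illustration:

```python
# ParagraphFormat attributes copied from the original paragraph.
PARAGRAPH_ATTRS = (
    "alignment", "left_indent", "right_indent", "first_line_indent",
    "space_before", "space_after", "line_spacing",
)


def copy_paragraph_format(src_paragraph, dst_paragraph):
    """Copy alignment, indentation and spacing from one paragraph to another."""
    src_fmt = src_paragraph.paragraph_format
    dst_fmt = dst_paragraph.paragraph_format
    for attr in PARAGRAPH_ATTRS:
        # A value of None means "not set in the original" and is passed on unchanged.
        setattr(dst_fmt, attr, getattr(src_fmt, attr))
```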

2.3 Text block style modification

To adjust styles such as font size, bold, italics, and color, the strategy is to create empty lists, traverse every text block (run) of every paragraph in the original document, and append each attribute to its own list. For each paragraph, the most common value of each attribute is then assigned to the corresponding paragraph of the translated document (if all or most of the text in a paragraph is bold, every text block in the corresponding translated paragraph is set to bold). Readers interested in NLP can try to faithfully restore the styling of specific words in the English document and reflect it in the translated document. [Image: text block style code]
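A minimal sketch of this majority-vote strategy, assuming python-docx `Paragraph` objects whose runs expose `font.size`, `font.bold`, `font.italic` and `font.color.rgb`. The function names are hypothetical, and the vote counts runs rather than characters:

```python
from collections import Counter


def dominant_run_style(paragraph):
    """Return the most common size, bold, italic and color among a paragraph's runs."""
    sizes, bolds, italics, colors = [], [], [], []
    for run in paragraph.runs:
        sizes.append(run.font.size)
        bolds.append(run.font.bold)
        italics.append(run.font.italic)
        colors.append(run.font.color.rgb)

    def most_common(values):
        return Counter(values).most_common(1)[0][0] if values else None

    return {"size": most_common(sizes), "bold": most_common(bolds),
            "italic": most_common(italics), "color": most_common(colors)}


def apply_run_style(paragraph, style):
    """Set every run of the translated paragraph to the dominant style."""
    for run in paragraph.runs:
        run.font.size = style["size"]
        run.font.bold = style["bold"]
        run.font.italic = style["italic"]
        if style["color"] is not None:  # leave the default color untouched
            run.font.color.rgb = style["color"]
```

A paragraph added with `add_paragraph(text)` has a single run, so applying the dominant style formats the whole translated paragraph uniformly.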

The code above does not include font settings, because there is no need to pass English fonts on to a Chinese document. Setting Chinese fonts has been covered in previous articles and is relatively complicated, so see the code directly:

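from docx.oxml.ns import qn

# `run` is a text block (Run) of the translated document
run.font.name = '微软雅黑'  # sets the font for Latin text
r = run._element.rPr.rFonts
r.set(qn('w:eastAsia'), '微软雅黑')  # East Asian font, needed for Chinese text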

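3. Overall implementation steps

Each individual operation is now worked out. Since this example has multiple documents to translate, the overall logic is as follows:

  1. Use the glob module as the batch-processing framework to get the absolute path of each file
  2. Instantiate the Word file with python-docx and parse its paragraphs
  3. Pass the parsed paragraph text to the Baidu universal translation API, parse the returned JSON result (the modified demo above already does this), and write it into a new file
  4. Once a file has been fully parsed, translated, and written into the new file, save the new file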

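3. Code implementation

Import the required modules. Besides the libraries used by the translation demo, we also need glob to collect the files in batch, python-docx to read the files, and the time module to throttle API access. Why the os module is needed is explained below: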

import requests
import random
import json
from hashlib import md5
import time
from docx import Document
import glob
import os

Keep part of the original demo; the code that involves the query parameter needs to move into the loop later on. The retained part: [Image: retained demo code]

The result is as follows: [Image: output]

After getting the paragraph text, assign it to the query parameter and call the rest of the API demo code. While printing the result, use add_paragraph to write it into the new document: [Image: paragraph translation code]
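A minimal sketch of this step, assuming python-docx `Document` objects and a hypothetical `translate_fn` callable that wraps the API request and returns the translated string. The names `translate_paragraphs`, `translate_file` and `wordfile` are illustrative; `wordfile_new` follows the naming used in the save step:

```python
def translate_paragraphs(src_doc, dst_doc, translate_fn):
    """Translate each non-empty paragraph of src_doc and append it to dst_doc."""
    for paragraph in src_doc.paragraphs:
        text = paragraph.text.strip()
        if not text:
            dst_doc.add_paragraph("")  # keep blank lines, skip the API call
            continue
        translated = translate_fn(text)  # the paragraph text is the query
        print(translated)
        dst_doc.add_paragraph(translated)


def translate_file(src_path, dst_path, translate_fn):
    """Read one .docx file, translate it paragraph by paragraph, and save the result."""
    from docx import Document  # python-docx, imported only when a file is processed

    wordfile = Document(src_path)
    wordfile_new = Document()
    translate_paragraphs(wordfile, wordfile_new, translate_fn)
    wordfile_new.save(dst_path)
```

Skipping empty paragraphs avoids sending empty queries to the API while keeping the paragraph count of the two documents aligned.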

Finally, save the result as a new file, named in the form <original file name>_translated. Use os.path.basename to get the file name and build the new name by string concatenation:

# [:-5] strips the '.docx' extension; os.path.join supplies the path separator
wordfile_new.save(os.path.join(path, os.path.basename(file)[:-5] + '_translated.docx'))

Once the single-file operation works, put the code blocks that read and create files inside the batch-processing framework: [Image: batch-processing code]
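A minimal sketch of such a glob-based framework, assuming all source documents sit in one folder. The helper names `find_documents` and `translated_path`, and the decision to skip Word lock files and earlier outputs, are assumptions for illustration:

```python
import glob
import os


def find_documents(path):
    """Return absolute paths of the .docx files in `path` that still need translating."""
    files = []
    for file in sorted(glob.glob(os.path.join(path, "*.docx"))):
        name = os.path.basename(file)
        if name.startswith("~$"):  # lock file Word creates for an open document
            continue
        if name.endswith("_translated.docx"):  # output of a previous run
            continue
        files.append(os.path.abspath(file))
    return files


def translated_path(file):
    """Map .../test1.docx to .../test1_translated.docx in the same folder."""
    root, ext = os.path.splitext(file)
    return root + "_translated" + ext
```

Each returned path can then be handed to the single-file code, with `translated_path(file)` as the save target, so running the script twice does not translate its own output.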

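With the above complete, the basic requirement is met. Based on the style-handling approach worked out earlier, add the style adjustment code, and the final complete code is as follows: [Image: complete code]

After the code finishes running, we get five new translated files. [Image: generated files]

The translation result is as follows. You can see that the English has been translated into Chinese and most of the styling is preserved! [Image: translated document]

At this point, all the documents have been translated successfully. Of course, this is machine translation, and key parts still need manual adjustment in real use, but overall this is a successful attempt at Python office automation!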
