
How to split PDF documents using PyPDF2 module in Python

May 09, 2023 03:34 PM

Install PyPDF2 module

# Mind the capitalization: in PyPDF2 only the "y" is lowercase, and the import (import PyPDF2) is case-sensitive

pip3 install PyPDF2


After installation, create a dedicated folder for this project on your local disk. Here the path is F:\Python\PyPDF2: I already had a Python folder on the F drive, and inside it I created a folder named after this module to keep this project separate from the others.

Create files and prepare PDF documents


I looked for a large PDF document to practice on and downloaded the documentation from the official Django website. It is large enough, with more than 1,900 pages, which is plenty for practice. If you need it, download it from the official website, or reply 'pdf' on my official account to get the download link. Then create a project file named PDFCF.py.

Start writing

The program begins with two lines. The first line specifies which program should run this file, and the second is a comment describing what the file does. Their purpose isn't obvious yet, but it becomes clear once you learn how to run programs quickly in batches; I won't go into details here.

#! python
# PDFCF.py - PDF file splitting program

The idea of document splitting

The number of parts is not fixed; what is fixed is the number of pages in each part, and the number of parts is calculated from it dynamically. With this idea in place, the next step is to write down the formula:

number of parts = total pages in the document / pages per split PDF

For example:

Suppose we want to split a 35-page PDF document into new documents of 10 pages each. The number of parts is calculated as follows:

3.5 = 35 / 10

Pay attention here: what does the leftover .5 mean? In this example, it means 5 pages remain after splitting off 3 parts. Whenever there is a remainder, no matter how large, round up by 1 so that the whole document is covered. The result of this split is that the first 3 documents contain 10 pages each and the fourth contains the last 5 pages. If the total divides evenly, the quotient is directly the number of parts.

Python split calculation formula:

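if 35 % 10:               # is there a remainder?
    parts = 35 // 10 + 1  # integer quotient plus 1
else:
    parts = 35 // 10      # divides evenly: the quotient is the number of parts

# The same logic on one line; here parts == 4
parts = 35 // 10 + 1 if 35 % 10 else 35 // 10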
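The rounding rule above can be wrapped in a small reusable function. This is a minimal sketch; the name count_parts is hypothetical and not part of the original program. It keeps the integer // and % approach, and for comparison math.ceil on true division gives the same result for page counts of this size.

```python
import math


def count_parts(total_pages: int, pages_per_part: int) -> int:
    """Return how many files a document splits into, rounding up any remainder."""
    parts = total_pages // pages_per_part  # number of full parts
    if total_pages % pages_per_part:       # leftover pages need one more part
        parts += 1
    return parts


print(count_parts(35, 10))  # 4
print(count_parts(30, 10))  # 3
# math.ceil on true division gives the same answer for these values
print(math.ceil(35 / 10))   # 4
```

Staying with integer arithmetic avoids any floating-point rounding concerns for very large page counts; math.ceil is shown only as a cross-check.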

How exactly do we split it?

Let’s take this 35-page document split as an example:

Loop over the pages (for num in range(35) gives page indexes 0 to 34), read each page, and assign it to a part according to these page ranges:

  1. The first document covers pages 0 to 10, excluding 10

  2. The second document covers pages 10 to 20, excluding 20

  3. The third document covers pages 20 to 30, excluding 30

  4. The fourth document covers pages 30 to 35, excluding 35

Now look for the pattern. The first number of each range is the number of pages per document multiplied by the part's position counting from 0. The second number seems to have no pattern at first, but on closer inspection it does: if we number the parts from 1 (1 to 4 in this example), the second number is the current part number multiplied by the number of pages per document (fixed at 10), except for the last part, which ends at the total page count.

But our first loop started from 0, and num cannot be used in this formula that way (num = 0 would give a start of 10*(0-1) = -10). So we number the parts from 1 instead. Since range() does not include its end value, we add 1 to the end so the last part is not skipped; with 4 parts, the loop becomes for num in range(1, 4 + 1), and the page ranges are:

  1. The first document is from 10*(1-1) to 10*1, excluding 10

  2. The second document is from 10*(2-1) to 10*2, excluding 20

  3. The third document is from 10*(3-1) to 10*3, excluding 30

  4. The fourth document is from 10*(4-1) to 35, excluding 35

The specific traversal code is as follows:

parts = 35 // 10 + 1 if 35 % 10 else 35 // 10  # 4 parts
for num in range(1, parts + 1):  # loop over part numbers 1 to 4
    # the last part ends at the total page count instead of 10 * num
    for i in range(10 * (num - 1), 10 * num if num != parts else 35):
        pass  # add page i to the num-th output document

Note: when num reaches the last part number (4), the end of the range is simply the total page count, 35, and the loop ends there. Why 35 rather than 35 + 1? Because this inner loop runs over page indexes, which start from 0, so range(30, 35) already reaches the last page (index 34) and no extra 1 is needed.
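To check that the nested loop covers every page exactly once, here is a small sketch that collects the page indexes of each part instead of writing files. The function name group_pages is hypothetical; it only reproduces the range arithmetic of the nested loop and does not touch PyPDF2.

```python
from typing import List


def group_pages(total_pages: int, pages_per_part: int) -> List[List[int]]:
    """Return the page indexes that belong to each part, one list per part."""
    if total_pages % pages_per_part:
        parts = total_pages // pages_per_part + 1
    else:
        parts = total_pages // pages_per_part
    groups = []
    for num in range(1, parts + 1):  # outer loop: part numbers 1..parts
        end = pages_per_part * num if num != parts else total_pages
        group = []
        for i in range(pages_per_part * (num - 1), end):  # inner loop: page indexes
            group.append(i)
        groups.append(group)
    return groups


groups = group_pages(35, 10)
print([len(g) for g in groups])  # [10, 10, 10, 5]
print(groups[-1])                # [30, 31, 32, 33, 34]
```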

Complete splitting procedure:

import PyPDF2
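The complete program from the original article is truncated here to its first line, so the version below is not the author's code. It is a minimal sketch of the same loop-based approach, assuming a PyPDF2 release that provides PdfReader and PdfWriter (2.x or 3.x; older 1.x releases only offer PdfFileReader and PdfFileWriter). The helper names part_ranges and split_pdf, the out_dir parameter, and the _partN.pdf output naming scheme are all hypothetical.

```python
import os
from typing import List, Tuple


def part_ranges(total_pages: int, pages_per_part: int) -> List[Tuple[int, int]]:
    """Return one (start, end) page-index range per output file; end is exclusive."""
    if pages_per_part <= 0:
        raise ValueError("pages_per_part must be a positive integer")
    # Round up when there is a remainder, as in the formula above
    if total_pages % pages_per_part:
        parts = total_pages // pages_per_part + 1
    else:
        parts = total_pages // pages_per_part
    ranges = []
    for num in range(1, parts + 1):  # part numbers start from 1
        start = pages_per_part * (num - 1)
        # The last part stops at the total page count
        end = pages_per_part * num if num != parts else total_pages
        ranges.append((start, end))
    return ranges


def split_pdf(path: str, pages_per_part: int, out_dir: str = ".") -> List[str]:
    """Write one PDF per part of the input file and return the output paths."""
    # Imported here so part_ranges() works even without PyPDF2 installed.
    # Assumes PyPDF2 2.x/3.x, which provides PdfReader and PdfWriter.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(path)
    base = os.path.splitext(os.path.basename(path))[0]
    outputs = []
    for num, (start, end) in enumerate(part_ranges(len(reader.pages), pages_per_part), start=1):
        writer = PdfWriter()
        for i in range(start, end):
            writer.add_page(reader.pages[i])  # copy page i into this part
        out_path = os.path.join(out_dir, f"{base}_part{num}.pdf")
        with open(out_path, "wb") as f:
            writer.write(f)
        outputs.append(out_path)
    return outputs
```

Usage, assuming a hypothetical django.pdf in the current folder: split_pdf("django.pdf", 10) writes django_part1.pdf, django_part2.pdf, and so on, and returns their paths. Keeping the range calculation in its own function makes it easy to check without any PDF on hand.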

Note: Personally, I find the splitting approach above a bit convoluted. If you have a thorough understanding of slicing and step sizes for Python lists, it doesn't need to be this complicated. Just generate a list of all page numbers, split it into several small lists by slicing, and the page range of each split PDF then runs from the first number of each small list to its last number + 1. I have also posted the code I wrote using the list method for your reference.

Splitting a PDF with the list method:

#! python
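The list-method code is also truncated in the original, so here is a hedged sketch of the slicing idea described in the note, not the author's implementation. The function name ranges_by_slicing is hypothetical; each (start, end) pair it returns can be fed into a page-copying loop such as for i in range(start, end).

```python
from typing import List, Tuple


def ranges_by_slicing(total_pages: int, pages_per_part: int) -> List[Tuple[int, int]]:
    """Split the list of page indexes with slices and read off each (start, end) range."""
    pages = list(range(total_pages))  # [0, 1, ..., total_pages - 1]
    # The slice step does the splitting; the last slice is simply shorter
    chunks = [pages[i:i + pages_per_part] for i in range(0, total_pages, pages_per_part)]
    # Each range runs from a chunk's first index to its last index + 1 (end exclusive)
    return [(chunk[0], chunk[-1] + 1) for chunk in chunks]


print(ranges_by_slicing(35, 10))  # [(0, 10), (10, 20), (20, 30), (30, 35)]
```

Because slicing past the end of a list just stops at the last element, no special case is needed for the final, shorter part.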

How to use it


Inside the project folder, hold down the Shift key, right-click, choose "Open command window here", type PDFCF.py and press Enter. Change the value of n to suit your needs.

