Home > Backend Development > Python Tutorial > How to Extract High-Resolution Images from PDFs Using Python without Altering Dimensions?

How to Extract High-Resolution Images from PDFs Using Python without Altering Dimensions?

Susan Sarandon
Release: 2024-10-22 07:53:30
Original
783 people have browsed it

How to Extract High-Resolution Images from PDFs Using Python without Altering Dimensions?

Extracting Native Resolution Images from PDFs in Python without Resampling

Extracting images from PDFs with their native resolution and format while preserving the layout can be a challenge. However, Python's PyMuPDF module provides a straightforward solution.

Using PyMuPDF

PyMuPDF can output images as PNG files, ensuring high resolution and maintaining the original format (e.g., TIFF, JPEG). The following code demonstrates its usage:

<code class="python">import fitz
doc = fitz.open("file.pdf")
for i in range(len(doc)):
    for img in doc.getPageImageList(i):
        xref = img[0]
        pix = fitz.Pixmap(doc, xref)
        if pix.n < 5:       # GRAY or RGB
            pix.writePNG("p%s-%s.png" % (i, xref))
        else:               # CMYK
            pix1 = fitz.Pixmap(fitz.csRGB, pix)
            pix1.writePNG("p%s-%s.png" % (i, xref))
            pix1 = None
        pix = None</code>
Copy after login

Modified Version for fitz 1.19.6

For the latest version of fitz (1.19.6), the following modified code can be used:

<code class="python">import os
import fitz
from tqdm import tqdm

workdir = "your_folder"

for each_path in os.listdir(workdir):
    if ".pdf" in each_path:
        doc = fitz.Document((os.path.join(workdir, each_path)))

        for i in tqdm(range(len(doc)), desc="pages"):
            for img in tqdm(doc.get_page_images(i), desc="page_images"):
                xref = img[0]
                image = doc.extract_image(xref)
                pix = fitz.Pixmap(doc, xref)
                pix.save(os.path.join(workdir, "%s_p%s-%s.png" % (each_path[:-4], i, xref)))</code>
Copy after login

This modified code utilizes tqdm for progress bar display and optimizes the image extraction and saving process.

The above is the detailed content of How to Extract High-Resolution Images from PDFs Using Python without Altering Dimensions?. For more information, please follow other related articles on the PHP Chinese website!

source:php
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Latest Articles by Author
Popular Tutorials
More>
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template