Extracting Native Resolution Images from PDFs in Python without Resampling
Extracting images from PDFs at their native resolution, without resampling, can be a challenge. However, Python's PyMuPDF module (imported as fitz) provides a straightforward solution.
Using PyMuPDF
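PyMuPDF reads each embedded image at its stored pixel dimensions, so nothing is resampled. The code below writes every image as a PNG file, which keeps the resolution but replaces the original encoding (e.g., JPEG) with PNG. CMYK images are converted to RGB first because PNG cannot store CMYK. The following code demonstrates this with the older camelCase method names: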
<code class="python">import fitz  # PyMuPDF

doc = fitz.open("file.pdf")
for i in range(len(doc)):
    for img in doc.getPageImageList(i):
        xref = img[0]  # cross-reference number of the image object
        pix = fitz.Pixmap(doc, xref)
        if pix.n - pix.alpha < 4:  # GRAY or RGB (alpha channel not counted)
            pix.writePNG("p%s-%s.png" % (i, xref))
        else:  # CMYK: convert to RGB first
            pix1 = fitz.Pixmap(fitz.csRGB, pix)
            pix1.writePNG("p%s-%s.png" % (i, xref))
            pix1 = None
        pix = None</code>
Modified Version for fitz 1.19.6
For fitz 1.19.6, which uses snake_case method names (get_page_images and Pixmap.save in place of getPageImageList and writePNG), the following modified code can be used:
<code class="python">import os

import fitz  # PyMuPDF
from tqdm import tqdm

workdir = "your_folder"

for each_path in os.listdir(workdir):
    if not each_path.lower().endswith(".pdf"):
        continue
    doc = fitz.Document(os.path.join(workdir, each_path))
    stem = os.path.splitext(each_path)[0]  # file name without ".pdf"
    for i in tqdm(range(len(doc)), desc="pages"):
        for img in tqdm(doc.get_page_images(i), desc="page_images"):
            xref = img[0]
            pix = fitz.Pixmap(doc, xref)
            if pix.n - pix.alpha >= 4:  # CMYK: convert to RGB before saving as PNG
                pix = fitz.Pixmap(fitz.csRGB, pix)
            pix.save(os.path.join(workdir, "%s_p%s-%s.png" % (stem, i, xref)))
    doc.close()</code>
This version processes every PDF in workdir, shows progress bars with tqdm, and prefixes each output filename with the name of the source PDF.
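Saving through Pixmap re-encodes every image as PNG. Where the stored encoding itself should be kept, Document.extract_image is an alternative: it returns a dictionary whose "image" entry holds the image bytes and whose "ext" entry holds a matching file extension. The following is a minimal sketch under stated assumptions, not a definitive implementation. The helper name extract_native_images and the output naming scheme are hypothetical, and the function only assumes that doc behaves like an opened PyMuPDF document. As far as I know, PyMuPDF returns the stored bytes where the data has a standalone file format (for example JPEG) and may fall back to PNG otherwise, so check "ext" rather than assuming a format.

```python
import os


def extract_native_images(doc, out_dir):
    """Write each embedded image of `doc` to `out_dir` without going through Pixmap.

    `doc` is assumed to behave like an opened PyMuPDF document: it must
    support len(doc), doc.get_page_images(page_number) and
    doc.extract_image(xref). Returns the list of file paths written.
    """
    os.makedirs(out_dir, exist_ok=True)
    seen = set()  # xrefs already written; one image can be used on several pages
    written = []
    for page_number in range(len(doc)):
        for img in doc.get_page_images(page_number):
            xref = img[0]
            if xref in seen:
                continue
            seen.add(xref)
            info = doc.extract_image(xref)
            if not info or not info.get("image"):
                continue  # nothing extractable for this xref
            # "ext" names the format of the returned bytes, e.g. "jpeg" or "png"
            name = "p%s-%s.%s" % (page_number, xref, info["ext"])
            path = os.path.join(out_dir, name)
            with open(path, "wb") as fh:
                fh.write(info["image"])  # bytes written exactly as returned
            written.append(path)
    return written
```

With PyMuPDF installed, this could be called as extract_native_images(fitz.open("file.pdf"), "images"). Skipping repeated xrefs avoids writing the same image once per page that uses it. The returned bytes do not have any soft mask (transparency) applied, so images that rely on one may look different from the rendered page.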