How to Extract PDF Content Efficiently Using iTextSharp in C# or VB.NET?

Barbara Streisand

Published: 2025-01-06 07:46:40

Original

Extracting PDF Content with iTextSharp

Question:

How can I use iTextSharp in C# to efficiently extract the contents of a PDF file?

Answer:

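iTextSharp provides a reliable way to read PDF content through its PdfReader class. The following C# example opens a PDF file with PdfReader and uses PdfTextExtractor to pull the text from every page (it extracts text only, not images):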

using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using System;
using System.IO;
using System.Text;

namespace PdfContentReader
{
    public static class Program
    {
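        // Extracts the text of every page of the PDF at fileName.
        // Returns an empty string if the file does not exist.
        public static string ReadPdfFile(string fileName)
        {
            StringBuilder text = new StringBuilder();

            if (File.Exists(fileName))
            {
                PdfReader pdfReader = new PdfReader(fileName);
                try
                {
                    // Page numbers in iTextSharp are 1-based.
                    for (int page = 1; page <= pdfReader.NumberOfPages; page++)
                    {
                        // Create a new strategy for each page, because a strategy accumulates all text it receives.
                        ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                        string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);

                        // GetTextFromPage already returns a .NET (UTF-16) string, so no re-encoding is needed.
                        // Converting it through Encoding.Default (the ANSI code page on .NET Framework)
                        // would replace characters outside that code page with '?'.
                        text.Append(currentText);
                        text.AppendLine(); // keep consecutive pages separated
                    }
                }
                finally
                {
                    // Release the underlying file even if extraction throws.
                    pdfReader.Close();
                }
            }
            return text.ToString();
        }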

        public static void Main(string[] args)
        {
            string fileName = @"path\to\file.pdf";
            string extractedText = ReadPdfFile(fileName);

            Console.WriteLine(extractedText);
        }
    }
}

In this implementation: