How to Extract PDF Content Efficiently Using iTextSharp in C# or VB.NET


Barbara Streisand

Published: 2025-01-06 07:46:40



Extracting PDF Content with iTextSharp

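Question:

How can you efficiently retrieve the contents of a PDF document with iTextSharp in VB.NET or C#?

Answer:

iTextSharp provides a reliable way to read PDF content through its PdfReader class. The following complete C# program extracts the text from every page of a PDF document: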

using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using System;
using System.IO;
using System.Text;

namespace PdfContentReader
{
    public static class Program
    {
        public static string ReadPdfFile(string fileName)
        {
            StringBuilder text = new StringBuilder();

            if (File.Exists(fileName))
            {
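                PdfReader pdfReader = new PdfReader(fileName);
                try
                {
                    // PDF page numbers are 1-based.
                    for (int page = 1; page <= pdfReader.NumberOfPages; page++)
                    {
                        // Use a fresh strategy per page so text from earlier pages does not accumulate.
                        ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                        string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);

                        // GetTextFromPage already returns a .NET string, so no re-encoding is needed.
                        // Round-tripping it through Encoding.Default would replace any character
                        // outside the system code page with '?'.
                        text.Append(currentText);
                    }
                }
                finally
                {
                    // Release the file even if extraction throws.
                    pdfReader.Close();
                }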
            }
            return text.ToString();
        }

        public static void Main(string[] args)
        {
            string fileName = @"path\to\file.pdf";
            string extractedText = ReadPdfFile(fileName);

            Console.WriteLine(extractedText);
        }
    }
}


In this implementation: