How Can I Extract Text Formatting Information (Font, Size, etc.) Using iTextSharp?

Barbara Streisand
Release: 2025-01-11 09:42:46

Using iTextSharp to extract PDF text formatting information (font, size, etc.)

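The iTextSharp library can extract text from a PDF together with formatting information such as the font name and font size. The example below does this with TextWithFontExtractionStrategy. Note that this class is not shipped with iTextSharp: it is a custom class that implements the library's ITextExtractionStrategy interface, so you have to define it yourself.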

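// Requires: using System; using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
string path = System.IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Document.pdf");
PdfReader reader = new PdfReader(path);
// Custom strategy class (implements ITextExtractionStrategy); it must be defined in your own code
TextWithFontExtractionStrategy strategy = new TextWithFontExtractionStrategy();
string text = PdfTextExtractor.GetTextFromPage(reader, 1, strategy); // page numbers start at 1
Console.WriteLine(text);
reader.Close();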

TextWithFontExtractionStrategy uses the TextRenderInfo object to extract text formatting information from the PDF content. A TextRenderInfo object describes one chunk of rendered text and exposes methods such as GetText, GetFont, GetBaseline, GetAscentLine, and GetDescentLine.

You can use these methods to get the font name, an approximate font size, and the baseline position of the text, as in the following fragment:

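// Get the font name (the PostScript name recorded in the PDF)
string fontName = renderInfo.GetFont().PostscriptFontName;

// Approximate the font size as the vertical distance between the descent line and the ascent line.
// This assumes horizontal, unrotated text; the baseline's start and end points share the same Y value,
// so subtracting them would give 0 rather than a size.
float fontSize = renderInfo.GetAscentLine().GetStartPoint()[Vector.I2] - renderInfo.GetDescentLine().GetStartPoint()[Vector.I2];

// Get the baseline position (start point of this text chunk)
Vector baseline = renderInfo.GetBaseline().GetStartPoint();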

Please note that the renderInfo object is not returned by GetTextFromPage. While parsing a page, iTextSharp calls the strategy's RenderText method once for each text chunk it encounters and passes a TextRenderInfo object as the argument, so the code that reads formatting information belongs inside that method. The fragment above only shows how to query a TextRenderInfo object; a complete strategy also has to implement the remaining members of ITextExtractionStrategy and collect the results for every chunk.
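
To make the callback flow concrete, here is a minimal sketch of such a strategy, assuming iTextSharp 5.x. The names FontInfoCollectingStrategy and TextChunkInfo are hypothetical and exist only for this illustration; they are not part of iTextSharp. The height calculation is an approximation that assumes horizontal, unrotated text.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical container for one text chunk and its formatting (not an iTextSharp type).
public class TextChunkInfo
{
    public string Text;
    public string FontName;
    public float ApproxHeight; // ascent line Y minus descent line Y, in PDF user-space units
    public float X;            // baseline start, horizontal
    public float Y;            // baseline start, vertical
}

// Hypothetical strategy that records formatting for every chunk iTextSharp reports.
public class FontInfoCollectingStrategy : ITextExtractionStrategy
{
    private readonly List<TextChunkInfo> chunks = new List<TextChunkInfo>();

    public IList<TextChunkInfo> Chunks { get { return chunks; } }

    // Called by the parser once per text chunk found on the page.
    public void RenderText(TextRenderInfo renderInfo)
    {
        Vector baselineStart = renderInfo.GetBaseline().GetStartPoint();
        Vector ascentStart = renderInfo.GetAscentLine().GetStartPoint();
        Vector descentStart = renderInfo.GetDescentLine().GetStartPoint();

        chunks.Add(new TextChunkInfo
        {
            Text = renderInfo.GetText(),
            FontName = renderInfo.GetFont().PostscriptFontName,
            // Assumes unrotated text; rotated text would need the full vector distance.
            ApproxHeight = ascentStart[Vector.I2] - descentStart[Vector.I2],
            X = baselineStart[Vector.I1],
            Y = baselineStart[Vector.I2]
        });
    }

    // Returns the chunk texts concatenated in the order they were reported.
    public string GetResultantText()
    {
        StringBuilder sb = new StringBuilder();
        foreach (TextChunkInfo chunk in chunks)
        {
            sb.Append(chunk.Text);
        }
        return sb.ToString();
    }

    // The remaining interface members are not needed for this sketch.
    public void BeginTextBlock() { }
    public void EndTextBlock() { }
    public void RenderImage(ImageRenderInfo renderInfo) { }
}

public static class Program
{
    public static void Main()
    {
        string path = System.IO.Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Document.pdf");
        PdfReader reader = new PdfReader(path);
        try
        {
            FontInfoCollectingStrategy strategy = new FontInfoCollectingStrategy();
            PdfTextExtractor.GetTextFromPage(reader, 1, strategy);

            // The "loop" over formatting information: one entry per text chunk.
            foreach (TextChunkInfo chunk in strategy.Chunks)
            {
                Console.WriteLine("{0} | {1} | {2:F1} | ({3:F1}, {4:F1})",
                    chunk.Text, chunk.FontName, chunk.ApproxHeight, chunk.X, chunk.Y);
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

A chunk is whatever the PDF drew in a single text-showing operation, so it may be a whole line, a word, or a single character. Grouping adjacent chunks that share the same font name and height is left out of this sketch.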
