
How to Convert HTML to PDF Using iTextSharp: HTMLWorker vs. XMLWorker?

Linda Hamilton
Release: 2025-01-27 03:11:12

iTextSharp: Efficiently Converting HTML to PDF

Converting HTML to PDF with iTextSharp is not the same as printing a web page from a browser. HTML and PDF are unrelated formats, so iTextSharp parses the HTML and maps the tags it recognizes onto its own document elements (paragraphs, tables, lists, and so on), which are then written to the PDF.

Understanding iTextSharp's HTML Handling

iTextSharp can parse HTML and CSS, but it knows nothing about ASP.NET, MVC, Razor, or any other web framework. Your application must produce the final HTML markup as a string (for example, by rendering a view to a string) and hand that string to iTextSharp; iTextSharp does not do this step for you.

Parser Selection: HTMLWorker vs. XMLWorker

iTextSharp offers two HTML parsers: HTMLWorker and XMLWorker. HTMLWorker is the older parser built into the core library; XMLWorker, distributed as a separate add-on, is the recommended replacement because it is more extensible and has much better CSS support.

Code Example: HTML Tag Parsing with HTMLWorker and XMLWorker

The following C# snippets parse the same HTML string with each parser:

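<code class="language-csharp">// Example HTML
string html = "...";

// Assumes `using System.IO;`, an open iTextSharp.text.Document named `doc`,
// and `writer`, the PdfWriter bound to it via PdfWriter.GetInstance(doc, stream).

// Parsing with HTMLWorker (legacy parser; CSS support is very limited)
using (var htmlWorker = new iTextSharp.text.html.simpleparser.HTMLWorker(doc))
{
    // HTMLWorker reads from a TextReader, so wrap the string in a StringReader
    using (var sr = new StringReader(html))
    {
        htmlWorker.Parse(sr);
    }
}

// Parsing with XMLWorker (recommended; applies CSS such as inline style attributes)
using (var srHtml = new StringReader(html))
{
    iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, srHtml);
}</code>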
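
The snippets above assume that `doc` and `writer` already exist. The following is a minimal end-to-end sketch of that setup, assuming iTextSharp 5.x and the separate XMLWorker add-on (NuGet packages iTextSharp and itextsharp.xmlworker) are referenced. The class name `HtmlPdfSketch`, the method `FromHtml`, the sample markup, and the output file name are hypothetical. The key detail is ordering: the `Document` has to be closed before the bytes are read from the `MemoryStream`, otherwise the PDF is incomplete.

<code class="language-csharp">// Assumes NuGet packages iTextSharp (5.x) and itextsharp.xmlworker are referenced.
using System;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

public static class HtmlPdfSketch
{
    // Hypothetical helper: converts an HTML string to PDF bytes using XMLWorker.
    public static byte[] FromHtml(string html)
    {
        using (var ms = new MemoryStream())
        {
            // Document is iTextSharp's abstraction of the PDF;
            // PdfWriter binds it to the output stream.
            using (var doc = new Document(PageSize.A4))
            using (var writer = PdfWriter.GetInstance(doc, ms))
            {
                doc.Open();

                // XMLWorker reads from a TextReader, so wrap the string.
                using (var srHtml = new StringReader(html))
                {
                    XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, srHtml);
                }

                // Closing the Document finishes writing the PDF to the stream.
                doc.Close();
            }

            // MemoryStream.ToArray still works after the writer has closed the stream.
            return ms.ToArray();
        }
    }

    public static void Main()
    {
        byte[] pdf = FromHtml("<p>This is <strong>sample</strong> text.</p>");
        File.WriteAllBytes("sample.pdf", pdf);
        Console.WriteLine("Wrote " + pdf.Length + " bytes");
    }
}</code>

Returning a `byte[]` keeps the helper independent of where the PDF ends up: the bytes can be written to disk, as `Main` does, stored, or sent in an HTTP response.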

Leveraging XMLWorker for CSS Support

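XMLWorker can also apply a separate CSS stylesheet supplied as a string. The overload of ParseXHtml that accepts a stylesheet works with streams rather than text readers, so both strings are first converted to MemoryStream objects: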

<code class="language-csharp">string css = "...";

// The stylesheet overload of ParseXHtml takes Streams, not TextReaders, so
// encode the CSS and HTML strings as UTF-8 and wrap them in MemoryStreams.
// `html`, `writer` and `doc` are the same as in the previous example.
using (var msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(css)))
using (var msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(html)))
{
    iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, msHtml, msCss);
}</code>
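
To see the stylesheet overload in a complete program, here is a sketch assuming iTextSharp 5.x and the itextsharp.xmlworker add-on are referenced. `StyledHtmlPdfSketch`, `FromHtml`, the sample markup, and the `.headline` rule are illustrative only. Because the CSS is passed separately from the HTML, the class selector in the stylesheet is applied to the element with the matching `class` attribute.

<code class="language-csharp">// Assumes NuGet packages iTextSharp (5.x) and itextsharp.xmlworker are referenced.
using System;
using System.IO;
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

public static class StyledHtmlPdfSketch
{
    // Hypothetical helper: parses HTML with a separate CSS string and returns PDF bytes.
    public static byte[] FromHtml(string html, string css)
    {
        using (var ms = new MemoryStream())
        {
            using (var doc = new Document(PageSize.A4))
            using (var writer = PdfWriter.GetInstance(doc, ms))
            {
                doc.Open();

                // The (writer, doc, Stream html, Stream css) overload takes both inputs as streams.
                using (var msHtml = new MemoryStream(Encoding.UTF8.GetBytes(html)))
                using (var msCss = new MemoryStream(Encoding.UTF8.GetBytes(css)))
                {
                    XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, msHtml, msCss);
                }

                // Close before reading the bytes so the PDF is complete.
                doc.Close();
            }

            return ms.ToArray();
        }
    }

    public static void Main()
    {
        string html = "<p class=\"headline\">Styled heading</p><p>Body text</p>";
        string css = ".headline { font-size: 200%; color: red; }";

        byte[] pdf = FromHtml(html, css);
        File.WriteAllBytes("styled.pdf", pdf);
        Console.WriteLine("Wrote " + pdf.Length + " bytes");
    }
}</code>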

Important Note: Neither parser supports every HTML and CSS feature, so markup that renders correctly in a browser may look different in the PDF or be partly ignored. Consult the official iTextSharp and XMLWorker documentation for the tags and CSS properties that are supported.

source:php.cn