Is there a way to convert an HTML file to an in-memory PDF in Java?

Question

I have been given an HTML file and want to convert it to a PDF in memory. I don't want to use any external storage location during the conversion; I want to keep everything in memory. So far I have tried several Java libraries for the conversion, but they always create a temporary file somewhere and then read from and write to it. I don't want to perform any I/O operations during the conversion.

P粉308783585 · Answer

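The HTMLWorker class was deprecated many years ago. HTMLWorker was meant to convert small, simple HTML snippets into iText objects. It was never intended to convert complete HTML pages to PDF, but that is how many developers tried to use it. This led to a lot of frustration, because HTMLWorker did not support all HTML tags, did not parse CSS files, and so on. To avoid this frustration, HTMLWorker was removed from the latest version of iText.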

In 2011, iText Group released XML Worker as a generic XML-to-PDF tool built on top of iText 5. The default implementation converted XHTML (data) and CSS (styles) to PDF, mapping HTML tags such as <p>, <img>, and <li> to iText 5 objects such as Paragraph, Image, and ListItem. We don't know of any implementations that used XML Worker for any other XML formats, but many developers used XML Worker in combination with jsoup as an HTML2PDF converter.

XML Worker wasn't a URL2PDF tool though. XML Worker expected predictable HTML created for the sole purpose of converting that HTML to PDF. A common use case was the creation of invoices. Rather than programming the design of an invoice in Java or C#, developers chose to create a simple HTML template defining the structure of the document, and some CSS defining the styles. They then populated the HTML with data, and used XML Worker to create the invoices as PDF documents, throwing away the original HTML. We'll take a closer look at this use case in chapter 4, converting XML to HTML in memory using XSLT, then converting that HTML to PDF using the pdfHTML add-on.
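The XSLT step mentioned above needs nothing beyond the JDK. As a minimal sketch, javax.xml.transform can turn XML data into HTML with both input and output held in Strings (via StringReader and StringWriter), so no temporary file is involved. The invoice XML and the stylesheet are hypothetical examples, and the PDF step is left out because it requires the pdfHTML add-on.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XmlToHtmlInMemory {

    // Hypothetical stylesheet: renders each <item> of an <invoice> as an HTML list entry.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/invoice'>"
        + "<html><body><ul>"
        + "<xsl:for-each select='item'><li><xsl:value-of select='.'/></li></xsl:for-each>"
        + "</ul></body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML; input and output stay in Strings, never on disk. */
    static String toHtml(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter html = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(html));
            return html.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<invoice><item>Pen</item><item>Paper</item></invoice>";
        System.out.println(toHtml(xml, XSLT));
    }
}
```

The stylesheet uses the xml output method with the XML declaration omitted, so the result is well-formed XHTML: the kind of predictable, purpose-built HTML the answer describes as the input for the PDF step.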

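When iText 5 was first created, it was designed as a tool to produce PDF as fast as possible, flushing each page to the OutputStream as soon as it was complete. Some design choices that made a lot of sense when iText was first released in 2000 were still present in iText 5 sixteen years later. Unfortunately, some of those choices made it very hard, if not impossible, to extend the functionality of XML Worker to the level of quality many developers expected. If we really wanted to create a great HTML-to-PDF converter, we had to rewrite iText from scratch. And we did.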

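In 2016, we released iText 7, a completely new version of iText that is no longer compatible with previous versions, but that was created with pdfHTML in mind. A lot of work went into the new renderer framework. When you create a document with iText 7, a tree of renderers and their child renderers is built. The layout is created by traversing this tree, an approach that is much better suited for converting HTML to PDF. The iText objects were completely redesigned to better match HTML tags and to allow styling "the CSS way".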
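To make the idea of layout-by-traversal concrete, here is a toy model. The Renderer class below is hypothetical and is not iText's renderer API; it only shows how a layout pass can walk a tree depth-first, placing each box and stacking its children below it. It ignores everything that makes real layout hard, such as splitting content across pages, and uses a top-down y axis for readability (PDF coordinates actually grow upward from the bottom of the page).

```java
import java.util.ArrayList;
import java.util.List;

public class RendererTreeSketch {

    /** Hypothetical renderer: a box with its own height that may contain child renderers. */
    static class Renderer {
        final String name;
        final float ownHeight;
        final List<Renderer> children = new ArrayList<>();
        float top; // vertical position assigned during layout (top-down axis)

        Renderer(String name, float ownHeight) {
            this.name = name;
            this.ownHeight = ownHeight;
        }

        Renderer addChild(Renderer child) {
            children.add(child);
            return this;
        }

        /** Depth-first layout: place this box, stack its children below it, return the total height used. */
        float layout(float y) {
            top = y;
            float cursor = y + ownHeight;
            for (Renderer child : children) {
                cursor += child.layout(cursor);
            }
            return cursor - y;
        }

        /** Prints the tree with the positions computed by layout(). */
        void print(String indent) {
            System.out.println(indent + name + " at y=" + top);
            for (Renderer child : children) {
                child.print(indent + "  ");
            }
        }
    }

    public static void main(String[] args) {
        Renderer table = new Renderer("table", 10f)
                .addChild(new Renderer("cell", 15f))
                .addChild(new Renderer("cell", 15f));
        Renderer document = new Renderer("document", 0f)
                .addChild(new Renderer("paragraph", 20f))
                .addChild(table);
        float total = document.layout(0f);
        document.print("");
        System.out.println("total height = " + total);
    }
}
```

Because the whole tree exists before the layout pass runs, the position of every box can depend on the boxes before it and inside it, rather than on what has already been written out.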

For instance: in iText 5, you had a PdfPTable and a PdfPCell object to create a table and its cells. If you wanted every cell to contain text in a font different from the default font, you needed to set that font for the content of every separate cell. In iText 7, you have a Table and Cell object, and when you set a different font for the complete table, this font is inherited as the default font for every cell. That was a major step forward in terms of architectural design, especially if the goal is to convert HTML to PDF.
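The inheritance behaviour described above can be modelled in a few lines. This is a hypothetical sketch, not iText code: a Node stores a font only if one was set on it explicitly, and effectiveFont() walks up the parent chain, so a font set on a table applies to every cell that does not override it. The fallback "Helvetica" stands in for a document default and is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class StyleInheritance {

    /** Hypothetical element in a layout tree; not the iText API. */
    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node parent;
        String font; // null means "not set on this node"

        Node(String name) {
            this.name = name;
        }

        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }

        Node setFont(String font) {
            this.font = font;
            return this;
        }

        /** Walks up the tree until a node with an explicitly set font is found. */
        String effectiveFont() {
            for (Node n = this; n != null; n = n.parent) {
                if (n.font != null) {
                    return n.font;
                }
            }
            return "Helvetica"; // hypothetical document-wide default
        }
    }

    public static void main(String[] args) {
        Node table = new Node("table").setFont("Courier");
        Node plain = new Node("cell-1");
        Node special = new Node("cell-2").setFont("Times-Roman");
        table.add(plain).add(special);
        System.out.println(plain.effectiveFont());   // Courier (inherited from the table)
        System.out.println(special.effectiveFont()); // Times-Roman (overridden on the cell)
    }
}
```

Resolving the font lazily at lookup time means the font is set once on the table instead of being copied into every cell, which mirrors how CSS inheritance works.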

But let's not dwell on the past, let's see what pdfHTML can do for us. In the first chapter, we'll take a look at different variations of the convertToPdf()/ConvertToPdf() method, and we'll discover how the converter is configured.
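Since the question is about avoiding temporary files: the usual way to keep a conversion in memory is to let the converter write to a ByteArrayOutputStream instead of a FileOutputStream. pdfHTML's HtmlConverter offers a convertToPdf(String, OutputStream) overload that fits this pattern. The sketch below is hypothetical and keeps the converter behind a small interface (HtmlToPdf is an invented name) so it runs with the JDK alone; the stand-in converter only echoes its input and does not produce a real PDF.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class InMemoryConversion {

    /** Hypothetical abstraction: anything that reads HTML and writes PDF bytes to a stream. */
    @FunctionalInterface
    interface HtmlToPdf {
        void convert(String html, OutputStream pdfOut) throws IOException;
    }

    /** Runs the converter against a ByteArrayOutputStream, so the result only lives on the heap. */
    static byte[] convertInMemory(String html, HtmlToPdf converter) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            converter.convert(html, buffer);
        } catch (IOException e) {
            // Writing to a ByteArrayOutputStream does not fail; this rethrows errors from the converter itself.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in converter for demonstration only: copies the HTML bytes instead of rendering a PDF.
        HtmlToPdf echo = (html, out) -> out.write(html.getBytes(StandardCharsets.UTF_8));
        byte[] result = convertInMemory("<p>Hello</p>", echo);
        System.out.println("bytes produced in memory: " + result.length);
    }
}
```

With the iText 7 pdfHTML add-on on the classpath, the stand-in could be replaced by `(html, out) -> HtmlConverter.convertToPdf(html, out)`. Closing a ByteArrayOutputStream has no effect, so the bytes remain available even if a converter closes the stream when it finishes. Images or stylesheets referenced from the HTML may still be read by the converter, so a strictly I/O-free conversion also depends on how those resources are supplied.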