
Writing Async Libraries - Let's Convert HTML to PDF

William Shakespeare
Release: 2025-02-10 15:51:11

Key Points

  • Asynchronous programming in PHP lets slow operations, such as HTML to PDF conversion, run without blocking, so other code can execute in the meantime.
  • Promises and callbacks simplify the handling of delayed results and errors in an asynchronous framework, which makes the code more robust and easier to maintain.
  • Developing a custom asynchronous library (such as the HTML to PDF converter discussed in this article) involves creating abstractions and using tools such as ReactPHP and Amp to manage asynchronous tasks.
  • Asynchronous code can be adapted for synchronous execution, so it stays compatible with different application architectures without giving up the advantages of asynchronous programming.
  • By abstracting the parallel execution logic into its own driver system, a library can support multiple frameworks and environments and interface with various asynchronous libraries.
  • This article walks through a practical implementation of asynchronous HTML to PDF conversion in PHP, and stresses the value of understanding modern programming paradigms for efficient application development.

This article was peer-reviewed by Thomas Punt. Thanks to all of SitePoint's peer reviewers for making SitePoint content the best it can be!


Asynchronous PHP comes up at almost every conference. I'm glad it's mentioned so often now. However, there is a secret these speakers are not telling you...

Creating an asynchronous server, resolving domain names, and interacting with the file system: these are the easy things. Creating your own asynchronous libraries is hard. And that's exactly where you will spend most of your time!

These easy things are easy because they were the proof of concept, built to make asynchronous PHP competitive with NodeJS. You can see how similar their early interfaces were:

var http = require("http");
var server = http.createServer();

server.on("request", function(request, response) {
    response.writeHead(200, {
        "Content-Type": "text/plain"
    });

    response.end("Hello World");
});

server.listen(3000, "127.0.0.1");

This code was tested using Node 7.3.0

require "vendor/autoload.php";

$loop = React\EventLoop\Factory::create();
$socket = new React\Socket\Server($loop);
$server = new React\Http\Server($socket);

$server->on("request", function($request, $response) {
    $response->writeHead(200, [
        "Content-Type" => "text/plain"
    ]);

    $response->end("Hello world");
});

$socket->listen(3000, "127.0.0.1");
$loop->run();

This code was tested using PHP 7.1 and react/http:0.4.2

Today, we will look at a few ways to make your application code work well in an asynchronous architecture. Don't worry: your code can still work in a synchronous architecture, so you don't have to give anything up to learn this new skill. Other than a little time...

You can find the code for this tutorial on GitHub. I've tested it with PHP 7.1 and the latest versions of ReactPHP and Amp.

Promising Theory

Asynchronous code has some common abstractions. We've already seen one of them: callbacks. Callbacks, by their very name, describe how they deal with slow or blocking operations. Synchronous code is full of waiting: ask for something, then wait for it to happen.

So, instead, asynchronous frameworks and libraries use callbacks. Ask for something, and when it happens, the framework or library calls your code back.

In the case of the HTTP server, we don't preemptively handle all requests. We don't wait around for requests to happen, either. We just describe the code that should be called if a request occurs. The event loop takes care of the rest.
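
To make "describe the code that should be called" concrete, here is a minimal sketch of the callback pattern in plain PHP. The Emitter class and its on and emit methods are hypothetical names chosen for illustration; they imitate the shape of the server examples above but are not part of ReactPHP or Amp.

```php
<?php
// Hypothetical emitter: stores callbacks by event name and calls them later.
final class Emitter
{
    private $listeners = [];

    // Describe the code that should run if the event ever occurs.
    public function on(string $event, callable $listener): void
    {
        $this->listeners[$event][] = $listener;
    }

    // Call back every listener registered for this event.
    public function emit(string $event, ...$arguments): void
    {
        foreach ($this->listeners[$event] ?? [] as $listener) {
            $listener(...$arguments);
        }
    }
}

$log = [];
$emitter = new Emitter();

$emitter->on("request", function (string $path) use (&$log) {
    $log[] = "handled " . $path;
});

// Nothing has run yet; the callback only fires when the event is emitted.
$emitter->emit("request", "/receipt.pdf");
```

In a real event loop, the emit call would be triggered by the loop itself when a socket becomes readable, rather than by our own code.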

The second common abstraction is the promise. Where callbacks are hooks waiting for future events, promises are references to future values. They look like this:

readFile()
    ->then(function(string $content) {
        print "content: " . $content;
    })
    ->catch(function(Exception $e) {
        print "error: " . $e->getMessage();
    });

This is a little more code than callbacks alone, but it's an interesting approach. We wait for something to happen, then do something else. If something goes wrong, we catch the error and respond sensibly. This may seem simple, but it isn't talked about nearly enough.

We are still using callbacks, but we have wrapped them in an abstraction that helps us in other ways. One benefit is that promises allow multiple resolution callbacks...

$promise = readFile();
$promise->then(...)->catch(...);

// ...let's add logging to the existing code

$promise->then(function(string $content) use ($logger) {
    $logger->info("file was read");
});

I want us to focus on something else, though. Promises provide a common language, a common abstraction, for thinking about how synchronous code can become asynchronous code.
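
As a rough illustration of a "reference to a future value", here is a deliberately tiny promise in plain PHP. TinyPromise is a hypothetical class written for this sketch: it has no chaining and no error state, and it is not how ReactPHP or Amp implement promises. It only shows how several callbacks can wait on one value.

```php
<?php
// Hypothetical, deliberately tiny promise: no chaining, no error state.
final class TinyPromise
{
    private $resolved = false;
    private $value;
    private $callbacks = [];

    // Register a callback; run it at once if the value already exists.
    public function then(callable $callback): self
    {
        if ($this->resolved) {
            $callback($this->value);
        } else {
            $this->callbacks[] = $callback;
        }

        return $this;
    }

    // Supply the value and notify every callback registered so far.
    public function resolve($value): void
    {
        if ($this->resolved) {
            return;
        }

        $this->resolved = true;
        $this->value = $value;

        foreach ($this->callbacks as $callback) {
            $callback($value);
        }

        $this->callbacks = [];
    }
}

$seen = [];
$promise = new TinyPromise();

// Two independent pieces of code attach to the same future value.
$promise->then(function (string $content) use (&$seen) {
    $seen[] = "print: " . $content;
});
$promise->then(function (string $content) use (&$seen) {
    $seen[] = "log: file was read";
});

// Later, whoever read the file resolves the promise.
$promise->resolve("hello");
```

The code that produces the value and the code that consumes it never need to know about each other; they only share the promise.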

Let's take some application code and make it asynchronous, using promises...

Create PDF files

It is common for applications to generate some sort of summary document, whether it is an invoice or a stock list. Suppose you have an e-commerce application that processes payments via Stripe. When a customer buys something, you want them to be able to download a PDF receipt for the transaction.

You can do this in a number of ways, but a very simple one is to build the document with HTML and CSS, then convert it to a PDF document that customers can download.

I needed to do something similar recently. I found that there are not many good libraries for this. I couldn't find a single abstraction that would let me switch between different HTML → PDF engines. So I started building one myself.

I started thinking about what my abstraction needed to do. I settled on an interface like this:

interface Driver
{
    public function html($html = null);
    public function size($size = null);
    public function orientation($orientation = null);
    public function dpi($dpi = null);
    public function render();
}

For simplicity, I wanted every method except render to act as both getter and setter. Given this set of expected methods, the next step was to create an implementation using one possible engine. I added domPDF to my project and started using it:

use Dompdf\Dompdf;
use Dompdf\Options;

class DomDriver extends BaseDriver implements Driver
{
    private $options;

    public function __construct(array $options = [])
    {
        $this->options = $options;
    }

    public function render()
    {
        // Copy everything the closure needs, so it does not depend on $this
        $data = $this->data();
        $custom = $this->options;

        return $this->parallel(
            function() use ($data, $custom) {
                $options = new Options();

                $options->set(
                    "isJavascriptEnabled", true
                );

                $options->set(
                    "isHtml5ParserEnabled", true
                );

                $options->set("dpi", $data["dpi"]);

                // Custom options override the defaults set above
                foreach ($custom as $key => $value) {
                    $options->set($key, $value);
                }

                $engine = new Dompdf($options);

                $engine->setPaper(
                    $data["size"], $data["orientation"]
                );

                $engine->loadHtml($data["html"]);
                $engine->render();

                return $engine->output();
            }
        );
    }
}

I won't go into detail on how to use domPDF. Its documentation is good enough that I can focus on the async part of this implementation.

We will look at the data and parallel methods shortly. The important thing about this Driver implementation is that it gathers the document data (the values that were set, or the defaults otherwise) together with the custom options, and passes them to the callback we want to run asynchronously.

domPDF is not an asynchronous library, and converting HTML to PDF is a very slow process. So how do we make it asynchronous? Well, we could write a completely asynchronous converter, or we could use an existing synchronous converter but run it in a parallel thread or process.

That is what the parallel method is for:

abstract class BaseDriver implements Driver
{
    protected $html = "";
    protected $size = "A4";
    protected $orientation = "portrait";
    protected $dpi = 300;

    public function html($html = null)
    {
        return $this->access("html", $html);
    }

    // Getter when called without a value, fluent setter otherwise
    private function access($key, $value = null)
    {
        if (is_null($value)) {
            return $this->$key;
        }

        $this->$key = $value;
        return $this;
    }

    public function size($size = null)
    {
        return $this->access("size", $size);
    }

    public function orientation($orientation = null)
    {
        return $this->access("orientation", $orientation);
    }

    public function dpi($dpi = null)
    {
        return $this->access("dpi", $dpi);
    }

    protected function data()
    {
        return [
            "html" => $this->html,
            "size" => $this->size,
            "orientation" => $this->orientation,
            "dpi" => $this->dpi,
        ];
    }

    protected function parallel(Closure $deferred)
    {
        // TODO: run $deferred in a parallel thread or process
    }
}

Here I implemented the getter-setter methods, figuring I could reuse them for the next implementation. The data method acts as a shortcut to collect the various document properties into an array, making them easier to pass to anonymous functions.

The parallel method is where things get interesting, and it is where Amp comes in.

I really like the Amp project. It is a collection of libraries that support asynchronous architectures, and its maintainers are key proponents of the async-interop project.

One of their libraries is called amphp/parallel, which supports multi-threaded and multi-process code (via the Pthreads and Process Control extensions). Its spawn methods return Amp's Promise implementation. This means that the render method can be used like any other method that returns a Promise.

Consuming that promise takes a little more machinery. Amp also provides an event loop implementation and all the supporting code needed to turn ordinary PHP generators into coroutines and promises. I have written another post about how this is even possible and how it relates to PHP's generators.

The returned Promise is also being standardized. Amp returns an implementation of the Promise specification. It differs slightly from the code I showed above, but it serves the same purpose.

Generators work like coroutines do in languages that have them. Coroutines are functions that can be interrupted, which means they can perform a short operation and then pause while waiting for something. While they are paused, other functions can use the system's resources.

In practice, a coroutine yields the promise returned by some function (call it funcReturnsPromise), pauses, and resumes once that promise has a value.

This seems much more complicated than just writing synchronous code in the first place. But what it allows is for other things to happen while we wait for funcReturnsPromise to complete.

Yielding promises is exactly the kind of abstraction I mean. It gives us a framework for creating functions that return promises, and code can interact with those promises in a predictable and understandable way.

Rendering a PDF document with our driver follows the same pattern: set the HTML, call render, and yield the promise it returns from inside a coroutine.

That becomes far more useful when the PDF is generated inside an asynchronous HTTP server. There is an Amp library called Aerys which makes these kinds of servers easier to create, and a request handler can call the driver in just the same way.

Again, I will not go into Aerys in detail here. It is an impressive piece of software that deserves an article of its own. You don't need to understand how Aerys works to see how naturally our converter code sits next to it.

My boss said "Don't use asynchronous!"

If you don't know when you will get to build an asynchronous application, why go to all this effort? Writing this code gives us insight into a new programming paradigm. And just because we write this code asynchronously doesn't mean it can't work in a synchronous environment.

To use this code in a synchronous application, we just need to move some of the asynchronous code inside a decorator that waits for the result.
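
The decorator idea can be sketched without any framework. The following is an illustration only: AsyncRenderer, FakeAsyncRenderer and SyncDecorator are hypothetical names, and a generator stands in for a real promise. It is not the library's actual decorator, which would wait on the framework's own promise type instead.

```php
<?php
// Hypothetical async contract: render() returns a Generator that yields
// while work is pending and finally returns the PDF data.
interface AsyncRenderer
{
    public function render(): Generator;
}

// Stand-in for a slow, asynchronous conversion.
final class FakeAsyncRenderer implements AsyncRenderer
{
    public function render(): Generator
    {
        for ($tick = 0; $tick < 3; $tick++) {
            yield; // pretend we are still waiting
        }

        return "%PDF-fake";
    }
}

// Decorator: hides the waiting and exposes a plain synchronous method.
final class SyncDecorator
{
    private $decorated;

    public function __construct(AsyncRenderer $decorated)
    {
        $this->decorated = $decorated;
    }

    public function render(): string
    {
        $pending = $this->decorated->render();

        // Keep stepping the pending work until it has finished.
        while ($pending->valid()) {
            $pending->next();
        }

        return $pending->getReturn();
    }
}

$pdf = (new SyncDecorator(new FakeAsyncRenderer()))->render();
```

The consumer only ever sees a method that returns a string, which is the point of wrapping the asynchronous part.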

With this decorator, we can write code that looks synchronous.

It still runs the code asynchronously (in the background, at least), but none of that is exposed to the consumer. You can use it in a synchronous application and never know what's going on under the hood.

Support other frameworks

Amp has some specific requirements that make it unsuitable for some environments. For example, the base Amp (event loop) library requires PHP 7.0. The parallel library requires the Pthreads extension or the Process Control extension.

I didn't want to impose these restrictions on everyone, and wondered how I could support a wider range of systems. The answer was to abstract the parallel execution code into another driver system.
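
One way to picture that second driver system is as an interface that only knows how to run a closure. The sketch below is an assumption about the general shape, not the library's real code: Runner, InlineRunner and UppercaseDriver are hypothetical names, and strtoupper stands in for the slow conversion.

```php
<?php
// Hypothetical names throughout; the real library's interface may differ.
interface Runner
{
    // Run the closure "somewhere" and hand back whatever represents its result.
    public function run(Closure $deferred);
}

// Simplest possible runner: executes the closure immediately, in-process.
final class InlineRunner implements Runner
{
    public function run(Closure $deferred)
    {
        return $deferred();
    }
}

// A driver that no longer knows how the work is executed.
final class UppercaseDriver
{
    private $html = "";

    public function html(string $html): self
    {
        $this->html = $html;
        return $this;
    }

    public function render(Runner $runner)
    {
        $html = $this->html;

        // Stand-in for the slow HTML to PDF conversion.
        return $runner->run(function () use ($html) {
            return strtoupper($html);
        });
    }
}

$result = (new UppercaseDriver())
    ->html("<h1>hello</h1>")
    ->render(new InlineRunner());
```

Swapping InlineRunner for a runner built on a particular framework would change what render returns (a promise, for instance) without touching the driver.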

I can implement this for Amp as well as for the less restrictive (but older) ReactPHP.

I am used to passing closures to multi-threaded and multi-process workers, because that's how Pthreads and Process Control work. Using ReactPHP's Process objects is completely different, because they rely on exec for multi-process execution. I decided to implement the same closure functionality I am used to. This is not necessary for asynchronous code; it's purely a matter of taste.

The SuperClosure library serializes closures and their bound variables. Most of the runner's code is what you would expect to find in a worker script. In fact, the only way to use ReactPHP's child process library (other than serializing closures) is to send tasks to a worker script.
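
The worker-script approach can be sketched with nothing but PHP's own proc_open. This is a simplified illustration under stated assumptions: runInChildProcess is a hypothetical helper, it sends a function name and arguments rather than a serialized closure (so no SuperClosure is needed), and it waits for the child instead of watching the pipe from an event loop.

```php
<?php
// Hypothetical helper: run a named function in a child PHP process and
// return its result. Assumes the PHP CLI binary is available.
function runInChildProcess(string $function, array $arguments)
{
    // Inline "worker script": read a task from stdin, write the result to stdout.
    $worker = '$task = unserialize(stream_get_contents(STDIN));'
        . 'echo serialize(call_user_func_array($task[0], $task[1]));';

    $command = escapeshellarg(PHP_BINARY) . " -r " . escapeshellarg($worker);

    $process = proc_open(
        $command,
        [0 => ["pipe", "r"], 1 => ["pipe", "w"]],
        $pipes
    );

    if (!is_resource($process)) {
        throw new RuntimeException("could not start the worker process");
    }

    // Send the task to the worker, then signal that there is no more input.
    fwrite($pipes[0], serialize([$function, $arguments]));
    fclose($pipes[0]);

    // Blocks until the worker has finished writing its result.
    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    proc_close($process);

    return unserialize($output);
}

$result = runInChildProcess("strrev", ["fdp ot lmth"]);
```

An asynchronous runner would do the same serialization, but register the output pipe with the event loop and resolve a promise when the data arrives.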

Now, instead of loading our drivers up with $this->parallel and Amp-specific code, we can pass in a runner implementation.

Don't be alarmed by the difference between ReactPHP code and Amp code. ReactPHP does not have the same coroutine foundation as Amp. Instead, it prefers callbacks for most things. Either way, the code still just runs the PDF conversion in parallel and returns the resulting PDF data.
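
The difference between the two styles can be shown with plain PHP, without either framework. In this sketch, convert and drive are hypothetical functions: convert imitates slow work that reports back through a callback, and drive is a tiny stand-in for what a coroutine runner does with a generator.

```php
<?php
// Hypothetical stand-in for slow work: calls $done with the finished "PDF".
function convert(string $html, callable $done): void
{
    $done("PDF(" . $html . ")");
}

// Callback style: the continuation is passed in as a function.
$viaCallback = null;
convert("<h1>hi</h1>", function (string $pdf) use (&$viaCallback) {
    $viaCallback = $pdf;
});

// Coroutine style: the generator yields a description of the work and is
// resumed with the result, so the code reads top to bottom.
function drive(Generator $coroutine)
{
    while ($coroutine->valid()) {
        $html = $coroutine->current();

        convert($html, function (string $pdf) use ($coroutine) {
            // Resume the paused generator with the finished value.
            $coroutine->send($pdf);
        });
    }

    return $coroutine->getReturn();
}

$viaCoroutine = drive((function () {
    $pdf = yield "<h1>hi</h1>";

    return $pdf;
})());
```

Both versions do the same work; the coroutine version simply hides the callback inside the runner.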

By abstracting the runner, we can use any asynchronous framework we like, and we can expect the driver to return that framework's own abstractions.

Can I use this?

What started as an experiment has become an HTML → PDF library with multiple drivers and multiple runners, called Paper. It's like the Flysystem of HTML → PDF conversion, but it's also a good example of how to write an asynchronous library.

When you try to make an asynchronous PHP application, you will find gaps in the library ecosystem. Don't be intimidated by these! Instead, take the opportunity to think about how you will use the abstractions provided by ReactPHP and Amp to make your own asynchronous libraries.

Have you built an interesting asynchronous PHP application or library recently? Please let us know in the comments.

FAQ on Asynchronously Converting HTML to PDF

What is the significance of converting HTML to PDF asynchronously?

Asynchronous programming plays a crucial role in converting HTML to PDF. It allows non-blocking operations: the engine runs in the background, and the rest of your code continues executing while the asynchronous operation completes. This leads to more efficient use of resources and better performance, especially in applications that perform many slow operations, such as converting HTML to PDF.

How does ReactPHP help in creating asynchronous libraries?

ReactPHP is a low-level library for event-driven programming in PHP. It provides the core infrastructure for creating asynchronous libraries in PHP. With ReactPHP, you can write non-blocking code using PHP's familiar syntax, making it easier to create high-performance applications.

What are the steps involved in asynchronous conversion of HTML to PDF?

The process of asynchronous conversion of HTML to PDF involves several steps. First, you need to set up an HTML template that defines the structure and content of the PDF. Next, you use asynchronous libraries like ReactPHP to handle the conversion process. This includes reading the HTML file, converting it to a PDF, and then saving the generated PDF file. The asynchronous nature of this process means that your application can continue to perform other tasks while the transformation is in progress.

Can I program asynchronously using a language other than PHP?

Yes, you can program asynchronously in other languages. For example, Node.js is a popular choice for building asynchronous applications due to its event-driven architecture. However, if you are already familiar with PHP, libraries like ReactPHP allow you to easily take advantage of asynchronous programming without having to learn new languages.

How to handle errors during asynchronous conversion of HTML to PDF?

Error handling is an important aspect of asynchronous programming. In ReactPHP, you can handle errors by attaching a rejection handler to a promise, for example as the second argument to then. If an error occurs during the conversion, this handler is called, allowing you to log the error or take other appropriate action.

What are the benefits of converting HTML to PDF?

There are many benefits to converting HTML to PDF. It allows you to create a static, portable version of a web page that can be viewed offline, printed, or shared easily. The PDF also retains the format and layout of the original HTML, ensuring that the content looks the same regardless of the device or platform viewed on.

How to optimize the performance of my asynchronous PHP application?

There are several ways to optimize the performance of an asynchronous PHP application. One approach is to use libraries like ReactPHP, which provides a low-level interface for event-driven programming. This allows you to write non-blocking code, which can significantly improve the performance of I/O-intensive operations such as converting HTML to PDF.

Can I convert HTML to PDF synchronously?

Yes, HTML can be converted to PDF synchronously. However, this approach may block your application's execution until the conversion process is complete, which can cause performance issues for I/O-intensive applications. On the other hand, asynchronous conversion allows your application to continue performing other tasks while the conversion is in progress, resulting in better performance and resource utilization.

What are the challenges of asynchronous programming in PHP?

Asynchronous programming in PHP can be challenging because PHP is synchronous by nature. However, libraries like ReactPHP provide the infrastructure required to write non-blocking code in PHP. Understanding the event-driven programming model and mastering promises can also be challenging, but both are key to getting the benefits of asynchronous programming.

How to test the performance of an asynchronous PHP application?

Testing the performance of an asynchronous PHP application means measuring key metrics, such as response time, memory usage, and CPU utilization, under different load conditions. Tools like Apache JMeter or Siege can simulate load on an application and collect performance data. In addition, profiling tools like Xdebug can help you identify bottlenecks in your code and optimize them.
