Using coroutines in PHP to achieve cooperative multitasking Page 1/2

One of the better new features of PHP5.5 is the support for generators and coroutines. Generators are already covered in great detail in PHP's documentation and various other blog posts (like this one or this one). Coroutines have received relatively little attention, so although coroutines have very powerful functions, they are difficult to know and difficult to explain.

This article guides you through the use of coroutines to implement task scheduling, and achieves an understanding of the technology through examples. I will give a brief background introduction in the first three sections. If you already have a good foundation, you can jump directly to the "Collaborative Multitasking" section.

Generator

The most basic idea of a generator is also a function. The return value of this function is output in sequence, rather than just returning a single value. Or, in other words, generators make it easier for you to implement the iterator interface. The following is a simple explanation by implementing an xrange function:

Copy code The code is as follows:

function xrange($start, $end, $step = 1) {
for ($i = $start; $i <= $end; $i += $step) {
yield $i;
}
}

foreach (xrange(1, 1000000) as $num) {
echo $num, "n";
}

The xrange() function above provides the same functionality as PHP’s built-in function range(). But the difference is that the range() function returns an array containing group values from 1 to 1 million (note: please check the manual). The xrange() function returns an iterator that outputs these values in sequence, and it is not actually calculated in the form of an array.

The advantages of this approach are obvious. It allows you to process large data collections without loading them into memory all at once. You can even handle infinitely large data streams.

Of course, this function can also be implemented not through a generator, but by inheriting the Iterator interface. It will be more convenient to implement it by using a generator, instead of having to implement the five methods in the iterator interface.

Generators are interruptible functions
To understand coroutines from generators, it is very important to understand how they work internally: generators are interruptible functions, and in them, yield constitutes an interrupt point.

Following the above example, if you call xrange(1,1000000), the code in the xrange() function does not actually run. Instead, PHP just returns an instance of the generator class that implements the iterator interface:

Copy code The code is as follows:

 
$range = xrange(1, 1000000);
var_dump($range); // object(Generator)#1
var_dump($range instanceof Iterator); // bool(true)

You call the iterator method once on an object, and the code in it runs once. For example, if you call $range->rewind(), then the code in xrange() runs to the control flow where yield occurs for the first time. In this case, this means that yield $i only runs when $i=$start. The value passed to the yield statement is obtained using $range->current().

In order to continue executing the code in the generator, you must call the $range->next() method. This will start the generator again until the yield statement appears. Therefore, by calling the next() and current() methods successively, you will get all the values from the generator, until at some point there are no more yield statements. For xrange(), this situation occurs when $i exceeds $end. In this case, control flow will reach the end of the function, so no code will be executed. Once this happens, the void() method will return false and the iteration ends.

Coroutine

The main thing that coroutines add to the above functionality is the ability to send data back to the generator. This will turn the one-way communication from the generator to the caller into a two-way communication between the two.
Pass data to the coroutine by calling the generator's send() method instead of its next() method. The following logger() coroutine is an example of how this communication works:

Copy code The code is as follows:

function logger($fileName) {
$fileHandle = fopen($fileName, 'a');
while (true) {
fwrite($fileHandle, yield . "n");
}
}

$logger = logger(__DIR__ . '/log');
$logger->send('Foo');
$logger->send('Bar')

As you can see, yield is not used as a statement here, but as an expression. That is, it has a return value. The return value of yield is the value passed to the send() method. In this example, yield will first return "Foo", then "Bar".

In the above example, yield only serves as the receiver. It is possible to mix the two usages, i.e. both receive and send. An example of how receiving and sending communication works is as follows:

Copy code The code is as follows:

function gen() {
$ret = (yield 'yield1');
var_dump($ret);
$ret = (yield 'yield2');
var_dump($ret );
}

$gen = gen();
var_dump($gen->current()); // string(6) "yield1"
var_dump($gen->send('ret1') ); // string(4) "ret1" (the first var_dump in gen)
gen->send('ret2')); // string(4) "ret2" (again from within gen)
>

It's a little hard to understand the exact order of output right away, so make sure you know why it's output the way it is. There are two points I would like to point out in particular: First, the use of parentheses around the yield expression is no accident. For technical reasons (although I've considered adding an exception for assignment, like Python does), the parentheses are necessary. Second, you may have noticed that rewind() is not called before current() is called. If this is done, then the rewind operation has been implicitly performed.

Multi-tasking collaboration

If you read the above logger() example, then you think "Why should I use coroutines for two-way communication? Why can't I just use common classes?", you are absolutely right to ask. The above example demonstrates basic usage, but the context does not really show the advantages of using coroutines. This is the reason why there are many examples of coroutines. As mentioned in the introduction above, coroutines are a very powerful concept, but such applications are rare and often very complex. It is difficult to give some simple and real examples.

In this article, what I decided to do is to use coroutines to achieve multi-task collaboration. The problem we're trying to solve is that you want to run multiple tasks (or "programs") concurrently. However, the processor can only run one task at a time (the goal of this article is not to consider multi-core). So the processor needs to switch between different tasks, always letting each task run for a "little while."

The "collaboration" in the term multitasking collaboration explains how this switch is performed: it requires the currently running task to automatically pass control back to the scheduler so that it can run other tasks. This is in contrast to "preemptive" multitasking, where the scheduler can interrupt a task that has been running for a while, whether it likes it or not. Cooperative multitasking was used in early versions of Windows (windows95) and Mac OS, but they later switched to using preemptive multitasking. The reason is pretty clear: if you rely on programs to automatically transfer control back, it's easy for badly behaved software to take up the entire CPU for itself and not share it with other tasks.

At this time you should understand the connection between coroutines and task scheduling: the yield instruction provides a way for the task to interrupt itself, and then transfer control to the scheduler. So coroutines can run multiple other tasks. Furthermore, yield can be used to communicate between tasks and the scheduler.

Our goal is to use a more lightweight wrapper coroutine function for "task":

Copy code

Code As follows:

class Task {
    protected $taskId;
    protected $coroutine;
    protected $sendValue = null;
    protected $beforeFirstYield = true;

    public function __construct($taskId, Generator $coroutine) {
        $this->taskId = $taskId;
        $this->coroutine = $coroutine;
    }

    public function getTaskId() {
        return $this->taskId;
    }

    public function setSendValue($sendValue) {
        $this->sendValue = $sendValue;
    }

    public function run() {
        if ($this->beforeFirstYield) {
            $this->beforeFirstYield = false;
            return $this->coroutine->current();
        } else {
            $retval = $this->coroutine->send($this->sendValue);
            $this->sendValue = null;
            return $retval;
        }
    }

    public function isFinished() {
        return !$this->coroutine->valid();
    }
}

一个任务是用任务ID标记一个协程。使用setSendValue()方法，你可以指定哪些值将被发送到下次的恢复（在之后你会了解到我们需要这个）。 run()函数确实没有做什么，除了调用send()方法的协同程序。要理解为什么添加beforeFirstYieldflag，需要考虑下面的代码片段：

复制代码代码如下:

function gen() {
yield 'foo';
yield 'bar';
}

$gen = gen();
var_dump($gen->send('something'));

// As the send() happens before the first yield there is an implicit rewind() call,
// so what really happens is this:
$gen->rewind();
var_dump($gen->send('something'));

// The rewind() will advance to the first yield (and ignore its value), the send() will
// advance to the second yield (and dump its value). Thus we loose the first yielded value!

通过添加 beforeFirstYieldcondition 我们可以确定 first yield 的值被返回。

调度器现在不得不比多任务循环要做稍微多点了，然后才运行多任务：

复制代码代码如下:

class Scheduler {
    protected $maxTaskId = 0;
    protected $taskMap = []; // taskId => task
    protected $taskQueue;

    public function __construct() {
        $this->taskQueue = new SplQueue();
    }

    public function newTask(Generator $coroutine) {
        $tid = ++$this->maxTaskId;
        $task = new Task($tid, $coroutine);
        $this->taskMap[$tid] = $task;
        $this->schedule($task);
        return $tid;
    }

    public function schedule(Task $task) {
        $this->taskQueue->enqueue($task);
    }

    public function run() {
        while (!$this->taskQueue->isEmpty()) {
            $task = $this->taskQueue->dequeue();
            $task->run();

            if ($task->isFinished()) {
                unset($this->taskMap[$task->getTaskId()]);
            } else {
                $this->schedule($task);
            }
        }
    }
}

newTask()方法（使用下一个空闲的任务id）创建一个新任务，然后把这个任务放入任务映射数组里。接着它通过把任务放入任务队列里来实现对任务的调度。接着run()方法扫描任务队列，运行任务。如果一个任务结束了，那么它将从队列里删除，否则它将在队列的末尾再次被调度。
让我们看看下面具有两个简单（并且没有什么意义）任务的调度器：

复制代码代码如下:

function task1() {
    for ($i = 1; $i <= 10; ++$i) {
        echo "This is task 1 iteration $i.n";
        yield;
    }
}

function task2() {
    for ($i = 1; $i <= 5; ++$i) {
        echo "This is task 2 iteration $i.n";
        yield;
    }
}

$scheduler = new Scheduler;

$scheduler->newTask(task1());
$scheduler->newTask(task2());

$scheduler->run();

两个任务都仅仅回显一条信息，然后使用yield把控制回传给调度器。输出结果如下：

复制代码代码如下:

 This is task 1 iteration 1.
This is task 2 iteration 1.
This is task 1 iteration 2.
This is task 2 iteration 2.
This is task 1 iteration 3.
This is task 2 iteration 3.
This is task 1 iteration 4.
This is task 2 iteration 4.
This is task 1 iteration 5.
This is task 2 iteration 5.
This is task 1 iteration 6.
This is task 1 iteration 7.
This is task 1 iteration 8.
This is task 1 iteration 9.
This is task 1 iteration 10.
 

The output is exactly what we expected: for the first five iterations, the two tasks are run alternately, and then after the second task ends, only the first task continues to run.

Communication with the scheduler

Now that the scheduler is running, let’s move on to the next item on the schedule: communication between tasks and the scheduler. We'll communicate using the same method that processes use to talk to the operating system: system calls. The reason we need system calls is that the operating system is on a different permission level than the process. So in order to perform a privileged level operation (such as killing another process), you have to somehow pass control back to the kernel so that the kernel can perform said operation. Again, this behavior is achieved internally through the use of interrupt instructions. In the past, the general int instruction was used, but now the more specific and faster syscall/sysenter instructions are used.

Our task scheduling system will reflect this design: instead of simply passing the scheduler to the task (thus allowing it to do whatever it wants), we will interact with the system call by passing the information to the yield expression communication. Yield here is an interrupt and a method of transmitting information to the scheduler (and passing information from the scheduler).

To illustrate the system call, I will make a small encapsulation of the callable system call:

Copy the code The code is as follows:

class SystemCall {
protected $callback;

public function __construct(callable $callback) {
$this->callback = $callback;
}

public function __invoke(Task $task, Scheduler $scheduler) {
$callback = $this->callback; // Can't call it directly in PHP :/
return $callback($task , $scheduler);
}
}

It will run like any other callable (using _invoke), but it requires the scheduler to pass the calling task and itself to this function. In order to solve this problem, we have to slightly modify the run method of the scheduler:

Copy code The code is as follows:

public function run() {
while ( !$this->taskQueue->isEmpty()) {
$task = $this->taskQueue->dequeue();
$retval = $task->run();

if ($retval instanceof SystemCall) {
$retval($task, $this);
continue;
}

if ($task->isFinished()) {
unset($this->taskMap[$task->getTaskId()]);
$this- >schedule($task);
}
}
}

The first system call does nothing but return the task ID:

Copy code The code is as follows:

function getTaskId() {
 return new SystemCall (function(Task $task, Scheduler $scheduler) {
 $task->setSendValue($task->getTaskId());
 $scheduler->schedule($task);
 } );
}

This function does set the task id to the value sent next time and schedules the task again. Due to the use of system calls, the scheduler cannot automatically call the task, we need to manually schedule the task (you will understand why later). To use this new system call, we need to rewrite the previous example:

Copy code The code is as follows:

function task($max) {
$tid = (yield getTaskId()); // <-- here's the syscall!
for ($i = 1; $i <= $max ; ++$i) {
echo "This is task $tid iteration $i.n";
$scheduler = new Scheduler;

$scheduler->newTask(task(10));$scheduler->newTask(task(5));

$scheduler->run();

This code will give the same output as the previous example. Note that the system call runs normally like any other call, but with the yield added beforehand. To create new tasks and then kill them, more than two system calls are required:

Copy code

The code is as follows:

function newTask(Generator $coroutine) {

return new SystemCall(
function(Task $task, Scheduler $scheduler) use ($coroutine) {

$task->setSendValue($scheduler- >newTask($coroutine));

function killTask($tid) {
return new SystemCall(
function(Task $task, Scheduler $scheduler) use ($tid) {
$task->setSendValue($scheduler-> ;killTask($tid));
$scheduler->schedule($task);

The killTask function needs to add a method in the scheduler:

Copy code

The code is as follows:

public function killTask($tid) {

if (!isset($this->taskMap[$tid])) {

return false;

}

unset($this->taskMap[$tid]); // This is a bit ugly and could be optimized so it does not have to walk the queue, // but assuming that killing tasks is rather rare I won't bother with it now

foreach ( $this->taskQueue as $i => $task) {

use using using if ($task->getTaskId() === $tid) ]);
break; }

}

return true;
}

http://www.bkjia.com/PHPjc/328024.html

www.bkjia.com

true

http: //www.bkjia.com/PHPjc/328024.html

TechArticle

A better new feature of PHP5.5 is the support for generators and coroutines. For generators, PHP's documentation and various other blog posts (like this one or this one) already have...

Using coroutines in PHP to achieve cooperative multitasking Page 1/2_PHP Tutorial