What should you do when a feature in your web application takes more than a second or two to complete? You need some kind of offline processing. Learn several ways to handle long-running jobs offline in PHP applications.
Large chain stores have a big problem. Every day, thousands of transactions occur in every store. Company executives want to mine this data. Which products sell well? What's bad? Where do organic products sell well? How are ice cream sales going?
In order to capture this data, organizations must load all transactional data into a data model that is more suitable for generating the types of reports the company requires. However, this takes time, and as the chain grows, it can take more than a day to process a day's worth of data. So, this is a big problem.
Your web application may not need to process this much data, but any site may have operations that take longer than your customers are willing to wait. Generally speaking, customers are willing to wait about 200 milliseconds; beyond that, they feel the process is "slow." This number is based on desktop applications, and the Web has made us more patient. Either way, you shouldn't make your customers wait longer than a few seconds. Therefore, you need some strategies for handling batch jobs in PHP.
Decentralized approach with cron
On UNIX® machines, the core program that performs batch processing is the cron daemon. The daemon reads a configuration file that tells it which command lines to run and how often. The daemon then executes them as configured. When an error is encountered, it can even send error output to a specified email address to help debug the problem.
I know some engineers who strongly advocate the use of threading technology. "Threads! Threads are the real way to do background processing. The cron daemon is so outdated."
I don't think so.
I have used both methods, and I think cron has the advantage of the Keep It Simple, Stupid (KISS) principle. It keeps background processing simple. Instead of a multi-threaded job-processing application that runs all the time (and therefore must never leak memory), cron starts a simple batch script. This script determines whether there is a job to process, executes the job, and then exits. There is no need to worry about memory leaks, and no need to worry about threads stalling or getting stuck in infinite loops.
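The check, execute, and exit pattern can be sketched in a few lines. This is a minimal sketch, not the article's code: the helper name run_exclusively() and the lock-file path are hypothetical, and the lock itself is an addition that guards against a run that outlasts the cron interval and overlaps with the next one.

```php
<?php
// Hypothetical helper: runs $job only if no other run holds the lock file.
function run_exclusively(string $lockPath, callable $job): bool
{
    $handle = fopen($lockPath, 'c');      // create the lock file if it is missing
    if ($handle === false) {
        return false;
    }
    // LOCK_NB makes flock() return immediately instead of waiting.
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return false;                      // an earlier run is still busy
    }
    try {
        $job();                            // do the work, then fall through and exit
    } finally {
        flock($handle, LOCK_UN);
        fclose($handle);
    }
    return true;
}

$ran = run_exclusively(sys_get_temp_dir() . '/batch_example.lock', function () {
    echo "Processing pending jobs\n";
});
echo $ran ? "Run finished\n" : "Skipped: another run is active\n";
```

Because the script exits after each run, any memory it used is returned to the system, which is the simplicity argument made above.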
So, how does cron work? That depends on your system environment. I'll discuss only the plain old UNIX command-line version of cron; ask your system administrator how to set it up for your own web application.
Here is a simple cron configuration that runs a PHP script at 11pm every night:
0 23 * * * jack /usr/bin/php /users/home/jack/myscript.php
The first five fields define when the script should start. Next comes the user name under which the script should run. The rest of the line is the command to execute. The time fields are, in order, minute, hour, day of month, month, and day of week. Here are a few examples.
Command:
15 * * * * jack /usr/bin/php /users/home/jack/myscript.php
Run the script at the 15th minute of every hour.
Command:
15,45 * * * * jack /usr/bin/php /users/home/jack/myscript.php
Run the script at the 15th and 45th minute of every hour.
Command:
*/1 3-23 * * * jack /usr/bin/php /users/home/jack/myscript.php
Run the script every minute from 3:00 am through 11:59 pm (the hour range 3-23 includes the whole of hour 23).
Command:
30 23 * * 6 jack /usr/bin/php /users/home/jack/myscript.php
Run the script every Saturday at 11:30 pm (Saturday is designated by 6).
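To make the field order concrete, here is a small sketch that splits one line of this format into named fields. The function split_cron_line() is hypothetical and only illustrates the layout used in this article (five time fields, a user name, then the command); some cron setups, such as per-user crontabs, omit the user column.

```php
<?php
// Hypothetical helper: splits one crontab line of the article's format into named fields.
function split_cron_line(string $line): array
{
    // A limit of 7 keeps the command, which may contain spaces, in one piece.
    $parts = preg_split('/\s+/', trim($line), 7);
    if (count($parts) < 7) {
        throw new InvalidArgumentException('Expected 5 time fields, a user, and a command');
    }
    return array_combine(
        ['minute', 'hour', 'day_of_month', 'month', 'day_of_week', 'user', 'command'],
        $parts
    );
}

print_r(split_cron_line('30 23 * * 6 jack /usr/bin/php /users/home/jack/myscript.php'));
```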
As you can see, the number of combinations is unlimited. You can control when the script is run as needed. You can also specify multiple scripts to run, so that some scripts can be run every minute, while other scripts (such as backup scripts) can be run only once a day.
To specify which email address to send reported errors to, you can use the MAILTO directive as follows:
MAILTO=jherr@pobox.com
Note: For Microsoft® Windows® users, there is an equivalent Scheduled Tasks system that can be used to start command-line processes (such as PHP scripts) on a regular schedule.
Basics of Batch Processing Architecture
Batch processing is quite simple. In most cases, one of two workflows is used. The first workflow is for reporting: the script runs once a day, generates the report, and sends it to a group of users. The second workflow is a batch job created in response to some kind of request. For example, I log in to the web application and ask it to send a message to all users registered in the system, telling them about a new feature. This operation must be batched because there are 10,000 users in the system. PHP takes a while to complete such a task, so it must be performed by a job outside the browser.
In the second workflow, the web application simply puts the information somewhere the batch application can pick it up. This information specifies the nature of the job (for example, "Send this e-mail to all the people on the system"). The batch program runs the job and then deletes it. Alternatively, the handler marks the job as completed. Whichever method is used, the job must be recorded as completed so that it is not run again.
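The "mark as completed" alternative can be sketched with a plain array standing in for the shared store. Everything named here (the $jobs layout, the job types, and process_pending()) is hypothetical and chosen only for illustration; a real system would keep the jobs in a table or in files.

```php
<?php
// Stand-in for a shared job store; a real system would use a table or files.
$jobs = [
    ['id' => 1, 'type' => 'send_announcement', 'done' => false],
    ['id' => 2, 'type' => 'nightly_report',    'done' => true],
];

// Hypothetical handler: runs every job not yet marked done, then marks it.
function process_pending(array &$jobs, callable $run): int
{
    $processed = 0;
    foreach ($jobs as &$job) {
        if ($job['done']) {
            continue;              // already completed on an earlier run
        }
        $run($job);
        $job['done'] = true;       // recorded so the job is not run again
        $processed++;
    }
    unset($job);                   // break the reference left by foreach
    return $processed;
}

$log = [];
$record = function (array $job) use (&$log) { $log[] = $job['type']; };
$first  = process_pending($jobs, $record);
$second = process_pending($jobs, $record);
echo "first run: $first, second run: $second\n";   // prints "first run: 1, second run: 0"
```

The second run finds nothing to do, which is the property that matters: rerunning the handler does not repeat finished work.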
The remainder of this article demonstrates various methods of sharing data between a web application frontend and a batch backend.
Mail Queue
The first method is to use a dedicated mail queue system. In this model, a table in the database holds the email messages that should be sent to various users. The web interface uses the Mailouts class to add emails to the queue. The email handler uses the Mailouts class to retrieve unprocessed emails and then uses it again to remove each email from the queue once it has been processed.
This model first requires a MySQL schema.
Listing 1. mailout.sql
DROP TABLE IF EXISTS mailouts;
CREATE TABLE mailouts (
  id MEDIUMINT NOT NULL AUTO_INCREMENT,
  from_address TEXT NOT NULL,
  to_address TEXT NOT NULL,
  subject TEXT NOT NULL,
  content TEXT NOT NULL,
  PRIMARY KEY ( id )
);
This schema is very simple. Each row has a from and a to address, as well as the subject and content of the email.
The PHP Mailouts class handles the mailouts table in the database.
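Listing 2. mailouts.php
<?php
// The opening of this listing, up to the error check in get_db(), was cut off
// in the source. It is reconstructed here; the DSN is a placeholder.
require_once 'DB.php';

class Mailouts
{
  public static function get_db()
  {
    $dsn = 'mysql://user:password@localhost/mailout'; // placeholder DSN
    $db = DB::connect( $dsn );
    if ( PEAR::isError( $db ) ) { die( $db->getMessage() ); }
    return $db;
  }

  // Remove one message from the queue after it has been sent.
  public static function delete( $id )
  {
    $db = Mailouts::get_db();
    $sth = $db->prepare( 'DELETE FROM mailouts WHERE id=?' );
    $db->execute( $sth, $id );
    return true;
  }

  // Queue one message; used by the web front end.
  public static function add( $from, $to, $subject, $content )
  {
    $db = Mailouts::get_db();
    $sth = $db->prepare( 'INSERT INTO mailouts VALUES (null,?,?,?,?)' );
    $db->execute( $sth, array( $from, $to, $subject, $content ) );
    return true;
  }

  // Return every queued message as an array of rows.
  public static function get_all()
  {
    $db = Mailouts::get_db();
    $res = $db->query( "SELECT * FROM mailouts" );
    $rows = array();
    while( $res->fetchInto( $row ) ) { $rows []= $row; }
    return $rows;
  }
}
?>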
This script includes the Pear::DB database access class and then defines the Mailouts class, which has three main static functions: add, delete, and get_all. The add() method adds an email to the queue and is used by the front end. The get_all() method returns all data from the table. The delete() method deletes one email.
You may ask why I don't simply call a delete_all() method at the end of the script. There are two reasons. First, if each message is deleted right after it is sent, a message is unlikely to be sent twice even if the script is rerun after a problem. Second, new messages may be added between the start and the completion of the batch job, and a blanket delete would discard them unsent.
The next step is to write a simple test script that adds an entry to the queue.
Listing 3. mailout_test_add.php
In this example, I add a mailout: a message to be sent to Molly at a company, with the subject "Test Subject" and an email body. You can run this script on the command line: php mailout_test_add.php.
To send the emails, another script is required. This script acts as the job handler.
Listing 4. mailout_send.php
This script retrieves all email messages using the get_all() method and then sends the messages one by one using PHP's mail() function. After each email is sent successfully, the delete() method is called to delete the corresponding record from the queue.
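The add-then-send cycle described above can be sketched without a database. This is a minimal sketch under stated assumptions: MemoryMailouts is a hypothetical in-memory stand-in for the MySQL-backed class, send_queued() is a hypothetical handler, the addresses are placeholders, and a fake sender replaces mail() so the example runs anywhere.

```php
<?php
// In-memory stand-in for the queue; the article's version is backed by MySQL.
class MemoryMailouts
{
    private static $rows = [];
    private static $nextId = 1;

    public static function add($from, $to, $subject, $content)
    {
        // Same column order as the mailouts table: id, from, to, subject, content.
        self::$rows[self::$nextId] = [self::$nextId, $from, $to, $subject, $content];
        self::$nextId++;
        return true;
    }

    public static function get_all()
    {
        return array_values(self::$rows);
    }

    public static function delete($id)
    {
        unset(self::$rows[$id]);
        return true;
    }
}

// Hypothetical handler: $send takes the same leading parameters as mail().
function send_queued(callable $send)
{
    $sent = 0;
    foreach (MemoryMailouts::get_all() as $row) {
        list($id, $from, $to, $subject, $content) = $row;
        if ($send($to, $subject, $content, "From: $from")) {
            MemoryMailouts::delete($id);   // delete only after a successful send
            $sent++;
        }
    }
    return $sent;
}

MemoryMailouts::add('sender@example.com', 'molly@example.com', 'Test Subject', 'Test body');
MemoryMailouts::add('sender@example.com', '', 'No recipient', 'This one fails');

// Fake sender for the demonstration; pass 'mail' to use PHP's mail() instead.
$fakeSend = function ($to, $subject, $body, $headers) {
    return $to !== '';
};

echo send_queued($fakeSend), " sent, ", count(MemoryMailouts::get_all()), " left in queue\n";
// prints "1 sent, 1 left in queue"
```

The message that fails to send stays in the queue for the next run, which is why each record is deleted individually rather than all at once.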
Use the cron daemon to run this script periodically. How often you run this script depends on the needs of your application.
Note: The PHP Extension and Application Repository (PEAR) contains an excellent mail queuing implementation, Mail_Queue, that is free to download.
A more general approach
Specialized solutions for sending emails are great, but is there a more general approach? We need to be able to send emails, generate reports, or perform other time-consuming processing without having to wait in the browser for the processing to complete.
For this, you can take advantage of the fact that PHP is an interpreted language. PHP code can be stored in a queue in the database and executed later. This requires two tables, see Listing 5.
Listing 5. generic.sql
DROP TABLE IF EXISTS processing_items;
CREATE TABLE processing_items (
  id MEDIUMINT NOT NULL AUTO_INCREMENT,
  function TEXT NOT NULL,
  PRIMARY KEY ( id )
);
DROP TABLE IF EXISTS processing_args;
CREATE TABLE processing_args (
  id MEDIUMINT NOT NULL AUTO_INCREMENT,
  item_id MEDIUMINT NOT NULL,
  key_name TEXT NOT NULL,
  value TEXT NOT NULL,
  PRIMARY KEY ( id )
);
The first table, processing_items, contains the names of the functions that the job handler calls. The second table, processing_args, contains the arguments to pass to each function, stored as a hash table of key/value pairs.
Like the mailouts table, these two tables are also wrapped by a PHP class called ProcessingItems.
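Listing 6. generic.php
<?php
// The opening of this listing, through the start of delete(), was cut off in
// the source. It is reconstructed on the assumption that it mirrors
// Listing 2; the DSN is a placeholder.
require_once 'DB.php';

class ProcessingItems
{
  public static function get_db()
  {
    $dsn = 'mysql://user:password@localhost/generic'; // placeholder DSN
    $db = DB::connect( $dsn );
    if ( PEAR::isError( $db ) ) { die( $db->getMessage() ); }
    return $db;
  }

  // Remove a job and its arguments once it has been processed.
  public static function delete( $id )
  {
    $db = ProcessingItems::get_db();
    $sth = $db->prepare( 'DELETE FROM processing_args WHERE item_id=?' );
    $db->execute( $sth, $id );
    $sth = $db->prepare( 'DELETE FROM processing_items WHERE id=?' );
    $db->execute( $sth, $id );
    return true;
  }

  // Queue a function call together with its key/value arguments.
  public static function add( $function, $args )
  {
    $db = ProcessingItems::get_db();
    $sth = $db->prepare( 'INSERT INTO processing_items VALUES (null,?)' );
    $db->execute( $sth, array( $function ) );
    $res = $db->query( "SELECT last_insert_id()" );
    $id = null;
    while( $res->fetchInto( $row ) ) { $id = $row[0]; }
    foreach( $args as $key => $value )
    {
      $sth = $db->prepare( 'INSERT INTO processing_args VALUES (null,?,?,?)' );
      $db->execute( $sth, array( $id, $key, $value ) );
    }
    return true;
  }

  // Return every queued job with its arguments attached.
  public static function get_all()
  {
    $db = ProcessingItems::get_db();
    $res = $db->query( "SELECT * FROM processing_items" );
    $rows = array();
    while( $res->fetchInto( $row ) )
    {
      $item = array();
      $item['id'] = $row[0];
      $item['function'] = $row[1];
      $item['args'] = array();
      $ares = $db->query( "SELECT key_name, value FROM processing_args WHERE item_id=?", $item['id'] );
      while( $ares->fetchInto( $arow ) )
        $item['args'][ $arow[0] ] = $arow[1];
      $rows []= $item;
    }
    return $rows;
  }
}
?>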
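This class contains three important methods: add(), get_all(), and delete(). As with the mailouts system, the front end uses add(), and the processing engine uses get_all() and delete().
The test script shown in Listing 7 adds an entry to the processing queue.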
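Listing 7. generic_test_add.php
<?php
// Only the tail of this listing survived in the source; the require line and
// the start of the add() call are reconstructed from the description that follows.
require_once 'generic.php';

ProcessingItems::add( 'printvalue', array( 'value' => 'foo' ) );
?>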
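In this example, I add a call to the printvalue function with the value argument set to foo. I run this script with the PHP command-line interpreter to put the method call in the queue, and then run the method with the following processing script.
Listing 8. generic_process.php
This script is very simple. It takes the processing items returned by get_all() and then uses call_user_func_array, a PHP built-in function, to call the method dynamically with the given arguments. In this example, the local printvalue function is called.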
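The dynamic call can be sketched as follows. This is a minimal sketch, not the original listing: the $items array is a hypothetical stand-in for what get_all() returns, and the signature of printvalue() (one array parameter) is an assumption. The argument hash is wrapped in an array so that it arrives as a single positional argument; passed directly, PHP 8 would treat its string keys as named arguments.

```php
<?php
// The queued function; it receives the whole argument hash as one parameter.
function printvalue(array $args)
{
    echo 'Printing: ' . $args['value'] . "\n";
}

// Stand-in for what get_all() returns: one queued call with its arguments.
$items = [
    ['id' => 1, 'function' => 'printvalue', 'args' => ['value' => 'foo']],
];

foreach ($items as $item) {
    if (!is_callable($item['function'])) {
        continue;   // skip names that do not resolve to a callable
    }
    // Wrapping the hash in an array passes it as a single positional argument.
    call_user_func_array($item['function'], [$item['args']]);
}
// prints "Printing: foo"
```

In the database-backed version, each item would also be deleted from the queue after its function returns.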
To demonstrate this functionality, let’s look at what happens on the command line:
% php generic_test_add.php
% php generic_process.php
Printing: foo
%
The output isn’t much, but you can see the gist. Through this mechanism, the processing of any PHP function can be deferred.
If you don't like putting PHP function names and parameters into the database, another approach is to establish a mapping in the PHP code between a "processing job type" name stored in the database and the actual PHP processing function. This way, if you later decide to modify the PHP back end, the system will still work as long as the "processing job type" string matches.
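Such a mapping might look like the sketch below. The job type names, the handler functions, and dispatch() are all hypothetical, chosen only to illustrate the idea; the point is that the database stores a short type string and only the PHP code decides which function that string runs.

```php
<?php
// Hypothetical handlers; the names and job types are illustrative only.
function send_announcement(array $args) { return 'announcing: ' . $args['subject']; }
function build_report(array $args)      { return 'report for ' . $args['date']; }

// Only job types listed here can run; the database never names a PHP function.
$handlers = [
    'announcement' => 'send_announcement',
    'daily_report' => 'build_report',
];

function dispatch(array $handlers, $type, array $args)
{
    if (!isset($handlers[$type])) {
        throw new InvalidArgumentException("Unknown job type: $type");
    }
    return call_user_func($handlers[$type], $args);
}

echo dispatch($handlers, 'daily_report', ['date' => '2024-01-31']), "\n";
// prints "report for 2024-01-31"
```

A side benefit of this design is that a bad row in the queue can at worst trigger a listed handler or an exception, never an arbitrary function.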
Ditch the database
Finally, I demonstrate a slightly different solution that stores the batch jobs as files in a directory instead of using a database. I am not suggesting that you adopt this method in place of a database; it is simply an alternative, and it is up to you to decide whether to use it.
Obviously, there is no schema in this solution since we are not using a database. So first write a class that contains add(), get_all() and delete() methods similar to the previous example.
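Listing 9. batch_by_file.php
<?php
// The opening of this listing, through the start of the foreach loop in add(),
// was cut off in the source. Everything above that loop is reconstructed and
// is an assumption, including the file-naming scheme.
define( 'BATCH_DIRECTORY', 'batch_items/' );

class BatchFiles
{
  public static function delete( $id )
  {
    unlink( $id ); // the id is the job file's path (see get_all())
    return true;
  }

  public static function add( $function, $args )
  {
    $path = BATCH_DIRECTORY.uniqid( 'job_', true ); // assumed naming scheme
    $fh = fopen( $path, 'w' );
    fwrite( $fh, $function."\n" );
    foreach( $args as $k => $v ) { fwrite( $fh, $k.":".$v."\n" ); }
    fclose( $fh );
    return true;
  }

  public static function get_all()
  {
    $rows = array();
    if ( is_dir( BATCH_DIRECTORY ) ) {
      if ( $dh = opendir( BATCH_DIRECTORY ) ) {
        while ( ( $file = readdir( $dh ) ) !== false ) {
          $path = BATCH_DIRECTORY.$file;
          if ( is_dir( $path ) == false ) {
            $item = array();
            $item['id'] = $path;
            $fh = fopen( $path, 'r' );
            if ( $fh ) {
              $item['function'] = trim( fgets( $fh ) );
              $item['args'] = array();
              while( ( $line = fgets( $fh ) ) !== false ) {
                // explode() replaces split(), which was removed in PHP 7
                $args = explode( ':', trim( $line ), 2 );
                $item['args'][$args[0]] = $args[1];
              }
              $rows []= $item;
              fclose( $fh );
            }
          }
        }
        closedir( $dh );
      }
    }
    return $rows;
  }
}
?>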
BatchFiles has three main methods: add(), get_all(), and delete(). This class does not access the database, but reads and writes files in the batch_items directory.
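The job file layout is simple: the function name on the first line, then one key:value pair per line. The sketch below round-trips that layout through a temporary file. The helpers encode_job() and decode_job() are hypothetical and not part of the article's class; the sketch assumes keys contain no colons and that neither keys nor values contain newlines.

```php
<?php
// Hypothetical helpers for the one-line-per-field job file layout.
function encode_job($function, array $args)
{
    $text = $function . "\n";
    foreach ($args as $key => $value) {
        $text .= $key . ':' . $value . "\n";
    }
    return $text;
}

function decode_job($text)
{
    $lines = explode("\n", rtrim($text, "\n"));
    $job = ['function' => trim(array_shift($lines)), 'args' => []];
    foreach ($lines as $line) {
        // A limit of 2 splits on the first colon only, so values may contain colons.
        $pair = explode(':', $line, 2);
        if (count($pair) === 2) {
            $job['args'][$pair[0]] = $pair[1];
        }
    }
    return $job;
}

$path = tempnam(sys_get_temp_dir(), 'job');
file_put_contents($path, encode_job('printvalue', ['value' => 'foo', 'url' => 'http://example.com']));
print_r(decode_job(file_get_contents($path)));
unlink($path);   // deleting the file is what marks the job as done
```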
Use the following test code to add a new batch entry.
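Listing 10. batch_by_file_test_add.php
<?php
// Only the tail of this listing survived in the source; the require line and
// the start of the add() call are assumed to parallel Listing 7.
require_once 'batch_by_file.php';

BatchFiles::add( 'printvalue', array( 'value' => 'foo' ) );
?>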
One thing to note: There's really no indication of how the jobs are stored other than the class name (BatchFiles). Therefore, it is easy to change it to database-style storage in the future without modifying the interface.
Finally, here is the handler code.
Listing 11. batch_by_file_processor.php
This code is almost identical to the database version, except that the file name and class name have been changed.
Conclusion
As mentioned earlier, the server provides a lot of support for threads and can perform background batch processing. In some cases, it's definitely easier to use a worker thread to handle small jobs. However, batch jobs can also be created in PHP applications using traditional tools (cron, MySQL, standard object-oriented PHP and Pear::DB), which are easy to implement, deploy and maintain.
References
Learning
You can refer to the original English text of this article on the developerWorks global site.
Read IBM developerWorks’ PHP Project Resource Center to learn more about PHP.
PHP.net is an excellent resource for PHP developers.
PEAR Mail_Queue package is a robust mail queue implementation that includes a database backend.
The crontab manual provides details of cron configuration, but it is not easy to understand.
The section on Using PHP from the command line in the PHP manual can help you understand how to run scripts from cron.
Get products and technologies
Check out PEAR -- PHP Extension and Application Repository, which includes Pear::DB.
About the author
Jack D. Herrington is a senior software engineer with more than 20 years of work experience. He is the author of three books: Code Generation in Action, Podcasting Hacks, and PHP Hacks, and more than 30 articles.