Memory Performance Boosts with Generators and Nikic/Iter
PHP iterators and generators: powerful tools for processing large data sets efficiently
Arrays and iteration are the cornerstones of any application. As we get new tools, the way we use arrays should improve too.
Generators, for example, are one such new tool. At first we only had arrays; then we gained the ability to define our own array-like classes (called iterators). Since PHP 5.5, we can quickly create iterator-like structures called generators.
Generators look like functions, but we can use them as iterators. They give us a simple syntax for what are essentially interruptible, resumable functions. They are wonderful!
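As a minimal illustration of that syntax (this example is not from the original article), a generator that counts up to a limit can be written and consumed like this:

function countTo($limit) {
    for ($i = 1; $i <= $limit; $i++) {
        yield $i; // execution pauses here until the caller asks for the next value
    }
}

foreach (countTo(3) as $number) {
    print $number . " "; // prints: 1 2 3
}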
We will look at several areas where generators can be used and explore some issues to watch out for when using them. Finally, we will study a great library created by the talented Nikita Popov.
The sample code can be found at https://github.com/sitepoint-editors/generators-and-iter.
Key Points
- Generators (available since PHP 5.5) are powerful tools for creating iterators. They allow the creation of interruptible, resumable functions, which simplifies the processing of large data sets and improves memory performance.
- Nikita Popov's nikic/iter library introduces functions that can be used with iterators and generators, saving significant amounts of memory by avoiding the creation of unnecessary intermediate arrays.
- Generators and the nikic/iter library are especially useful when working with large CSV files, because they can handle large data sets without loading everything into memory at once.
- While generators can significantly improve memory performance, they present some challenges of their own, such as being incompatible with array_filter and array_map, which requires other tools such as nikic/iter to process that kind of data.
The Problem
Suppose you have a lot of relational data and want to do some eager loading. Maybe the data is stored in comma-separated files, and you need to load each data type and link the related records together.
You can start with the following simple code:
function readCSV($file) {
    $rows = [];

    $handle = fopen($file, "r");

    while (!feof($handle)) {
        $rows[] = fgetcsv($handle);
    }

    fclose($handle);

    return $rows;
}

$authors = array_filter(
    readCSV("authors.csv")
);

$categories = array_filter(
    readCSV("categories.csv")
);

$posts = array_filter(
    readCSV("posts.csv")
);
You might then try to link the related records together, either by looping or with higher-order functions:
function filterByColumn($array, $column, $value) {
    return array_filter(
        $array,
        function($item) use ($column, $value) {
            return $item[$column] == $value;
        }
    );
}

$authors = array_map(function($author) use ($posts) {
    $author["posts"] = filterByColumn(
        $posts, 1, $author[0]
    );

    // make other changes to $author

    return $author;
}, $authors);

$categories = array_map(function($category) use ($posts) {
    $category["posts"] = filterByColumn(
        $posts, 2, $category[0]
    );

    // make other changes to $category

    return $category;
}, $categories);

$posts = array_map(function($post) use ($authors, $categories) {
    foreach ($authors as $author) {
        if ($author[0] == $post[1]) {
            $post["author"] = $author;
            break;
        }
    }

    foreach ($categories as $category) {
        if ($category[0] == $post[2]) {
            $post["category"] = $category;
            break;
        }
    }

    // make other changes to $post

    return $post;
}, $posts);
Looks good, right? So, what happens when the CSV files we have to parse are huge? Let's analyze the memory usage a little...
function formatBytes($bytes, $precision = 2) {
    $kilobyte = 1024;
    $megabyte = 1024 * 1024;

    if ($bytes >= 0 && $bytes < $kilobyte) {
        return $bytes . " b";
    }

    if ($bytes >= $kilobyte && $bytes < $megabyte) {
        return round($bytes / $kilobyte, $precision) . " kb";
    }

    return round($bytes / $megabyte, $precision) . " mb";
}

print "memory:" . formatBytes(memory_get_peak_usage());
(The sample code includes generate.php, which you can use to create these CSV files...)
If you have large CSV files, this code should show a worrying amount of memory being used to link these arrays together: at least as much as the combined size of the files you read, because PHP has to keep everything in memory.
Generators to the Rescue!
One way to improve on this is to use generators. If you're not familiar with them, now is a good time to learn more.
Generators allow you to load only a small amount of the total data at a time, and they require very little change to the code:
function readCSVGenerator($file) {
    $handle = fopen($file, "r");

    while (!feof($handle)) {
        yield fgetcsv($handle);
    }

    fclose($handle);
}
If you loop over the CSV data this way, you'll notice an immediate drop in the amount of memory required:
foreach (readCSVGenerator("posts.csv") as $post) {
    // do something with $post
}

print "memory:" . formatBytes(memory_get_peak_usage());
If you were seeing megabytes of memory usage before, you'll now see kilobytes. That's a huge improvement, but it's not without its problems.
For starters, array_filter and array_map don't work with generators. You have to find other tools to work with that kind of data. Here's one you can try:
composer require nikic/iter
This library introduces a handful of functions that can be used with iterators and generators. So how do you still get all that related data, without keeping everything in memory?
// ... (the original listings for this step, which use the iter library's functions, are omitted) ...
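As a stand-in for the omitted listing, here is a minimal sketch of one way to answer that question. It is an illustration only: it reuses the readCSVGenerator() function defined above, assumes the column layout from the earlier examples (the author id is in column 1 of posts.csv), and re-reads posts.csv once per author.

use function iter\filter;
use function iter\toArray;

require "vendor/autoload.php";

function getAuthorsWithPosts() {
    foreach (readCSVGenerator("authors.csv") as $author) {
        if (!$author) {
            continue; // skip the empty row fgetcsv() returns at the end of the file
        }

        // Stream posts.csv and keep only this author's posts;
        // nothing else is materialized in memory.
        $author["posts"] = toArray(
            filter(function ($post) use ($author) {
                return $post && $post[1] == $author[0];
            }, readCSVGenerator("posts.csv"))
        );

        yield $author;
    }
}

foreach (getAuthorsWithPosts() as $author) {
    // do something with $author
}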
(Re-reading each data source on every pass isn't efficient, though. Consider keeping the smaller data sets, such as authors and categories, entirely in memory...)
Other Interesting Things
This is just the tip of the iceberg of what Nikic's library offers! Ever wanted to flatten an array (or an iterator/generator)?
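A small sketch of the idea, assuming the library is installed and Composer's autoloader is loaded:

use function iter\flatten;

require "vendor/autoload.php";

// flatten() works lazily on arrays, iterators and generators alike.
foreach (flatten([1, [2, 3], [[4], 5]]) as $value) {
    print $value . " "; // prints: 1 2 3 4 5
}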
You can use functions such as slice and take to return slices of an iterable:
// ... (example code omitted) ...
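As an illustration only, and not the omitted listing, slicing a generator with those functions might look something like this, reusing readCSVGenerator() from above:

use function iter\slice;
use function iter\take;
use function iter\toArray;

require "vendor/autoload.php";

// take() keeps the first N items of any iterable...
$firstThree = toArray(take(3, readCSVGenerator("posts.csv")));

// ...while slice() returns a window of items starting at an offset.
$nextFive = toArray(slice(readCSVGenerator("posts.csv"), 10, 5));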
As you use generators more, you may find that you can't always reuse them. Consider the following example:
// ... (example code omitted) ...
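To make the problem concrete, here is a minimal sketch, reusing readCSVGenerator() from above, that runs into the error described next:

$posts = readCSVGenerator("posts.csv");

foreach ($posts as $post) {
    // do something with $post
}

// The generator has now been consumed; traversing it a second time throws an Exception.
foreach ($posts as $post) {
    // do something else with $post
}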
If you try to run that code, you will see an exception with the message: "Cannot traverse an already closed generator". Each of the iterator functions in this library has a rewindable counterpart:
// ... (example code omitted) ...
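A sketch of what using such a counterpart could look like (an illustration, not the omitted listing), using iter\rewindable\map:

require "vendor/autoload.php";

$posts = iter\rewindable\map(function ($post) {
    // transform $post here
    return $post;
}, readCSVGenerator("posts.csv"));

foreach ($posts as $post) {
    // first pass
}

foreach ($posts as $post) {
    // second pass works too, because the result is rewindable
}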
You can use that mapping function as many times as you like. You can even make your own generators rewindable:
// ... (example code omitted) ...
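A sketch of that, wrapping the CSV-reading generator from earlier with iter\makeRewindable (again an illustration rather than the omitted listing):

require "vendor/autoload.php";

$readCSVRewindable = iter\makeRewindable(function ($file) {
    $handle = fopen($file, "r");

    while (!feof($handle)) {
        yield fgetcsv($handle);
    }

    fclose($handle);
});

$posts = $readCSVRewindable("posts.csv");

foreach ($posts as $post) {
    // first pass
}

foreach ($posts as $post) {
    // rewinds and iterates again
}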
What you get from it is a reusable generator!
Conclusion
Generators are an option worth considering for every loop you write, and they're useful for other things besides. Where the language features fall short, Nikic's library provides a wealth of higher-order functions.
Are you already using generators? Would you like to see more examples of how to use them in your own applications for performance gains? Let us know!