Optimizing Large-Scale API Data Retrieval: Best Practices and PHP Lazy Collection Solution

WBOY
Published: 2024-09-12 16:18:14

When you use an API to retrieve large amounts of data (potentially thousands of items), several aspects need attention to keep the process efficient, flexible, and performant. Here’s a breakdown of the key factors to manage, along with a solution for PHP users.

Key considerations when retrieving large data via API

Let me share some key considerations for efficiently retrieving large datasets via API:

  • Handling pagination: APIs typically deliver data in pages. To retrieve all the data, you need to manage pagination, performing multiple API calls while keeping track of the cursor or page number. Calculating the number of required API calls and managing this process is essential to ensure you get the complete dataset.
  • Memory management: when fetching large datasets, loading every result into memory at once can overwhelm your system. Instead, process data in chunks so your application remains responsive and doesn’t run into memory issues.
  • Rate limiting & throttling: many APIs impose rate limits, such as restricting you to X requests per second or Y requests per minute. To stay within these limits, you must implement a flexible throttling mechanism that adapts to the API's specific restrictions.
  • Parallel API requests: given the need to perform numerous API calls due to pagination, you want to retrieve data as quickly as possible. One strategy is to make multiple API calls in parallel, all while respecting the rate limits. This ensures that your requests are both fast and compliant with API constraints.
  • Efficient data collection: despite making numerous paginated API requests, you need to combine the results into a single collection, handling them efficiently to avoid memory overload. This ensures smooth processing of data while keeping resource usage low.
  • Optimized JSON parsing: many APIs return data in JSON format. When dealing with large responses, it's important to access and query specific sections of the JSON in a performant manner, ensuring that unnecessary data isn't loaded or processed.
  • Efficient exception handling: APIs typically signal errors through HTTP status codes, indicating issues like timeouts, unauthorized access, or server errors. It’s important to turn these into exceptions using the mechanism provided by your programming language. Beyond basic error handling, you should also map and raise exceptions in a way that aligns with your application's logic, making the error handling process clear and manageable. Implementing retries, logging, and mapping errors to meaningful exceptions makes data retrieval more reliable.
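
The pagination and memory management points above can be sketched in plain PHP with a generator. This is only an illustration under stated assumptions: `lazyItems()`, the `$fetchPage` callable, and the `['items' => ..., 'total' => ...]` response shape are hypothetical names chosen for the example, and an in-memory array stands in for a real API.

```php
<?php

declare(strict_types=1);

/**
 * Lazily yields every item of a paginated resource, one page at a time.
 *
 * $fetchPage is a hypothetical callable: it receives a 1-based page number
 * and returns ['items' => array, 'total' => int] (an assumed response shape).
 */
function lazyItems(callable $fetchPage, int $perPage): Generator
{
    $first = $fetchPage(1);

    // Number of API calls needed to cover the whole dataset.
    $totalPages = (int) ceil($first['total'] / $perPage);

    foreach ($first['items'] as $item) {
        yield $item;
    }

    for ($page = 2; $page <= $totalPages; $page++) {
        // Only the current page is held in memory while it is consumed.
        foreach ($fetchPage($page)['items'] as $item) {
            yield $item;
        }
    }
}

// In-memory stand-in for a real API: 25 items served 10 per page.
$dataset = range(1, 25);
$perPage = 10;
$calls = 0;

$fetchPage = function (int $page) use ($dataset, $perPage, &$calls): array {
    $calls++;

    return [
        'items' => array_slice($dataset, ($page - 1) * $perPage, $perPage),
        'total' => count($dataset),
    ];
};

$sum = 0;
foreach (lazyItems($fetchPage, $perPage) as $item) {
    $sum += $item; // process each item as it arrives
}

echo "Fetched {$calls} pages, sum = {$sum}" . PHP_EOL; // Fetched 3 pages, sum = 325
```

Because the generator requests a page only when the consumer asks for the next item, stopping the loop early also stops further API calls.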

The "Lazy JSON Pages" PHP Solution

If you're working with PHP, you're in luck. The Lazy JSON Pages open source package offers a convenient, framework-agnostic API scraper that can load items from paginated JSON APIs into a Laravel lazy collection via asynchronous HTTP requests. This package simplifies pagination, throttling, parallel requests, and memory management, ensuring efficiency and performance.

You can find more information about the package, and more options to customize it in the readme of the official GitHub repository: Lazy JSON Pages.

I want to say thank you to Andrea Marco Sartori, the author of the package.

Example: Retrieving Thousands of Stories from Storyblok

Here’s a concise example of retrieving thousands of stories from Storyblok using the Lazy JSON Pages package in PHP.
First, you can create a new directory, jump into the directory and start installing the package:

mkdir lazy-http
cd lazy-http
composer require cerbero/lazy-json-pages

Once the package is installed, you can start creating your script:

<?php

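require "./vendor/autoload.php";

use Illuminate\Support\LazyCollection;

$token = "your-storyblok-access-token";
$version = "draft"; // draft or published

// Stories endpoint of the Storyblok Content Delivery API.
$source = "https://api.storyblok.com/v2/cdn/stories?token=" . $token . "&version=" . $version;

$lazyCollection = LazyCollection::fromJsonPages($source)
    ->totalItems('total')                    // where the API reports the total number of items
    ->async(requests: 3)                     // send up to 3 requests concurrently
    ->throttle(requests: 10, perSeconds: 1)  // at most 10 requests per second
    ->collect('stories.*');                  // yield each entry under the "stories" key

// Items are loaded lazily while iterating.
foreach ($lazyCollection as $item) {
    echo $item["name"] . PHP_EOL;
}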

Then replace the placeholder with your own access token and run the script with the php command.

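How it works

  • Efficient pagination: the API results are paginated, and the lazy collection fetches all pages without storing everything in memory.
  • Asynchronous API calls: the ->async(requests: 3) line sends three API requests in parallel, improving performance.
  • Throttling: the ->throttle(requests: 10, perSeconds: 1) line ensures that no more than 10 requests are made per second, respecting the rate limit.
  • Memory efficiency: using a lazy collection lets you process data item by item, reducing memory overhead even for large datasets.

This approach offers a reliable, performant, and memory-efficient way to retrieve large amounts of data from APIs in PHP.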
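
To make the throttling idea concrete, here is a minimal fixed-window limiter in plain PHP. It is a sketch of the general technique, not the way Lazy JSON Pages implements `throttle()`. The class name `FixedWindowThrottle` and its constructor parameters are hypothetical, and the clock and sleep functions are injected so the example runs instantly with simulated time.

```php
<?php

declare(strict_types=1);

/**
 * Allows at most $limit calls per $window seconds (fixed-window strategy).
 * Hypothetical helper for illustration only.
 */
final class FixedWindowThrottle
{
    private float $windowStart;
    private int $count = 0;

    public function __construct(
        private int $limit,
        private float $window,
        private Closure $clock, // returns the current time in seconds
        private Closure $sleep, // pauses for the given number of seconds
    ) {
        $this->windowStart = ($this->clock)();
    }

    /** Call before each request; blocks when the budget is used up. */
    public function wait(): void
    {
        $now = ($this->clock)();

        if ($now - $this->windowStart >= $this->window) {
            // A new window has started: reset the counter.
            $this->windowStart = $now;
            $this->count = 0;
        }

        if ($this->count >= $this->limit) {
            // Budget exhausted: pause until the current window ends.
            ($this->sleep)($this->windowStart + $this->window - $now);
            $this->windowStart = ($this->clock)();
            $this->count = 0;
        }

        $this->count++;
    }
}

// Simulated time, so the example does not actually sleep.
$time = 0.0;
$slept = 0.0;

$throttle = new FixedWindowThrottle(
    limit: 10,
    window: 1.0,
    clock: function () use (&$time): float {
        return $time;
    },
    sleep: function (float $seconds) use (&$time, &$slept): void {
        $time += $seconds;
        $slept += $seconds;
    },
);

for ($i = 0; $i < 25; $i++) {
    $throttle->wait(); // a real script would send one request here
}

echo "Simulated wait: {$slept} seconds" . PHP_EOL;
```

In a real script, the clock could be `fn () => microtime(true)` and the sleep function could wrap `usleep()`. With 25 calls and a budget of 10 per second, the loop has to pause twice.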

References

  • Lazy JSON Pages package: https://github.com/cerbero90/lazy-json-pages
  • Author of the open source package: https://github.com/cerbero90

Source: dev.to