When working with APIs that return large datasets, it's crucial to manage the data flow efficiently and to address challenges such as pagination, rate limits, and memory usage. In this article, we'll walk through consuming an API with JavaScript's native fetch function, covering pagination, rate-limit handling, and memory-efficient streaming. We'll explore these techniques using the Storyblok Content Delivery API. Let's dive into the code.
Before diving into the code, here are a few key features of the Storyblok Content Delivery API to consider: responses are paginated via the page and per_page query string parameters, the total response header reports the total number of entries, the cv parameter lets you read a consistent cached version of the content, and exceeding the rate limit returns an HTTP 429 status.
Here’s how I implemented these concepts using the native fetch function in JavaScript. Note that the script reads the access token and content version from environment variables, and streams the results into a stories.json file as they arrive.
```javascript
import { writeFile, appendFile } from "fs/promises";

// Read the access token and content version from the environment
const STORYBLOK_ACCESS_TOKEN = process.env.STORYBLOK_ACCESS_TOKEN;
const STORYBLOK_VERSION = process.env.STORYBLOK_VERSION;

/**
 * Fetch a single page of data from the API,
 * with retry logic for rate limits (HTTP 429).
 */
async function fetchPage(url, page, perPage, cv) {
  let retryCount = 0;
  // Max retry attempts
  const maxRetries = 5;
  while (retryCount <= maxRetries) {
    try {
      const response = await fetch(
        `${url}&page=${page}&per_page=${perPage}&cv=${cv}`,
      );
      // Handle 429 Too Many Requests (rate limit)
      if (response.status === 429) {
        // Some APIs provide a Retry-After header that indicates
        // how long to wait before retrying.
        // Storyblok uses a fixed window counter (1-second window).
        const retryAfter = response.headers.get("Retry-After") || 1;
        console.log(
          `Rate limited on page ${page}. Retrying after ${retryAfter} seconds...`,
        );
        retryCount++;
        // In the case of a rate limit, waiting 1 second is usually enough.
        // If not, we wait 2 seconds on the second attempt, and so on,
        // progressively slowing down the retry requests.
        // setTimeout accepts milliseconds, so we multiply by 1000.
        await new Promise((resolve) =>
          setTimeout(resolve, retryAfter * 1000 * retryCount),
        );
        continue;
      }
      if (!response.ok) {
        throw new Error(
          `Failed to fetch page ${page}: HTTP ${response.status}`,
        );
      }
      const data = await response.json();
      // Return the stories of the current page
      return data.stories || [];
    } catch (error) {
      console.error(`Error fetching page ${page}: ${error.message}`);
      return []; // Return an empty array so one failed page doesn't break the flow
    }
  }
  console.error(`Failed to fetch page ${page} after ${maxRetries} attempts`);
  return []; // If we hit the max retry limit, return an empty array
}

/**
 * Fetch all data in parallel, processing pages in batches,
 * as an async generator (the reason for the `*`).
 */
async function* fetchAllDataInParallel(
  url,
  perPage = 25,
  numOfParallelRequests = 5,
) {
  let currentPage = 1;
  let totalPages = null;
  // Fetch the first page to get:
  // - the total number of entries (the `total` HTTP header)
  // - the CV for caching (the `cv` attribute in the JSON response payload)
  const firstResponse = await fetch(
    `${url}&page=${currentPage}&per_page=${perPage}`,
  );
  if (!firstResponse.ok) {
    throw new Error(`Failed to fetch data: HTTP ${firstResponse.status}`);
  }
  console.timeLog("API", "After first response");
  const firstData = await firstResponse.json();
  const total = parseInt(firstResponse.headers.get("total"), 10) || 0;
  totalPages = Math.ceil(total / perPage);
  // Yield the stories from the first page
  for (const story of firstData.stories) {
    yield story;
  }
  const cv = firstData.cv;
  console.log(`Total pages: ${totalPages}`);
  console.log(`CV parameter for caching: ${cv}`);
  currentPage++; // Start from the second page now
  while (currentPage <= totalPages) {
    // Build the list of pages to fetch in the current batch
    const pagesToFetch = [];
    for (
      let i = 0;
      i < numOfParallelRequests && currentPage <= totalPages;
      i++
    ) {
      pagesToFetch.push(currentPage);
      currentPage++;
    }
    // Fetch the pages in parallel
    const batchRequests = pagesToFetch.map((page) =>
      fetchPage(url, page, perPage, cv),
    );
    // Wait for all requests in the batch to complete
    const batchResults = await Promise.all(batchRequests);
    console.timeLog("API", `Got ${batchResults.length} responses`);
    // Yield the stories from each batch of requests
    for (const result of batchResults) {
      for (const story of result) {
        yield story;
      }
    }
    console.log(`Fetched pages: ${pagesToFetch.join(", ")}`);
  }
}

console.time("API");
const apiUrl = `https://api.storyblok.com/v2/cdn/stories?token=${STORYBLOK_ACCESS_TOKEN}&version=${STORYBLOK_VERSION}`;
const stories = fetchAllDataInParallel(apiUrl, 25, 7);

// Create the file (or overwrite it if it exists) and open the JSON array
await writeFile("stories.json", "[", "utf8");
let i = 0;
for await (const story of stories) {
  i++;
  console.log(story.name);
  // If it's not the first story, add a comma to separate JSON objects
  if (i > 1) {
    await appendFile("stories.json", ",", "utf8");
  }
  // Append the current story to the file
  await appendFile("stories.json", JSON.stringify(story, null, 2), "utf8");
}
// Close the JSON array in the file
await appendFile("stories.json", "]", "utf8");
console.log(`Total Stories: ${i}`);
```
Here’s a breakdown of the crucial steps in the code that ensure efficient and reliable API consumption using the Storyblok Content Delivery API:
1) Fetching pages with a retry mechanism (fetchPage)
This function handles fetching a single page of data from the API. It includes logic for retrying when the API responds with a 429 (Too Many Requests) status, which signals that the rate limit has been exceeded.
The retryAfter value specifies how long to wait before retrying. I use setTimeout to pause before making the subsequent request, and retries are limited to a maximum of 5 attempts.
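The same pattern can be distilled into a small reusable helper. This is an illustrative sketch, not the article's exact code: doFetch is any function that returns a Response-like object, and the backoff grows linearly with the attempt count, as in the script above.

```javascript
// Retry a request when it is rate limited (HTTP 429), waiting
// `retryAfter * attempt` seconds between attempts, up to `maxRetries`.
async function fetchWithRetry(doFetch, maxRetries = 5) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const response = await doFetch();
    if (response.status !== 429) return response;
    // Fall back to a 1-second window when the Retry-After header is missing
    const retryAfter = Number(response.headers?.get?.("Retry-After")) || 1;
    // Linear backoff: 1s, 2s, 3s... (setTimeout takes milliseconds)
    await new Promise((resolve) =>
      setTimeout(resolve, retryAfter * 1000 * attempt),
    );
  }
  throw new Error(`Still rate limited after ${maxRetries} attempts`);
}
```

Separating the retry policy from the request itself also makes the logic easy to unit-test with a fake fetcher.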
2) Initial page request and the CV parameter
The first API request is crucial because it retrieves the total header (which indicates the total number of stories) and the cv parameter (used for caching).
You can use the total header to calculate the total number of pages, and passing the same cv parameter to every subsequent request ensures you read a consistent, cached version of the content across all pages.
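The page math from the total header is worth isolating. A minimal sketch (pageCount is a hypothetical helper name, not part of the original script); note that HTTP header values arrive as strings:

```javascript
// Given the `total` header value and the page size, compute how
// many pages must be requested to cover every entry.
function pageCount(totalHeader, perPage) {
  const total = parseInt(totalHeader, 10) || 0; // missing header => 0 pages
  return Math.ceil(total / perPage);
}

// e.g. a `total` header of "95" with 25 entries per page needs 4 pages
pageCount("95", 25);
```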
3) Handling pagination
Pagination is managed using the page and per_page query string parameters. The code requests 25 stories per page (you can adjust this), and the total header helps calculate how many pages need to be fetched.
The code fetches stories in batches of up to 7 (you can adjust this) parallel requests at a time to improve performance without overwhelming the API.
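The batching logic can be expressed as a small pure function. This is an illustrative sketch (buildBatches is not part of the original script), mirroring the inner loop that groups page numbers before each round of parallel requests:

```javascript
// Split page numbers firstPage..totalPages into batches of at most
// `batchSize` pages, matching the batching loop in the script above.
function buildBatches(firstPage, totalPages, batchSize) {
  const batches = [];
  let page = firstPage;
  while (page <= totalPages) {
    const batch = [];
    for (let i = 0; i < batchSize && page <= totalPages; i++) {
      batch.push(page);
      page++;
    }
    batches.push(batch);
  }
  return batches;
}

// e.g. pages 2..10 in batches of 7: [[2,3,4,5,6,7,8],[9,10]]
buildBatches(2, 10, 7);
```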
4) Concurrent requests with Promise.all()
To speed up the process, multiple pages are fetched in parallel using JavaScript's Promise.all(). This method sends several requests simultaneously and waits for all of them to complete.
After each batch of parallel requests is completed, the results are processed to yield the stories. This avoids loading all the data into memory at once, reducing memory consumption.
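The batch step reduces to mapping page numbers to promises and awaiting them together. A sketch with a simulated fetcher (fakeFetchPage is a stand-in for the real fetchPage, not an actual API call):

```javascript
// Fire a batch of page requests in parallel and wait for all of them.
// `fetchPageFn` is any async function mapping a page number to its results.
async function fetchBatch(pages, fetchPageFn) {
  const requests = pages.map((page) => fetchPageFn(page));
  // Promise.all resolves once every request in the batch has resolved,
  // preserving the order of the input pages.
  return Promise.all(requests);
}

const fakeFetchPage = async (page) => [`story-${page}-a`, `story-${page}-b`];
```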
5) Memory management with asynchronous iteration (for await...of)
Instead of collecting all the data into one array, we use a JavaScript async generator (async function* together with for await...of) to process each story as it is fetched. This prevents memory overload when handling large datasets.
By yielding the stories one by one, the code remains efficient and avoids memory leaks.
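The streaming idea can be shown in isolation. In this sketch the paginated responses are already in memory (streamStories and pages are hypothetical names), but the yielding pattern is the same as in the script above:

```javascript
// Stream stories page by page instead of collecting them into one array.
// Only the current page's stories are held while the consumer iterates.
async function* streamStories(pages) {
  for (const page of pages) {
    // Yield each story of the page one by one to the consumer
    for (const story of page) {
      yield story;
    }
  }
}

// Consume the generator with for await...of, one story at a time
for await (const story of streamStories([["a", "b"], ["c"]])) {
  console.log(story);
}
```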
6) Rate limit handling
If the API responds with a 429 status code (rate-limited), the script uses the retryAfter value. It then pauses for the specified time before retrying the request. This ensures compliance with API rate limits and avoids sending too many requests too quickly.
In this article, we covered the key considerations when consuming APIs in JavaScript with the native fetch function: pagination via the page and per_page parameters, rate-limit handling with retries and backoff, and memory management through asynchronous iteration.
By applying these techniques, you can handle API consumption in a scalable, efficient, and memory-safe way.
Feel free to drop your comments/feedback.