


How can Node crawl headline videos in batches and save them (code implementation)
The content of this article is about how Node implements batch crawling and saving of headline videos (code implementation). It has certain reference value. Friends in need can refer to it. I hope it will be helpful to you.
Introduction
The general routine for crawling videos or pictures in batches is to use a crawler to obtain a collection of file links, and then save the files one by one through methods such as writeFile. However, the video link of Toutiao cannot be captured in the html file (server-side rendering output) that needs to be crawled. The video link is dynamically calculated and added to the video tag based on the known key or hash value of the video through the algorithm or decryption method in certain js files when the page is rendered on the client side. This is also an anti-crawling measure for the website.
When we browse these pages, we can see the calculated file address through the audit element. However, when downloading in batches, it is obviously not advisable to manually obtain video links one by one. Fortunately, puppeteer provides the function of simulating access to Chrome, allowing us to crawl the final page rendered by the browser.
Project Start
Commandnpm i npm start
Notice: The process of installing puppeteer is a little slow, please wait patiently.
Configuration file// 配置相关 module.exports = { originPath: 'https://www.ixigua.com', // 页面请求地址 savePath: 'D:/videoZZ' // 存放路径 }
Technical points
puppeteerOfficial API
puppeteer provides a high-level API to control Chrome or Chromium.
puppeteer Main function:
Use web pages to generate PDFs and images
Crawl SPA applications and generate pre-rendered content (i.e. "SSR" server-side rendering)
Can capture content from the website
Automated form submission, UI testing, keyboard input, etc.
API used:
puppeteer.launch() Launch browser instance
browser .newPage() Create a new page
page.goto() Enter the specified webpage
page.screenshot() Screenshot
page.waitFor() The page waits, which can be time, a certain element, or a certain function
page.$eval() Gets a specified element, Equivalent to document.querySelector
- ##page.$$eval() to obtain a certain type of element, equivalent to document.querySelectorAll
- page.$( '#id .className') Get an element in the document, the operation is similar to jQuery
const puppeteer = require('puppeteer'); (async () => { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto('https://example.com'); await page.screenshot({path: 'example.png'}); await browser.close(); })();
- Download video main method
const downloadVideo = async video => { // 判断视频文件是否已经下载 if (!fs.existsSync(`${config.savePath}/${video.title}.mp4`)) { await getVideoData(video.src, 'binary').then(fileData => { console.log('下载视频中:', video.title) savefileToPath(video.title, fileData).then(res => console.log(`${res}: ${video.title}`) ) }) } else { console.log(`视频文件已存在:${video.title}`) } }
- Get video data
getVideoData (url, encoding) { return new Promise((resolve, reject) => { let req = http.get(url, function (res) { let result = '' encoding && res.setEncoding(encoding) res.on('data', function (d) { result += d }) res.on('end', function () { resolve(result) }) res.on('error', function (e) { reject(e) }) }) req.end() }) }
- Save video data to local
savefileToPath (fileName, fileData) { let fileFullName = `${config.savePath}/${fileName}.mp4` return new Promise((resolve, reject) => { fs.writeFile(fileFullName, fileData, 'binary', function (err) { if (err) { console.log('savefileToPath error:', err) } resolve('已下载') }) }) }
Project address:
Github address
The above is the detailed content of How can Node crawl headline videos in batches and save them (code implementation). For more information, please follow other related articles on the PHP Chinese website!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



When you log in to someone else's steam account on your computer, and that other person's account happens to have wallpaper software, steam will automatically download the wallpapers subscribed to the other person's account after switching back to your own account. Users can solve this problem by turning off steam cloud synchronization. What to do if wallpaperengine downloads other people's wallpapers after logging into another account 1. Log in to your own steam account, find cloud synchronization in settings, and turn off steam cloud synchronization. 2. Log in to someone else's Steam account you logged in before, open the Wallpaper Creative Workshop, find the subscription content, and then cancel all subscriptions. (In case you cannot find the wallpaper in the future, you can collect it first and then cancel the subscription) 3. Switch back to your own steam

Recently, many users have been asking the editor, how to download links starting with 115://? If you want to download links starting with 115://, you need to use the 115 browser. After you download the 115 browser, let's take a look at the download tutorial compiled by the editor below. Introduction to how to download links starting with 115:// 1. Log in to 115.com, download and install the 115 browser. 2. Enter: chrome://extensions/ in the 115 browser address bar, enter the extension center, search for Tampermonkey, and install the corresponding plug-in. 3. Enter in the address bar of 115 browser: Grease Monkey Script: https://greasyfork.org/en/

With the rise of short video platforms, Douyin has become an indispensable part of everyone's daily life. On TikTok, we can see interesting videos from all over the world. Some people like to post other people’s videos, which raises a question: Is Douyin infringing upon posting other people’s videos? This article will discuss this issue and tell you how to edit videos without infringement and how to avoid infringement issues. 1. Is it infringing upon Douyin’s posting of other people’s videos? According to the provisions of my country's Copyright Law, unauthorized use of the copyright owner's works without the permission of the copyright owner is an infringement. Therefore, posting other people’s videos on Douyin without the permission of the original author or copyright owner is an infringement. 2. How to edit a video without infringement? 1. Use of public domain or licensed content: Public

The superpeople game can be downloaded through the steam client. The size of this game is about 28G. It usually takes one and a half hours to download and install. Here is a specific download and installation tutorial for you! New method to apply for global closed testing 1) Search for "SUPERPEOPLE" in the Steam store (steam client download) 2) Click "Request access to SUPERPEOPLE closed testing" at the bottom of the "SUPERPEOPLE" store page 3) After clicking the request access button, The "SUPERPEOPLECBT" game can be confirmed in the Steam library 4) Click the install button in "SUPERPEOPLECBT" and download

Many users need to download files when using Quark Network Disk, but we want to save them locally, so how to set this up? Let this site introduce to users in detail how to save files downloaded from Quark Network Disk back to the local computer. How to save files downloaded from Quark network disk back to your local computer 1. Open Quark, log in to your account, and click the list icon. 2. After clicking the icon, select the network disk. 3. After entering Quark Network Disk, click My Files. 4. After entering My Files, select the file you want to download and click the three-dot icon. 5. Check the file you want to download and click Download.

foobar2000 is a software that can listen to music resources at any time. It brings you all kinds of music with lossless sound quality. The enhanced version of the music player allows you to get a more comprehensive and comfortable music experience. Its design concept is to play the advanced audio on the computer The device is transplanted to mobile phones to provide a more convenient and efficient music playback experience. The interface design is simple, clear and easy to use. It adopts a minimalist design style without too many decorations and cumbersome operations to get started quickly. It also supports a variety of skins and Theme, personalize settings according to your own preferences, and create an exclusive music player that supports the playback of multiple audio formats. It also supports the audio gain function to adjust the volume according to your own hearing conditions to avoid hearing damage caused by excessive volume. Next, let me help you

Douyin, the national short video platform, not only allows us to enjoy a variety of interesting and novel short videos in our free time, but also gives us a stage to show ourselves and realize our values. So, how to make money by posting videos on Douyin? This article will answer this question in detail and help you make more money on TikTok. 1. How to make money from posting videos on Douyin? After posting a video and gaining a certain amount of views on Douyin, you will have the opportunity to participate in the advertising sharing plan. This income method is one of the most familiar to Douyin users and is also the main source of income for many creators. Douyin decides whether to provide advertising sharing opportunities based on various factors such as account weight, video content, and audience feedback. The TikTok platform allows viewers to support their favorite creators by sending gifts,

With the rise of short video platforms, Xiaohongshu has become a platform for many people to share their lives, express themselves, and gain traffic. On this platform, publishing video works is a very popular way of interaction. So, how to publish Xiaohongshu video works? 1. How to publish Xiaohongshu video works? First, make sure you have a video content ready to share. You can use your mobile phone or other camera equipment to shoot, but you need to pay attention to the image quality and sound clarity. 2. Edit the video: In order to make the work more attractive, you can edit the video. You can use professional video editing software, such as Douyin, Kuaishou, etc., to add filters, music, subtitles and other elements. 3. Choose a cover: The cover is the key to attracting users to click. Choose a clear and interesting picture as the cover to attract users to click on it.
